Excision of an Active CACTA-Like Transposable Element From DFR2 Causes Variegated Flowers in Soybean [Glycine max (L.) Merr.]

Active endogenous transposable elements, useful tools for gene isolation, have not been reported from any legume species. An active transposable element was suggested to reside in the W4 locus that governs flower color in soybean. Through biochemical and molecular analyses of several revertants of the w4-m allele, we have shown that the W4 locus encodes dihydroflavonol-4-reductase 2 (DFR2). w4-m has arisen through insertion of Tgm9, a 20,548-bp CACTA-like transposable element, into the second intron of DFR2. Tgm9 showed high nucleic acid sequence identity to Tgmt*. Its 5′ and 3′ terminal inverted repeats start with the conserved CACTA sequence. The 3′ subterminal region is highly repetitive. Tgm9 carries TNP1- and TNP2-like transposase genes that are expressed in the mutable line, T322 (w4-m). The element excises at a high frequency from both somatic and germinal tissues. Following excision, reinsertions of Tgm9 into the DFR2 promoter generated novel stable alleles, w4-dp (dilute purple flowers) and w4-p (pale flowers). We hypothesize that the element is fractured during transposition, and truncated versions of the element in new insertion sites cause stable mutations. The highly active endogenous transposon, Tgm9, should facilitate genomics studies, specifically those that relate to legume biology.

In soybean [Glycine max (L.) Merr.], five loci, W1, W3, W4, Wm, and Wp, control pigmentation in flowers and hypocotyls. Soybean plants with genotype W1_ w3w3 W4_ Wm_ Wp_ produce wild-type purple flowers (Figure 1) and purple hypocotyls. Mutations at the W4 locus in the W1_ background result in altered pigment accumulation patterns in petals and reduced levels of purple pigments in flowers and hypocotyls. Four mutant alleles, w4, w4-m, w4-dp, and w4-p, have been mapped to this locus. The w4 allele represents a spontaneous mutation, which produces near-white flowers (Figure 1) and green hypocotyls (Hartwig and Hinson 1962; Groose and Palmer 1991). The w4-m allele was identified from a cross between two experimental breeding lines with white and purple flowers, respectively (Palmer et al. 1989; Weigelt et al. 1990). w4-m is characterized by variegated flowers (Figure 1) and green hypocotyls with purple sectors (Groose et al. 1988).
w4-m has been proposed to harbor a class II transposable element (Palmer et al. 1989). Presumably, somatic excision of the putative transposable element results in the variegated phenotype (Groose et al. 1988), and germinal excision results in the wild-type phenotypes, purple flowers and purple pigments on hypocotyls (Palmer et al. 1989; Groose et al. 1990). The mutable line carrying w4-m undergoes germinal reversion at a very high frequency, about 6% per generation. Approximately 1% of the progeny derived from germinal revertants contain new mutations in unlinked loci, presumably resulting from reinsertion of the element (Palmer et al. 1989). For example, female partial-sterile 1 (Fsp1), female partial-sterile 2 (Fsp2), female partial-sterile 3 (Fsp3), and female partial-sterile 4 (Fsp4) were isolated from progenies of germinal revertants with purple flowers and were mapped to molecular linkage groups (MLG) C2, A2, F, and G, respectively (Kato and Palmer 2004). Similarly, 36 male-sterile, female-sterile mutants mapped to the st8 region on MLG J (Kato and Palmer 2003; Palmer et al. 2008a), 24 necrotic root (rn) mutants mapped to the rn locus on MLG G (Palmer et al. 2008b), and three Mdh1-n y20 mutants mapped to a chromosomal region on MLG H (Palmer et al. 1989; Xu and Palmer 2005b) were isolated among progenies of germinal revertants.
In addition to germinal revertants with purple flowers, the w4 mutable line also generated intermediate stable revertants that produce flowers with variable pigment intensities ranging from purple to near-white (Figure 1). Two stable intermediate revertants, w4-dp and w4-p, are allelic to W4. Plants carrying w4-dp or w4-p alleles produce dilute purple flowers or pale flowers, respectively (Figure 1) (Palmer and Groose 1993; Xu and Palmer 2005a).

MATERIALS AND METHODS
Primers and probes: All the primers and probes used in this study are listed in supporting information, Table S1 and Table S2, respectively.
Plant materials: Soybean lines differing for W4 alleles were planted at the Bruner Farm, the United States Department of Agriculture (USDA) greenhouse or growth cabinet, Iowa State University (Ames, IA). Their genotypes and phenotypes are described in Table 1. For analyses of anthocyanins, flavonols, and RNAs, petals were collected from floral buds 1 day before anthesis. For DNA analyses, genomic DNA was extracted from young leaves.
Extraction and analysis of anthocyanins: To extract anthocyanin pigments, freeze-dried flower petals were incubated in 1% (v/v) HCl in methanol for 3 hr at room temperature and centrifuged at 13,000 rpm for 10 min. Half of each supernatant was used for spectrophotometric analysis in a Beckman DU 640 nucleic acid and protein analyzer. The other half was hydrolyzed by boiling for 30 min. Hydrolyzed extracts were also subjected to spectrophotometric analysis. The anthocyanidin contents were expressed as the absorbance at 535 nm (A535) per milligram of dried petals per milliliter of solvent.
High performance liquid chromatography (HPLC) analysis of flavonols: The flavonol aglycone samples of soybean flowers and authentic standard solutions of myricetin, quercetin, and kaempferol (Sigma, St. Louis, MO) were prepared according to Burbulis et al. (1996) and stored at −20°. Samples (100 µl) were injected into a C-18 RP column attached to a Waters gradient HPLC system (Millipore, Billerica, MA) and eluted at a flow rate of 1.0 ml/min using the following linear gradient of HPLC-grade acetonitrile in HPLC-grade H2O (pH 3.0, adjusted with glacial acetic acid): 0 to 0% for 5 min, 0 to 10% for 5 min, 10 to 30% for 60 min, 30 to 100% for 5 min, 100 to 100% for 2 min, 100 to 0% for 2 min, and 0 to 0% for 5 min. The system was run and data were acquired using Waters Millennium software, version 3.2. Eluates were analyzed by a photodiode array 996 detector (PDA996) at 255 nm and quantified by comparison with authentic standards.
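The gradient program above is a sequence of linear segments, so the solvent composition at any time point can be read off by interpolation. The following is a minimal sketch for checking the run length and the composition at a given time; the names `GRADIENT`, `acetonitrile_percent`, and `TOTAL_RUN_MIN` are illustrative and are not part of the original methods or of the Waters software.

```python
# Gradient segments as stated in the text:
# (start % acetonitrile, end % acetonitrile, duration in minutes).
GRADIENT = [
    (0, 0, 5),
    (0, 10, 5),
    (10, 30, 60),
    (30, 100, 5),
    (100, 100, 2),
    (100, 0, 2),
    (0, 0, 5),
]

# Total programmed run time in minutes (sum of segment durations).
TOTAL_RUN_MIN = sum(duration for _, _, duration in GRADIENT)


def acetonitrile_percent(t_min):
    """Return % acetonitrile at t_min, interpolating linearly within a segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + (end - start) * fraction
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient program")


if __name__ == "__main__":
    print("total run (min):", TOTAL_RUN_MIN)
    for t in (0, 10, 40, 72.5, 84):
        print(t, "min ->", acetonitrile_percent(t), "% acetonitrile")
```

Under these segments the programmed run lasts 84 min, and the long 10 to 30% ramp (10 to 70 min) is where the flavonol aglycones would be expected to separate, although the text does not report retention times.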
RNA preparation, RT-PCR, and RNA blot analysis: Total RNA was prepared from immature petals using the RNeasy mini kit (QIAGEN, Valencia, CA). cDNAs were synthesized from 2 µg total RNA using oligo-dT and SuperScript II reverse transcriptase (Invitrogen, Carlsbad, CA) and diluted twofold for PCR. Primers for PCR are listed in Table S1. For RNA blot analyses, 20 µg total RNA was separated on a 1.0% formaldehyde-agarose gel and blotted onto a Zeta Probe Nylon membrane (BioRad, Hercules, CA) by capillary transfer.
DNA preparation and DNA blot analysis: Genomic DNA was extracted from young leaves by the CTAB method (Keim et al. 1988) and purified with equal volumes of phenol, phenol/chloroform (1:1, v/v), and chloroform (Sambrook et al. 1989). For DNA blot analysis, 10 µg genomic DNA was digested with the desired restriction enzymes and separated on a 0.8% (w/v) agarose gel. DNA blot analysis was conducted as previously described (Sambrook et al. 1989).
BAC library screening and W4 gene cloning: A BAC library (Bhattacharyya et al. 2005) was screened using a partial DFR cDNA probe (Table S2). Positive clones were confirmed by DNA blot analysis. The sequence of the full-length DFR2 gene was obtained by primer walking. The BAC DNA for sequencing was extracted using the QIAGEN large constructs miniprep kit.
Genomic library construction and screening: Two genomic libraries were constructed in the Lambda FIXII/XhoI vector (Stratagene, La Jolla, CA) using the DNA prepared from leaves of the T322 line homozygous for w4-m. The DNA from the libraries was transferred to 137-mm nitrocellulose disks (Stratagene) (Sambrook et al. 1989). Approximately 0.4 million plaques of the first library and 1.5 million plaques of the second library were screened with a DFR2 cDNA fragment (Table S2). Positive clones were confirmed by Southern blot analysis, PCR, and sequencing. The lambda DNA for sequencing was extracted using the QIAGEN Lambda Midi kit.
DNA sequencing and sequence analysis: All the sequencing projects were conducted in an ABI 3730 DNA analyzer at the Iowa State University DNA facility. The local alignments were performed using BLAST (bl2seq) from NCBI (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). The global alignments and multiple alignments were conducted using ClustalW2 from EBI (www.ebi.ac.uk/clustalw2). Gene prediction was performed with GENSCAN (http://genes.mit.edu/GENSCAN.html). Polypeptide sequences were deduced from the DNA sequence using the ExPASy translate tool (http://ca.expasy.org/tools/dna.html). Conserved domains in proteins were searched with the CDS program of NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi).

RESULTS
The w4 mutation blocks conversion of dihydromyricetin to delphinidin-3-monoglucoside: The anthocyanins and flavonols present in flowers of four soybean lines, Harosoy (W4), T322 (w4-m), T321 (w4-dp), and T369 (w4-p), were investigated (Table 1). Anthocyanin extracts showed the maximum absorption peak at 535 nm with λmax 450-650 nm (Figure S2). The peak shifted to 543 nm when the extracts were hydrolyzed by boiling (Figure S2). These spectral characteristics suggested that the main pigment in soybean flowers could be delphinidin-3-monoglucoside or its derivatives, petunidin-3-monoglucoside and malvidin-3-monoglucoside (Harborne 1958), similar to the main pigment malvidin in soybean hypocotyls, stem, and subepidermal tissues (Nozzolillo 1973). Malvidin is generated through glycosylation and methylation of delphinidin. The anthocyanin contents in flower petal samples were measured at 535 nm. The highest anthocyanin level was observed in wild-type purple petals (Harosoy) and purple petal sectors of T322, followed by pale flowers (T369) and dilute purple flowers (T321). The lowest anthocyanin content was observed in white petal sectors of T322 (Figure 2A).
Delphinidin-3-monoglucoside or its derivatives are believed to be the main pigments in soybean flowers. The flavonol myricetin is synthesized by the enzyme flavonol synthase (FLS; Figure S1) from dihydromyricetin, a precursor of delphinidin-3-monoglucoside. HPLC analyses revealed enhanced accumulation of myricetin in petals of T321 and T369, and white petal sectors of T322 (Figure 2B) that showed less anthocyanin pigment accumulation (Figure 2B). These results suggested that the lesion in w4 mutants lies in the step from dihydromyricetin to delphinidin-3-monoglucoside (Figure S1).
Mutations in the W4 locus were associated with reduced DFR2 transcript levels: We analyzed the w4 mutants for steady state transcript levels of three structural genes, F3H, DFR, and ANS (Figure S1). The probes for F3H and ANS were a cDNA fragment (BM093886) and an RT-PCR product (Table S1 and Table S2), respectively. Results showed that steady state transcript levels of F3H and ANS were comparable among the soybean lines (Figure 2C). The probe for DFR was an RT-PCR product generated from immature flowers using primers DFR1F and DFR1R (Table S1 and Table S2). It encoded a protein named DFR2 that showed 81% amino acid identity to DFR1 (AF167556). The steady state DFR2 transcript level was highest in wild-type Harosoy (W4) and the purple petal sectors of T322, reduced in T369 (w4-p) and T321 (w4-dp), and undetectable in the white petal sectors of T322 (Figure 2C). Similar results for steady state DFR2 transcript levels were observed in RT-PCR analyses (Figure 2D). These data suggested that reduced levels of anthocyanin pigments in w4 mutants (Figure 2A) were the result of lower DFR2 expression levels.
In DFR2, the ~1.7-kb HindIII fragment includes the promoter, exon I, a part of exon II, and an EcoRI site (Figure 3A). Since no DFR2-specific polymorphisms for EcoRI-digested DNA were observed among wild-type and T321 (w4-dp) or T369 (w4-p) lines (Figure 3B), the aberrations in these two mutants should reside in the ~1.2-kb HindIII-EcoRI fragment containing the upstream promoter (Figure 3A). These results showed that dfr2 mutations were generated from insertions among the w4 alleles, and therefore W4 most likely encodes DFR2 (Figure 3, B and C).
The insertion in DFR2 intron II is a CACTA-like element Tgm9: Southern analyses suggested that an insertion was located between DFR2 exon II and VI in T322 (w4-m) (Figure 3). We isolated a 1357-bp insertion in DFR2 intron II, 438 bp downstream of the exon II/intron II junction (Figure 4A). The insertion harbors a HindIII site at the 3′ end, which led to generation of an ~2.3-kb HindIII-PstI fragment for the w4-m allele when the DNA blot was hybridized with the DFR3′ probe (Figure 3C).
The inserted element generated a 3-bp (AAT) target site duplication (TSD), similar to TSDs generated by CACTA-type transposons (Pereira et al. 1986; Rhodes and Vodkin 1988; Nacken et al. 1991; Inagaki et al. 1994), and contained structures similar to the 3′ end of the CACTA elements. It carried a 30-bp terminal inverted repeat (TIR) starting with 5′-CACTA-3′, similar to the ones in other soybean Tgm elements (Table S4), and a 700-bp highly repetitive region in the subterminal repeat (STR) region next to the 3′-TIR (Figure 4). It does not contain other structures such as the 5′ end TIR and transposase gene(s), suggesting that it is a truncated version of a transposable element, most likely generated from an imperfect excision of the entire element. We named the entire element Tgm9.
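The defining feature described above, termini that begin with 5′-CACTA-3′ and form (possibly imperfect) inverted repeats, can be checked computationally once an element sequence is in hand. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, the toy is not Tgm9 sequence, and the 13-bp window is chosen for the toy rather than taken from the paper (Tgm9 itself has 30-bp TIRs).

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_cacta_termini(element):
    """True if both ends read 5'-CACTA-3' when each is read inward from its terminus."""
    five_ok = element.upper().startswith("CACTA")
    three_ok = reverse_complement(element).startswith("CACTA")
    return five_ok and three_ok


def tir_mismatches(element, tir_len):
    """Count mismatches between the 5' terminus and the reverse complement of the 3' terminus."""
    five = element[:tir_len].upper()
    three_rc = reverse_complement(element[-tir_len:])
    return sum(a != b for a, b in zip(five, three_rc))


# Hypothetical toy element: a 13-bp 5' TIR, arbitrary internal sequence,
# and a 3' TIR that differs from a perfect inverted repeat at one position.
FIVE_TIR = "CACTACAAGAAAA"
THREE_TIR_READ_INWARD = "CACTACAAGTAAA"
TOY_ELEMENT = FIVE_TIR + "GGGTTTCCCAAA" + reverse_complement(THREE_TIR_READ_INWARD)

if __name__ == "__main__":
    print(has_cacta_termini(TOY_ELEMENT))
    print(tir_mismatches(TOY_ELEMENT, len(FIVE_TIR)))
```

Applied to the truncated 1357-bp insertion, a check of this kind would be expected to fail at the 5′ end, consistent with the absence of a 5′ TIR noted in the text.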
To clone the entire Tgm9, we constructed and screened a genomic library carrying ~20 genome equivalents of DNA prepared from T322, which showed high levels of both somatic excision and germinal reversion. Two nonoverlapping plaques, 16 and 25, carrying the 5′ and 3′ ends of Tgm9, respectively, were sequenced (Figure S4 A). By conducting a long-range PCR and then a sub-PCR (Figure S4 B), a 19-bp (GTTTTGTTGATCATTTACA) missing Tgm9 sequence between the two adjacent ends of clones 16 and 25 was obtained (Figure S4 A). Tgm9 was 20,548 bp (GQ344503). It contained 5′- and 3′-TIRs starting with 5′-CACTA-3′, and transposase genes (Figure 4B).
The truncated Tgm9 element was identical to the 3′ end of Tgm9 except for a novel 26-nt sequence (5′-ATTACGTACCATTCAGTGAAATCACG-3′), which with its downstream 17-nt sequence (5′-TACCATTCAGTGAAATC-3′) formed two 20-bp tandem direct repeats (5′-ACGTACCATTCAGTGAAATC-3′) at the 5′ end of the truncated element (Figure 4B). We were able to PCR amplify the truncated element from T322. Therefore, the truncated element most likely arose from imprecise excision of the element. The novel 26-nt sequence was presumably generated through slipped mispairing accompanied by intragenic recombination and deletion, as has been documented for generation of a direct repeat (Tavassoli et al. 1999).
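The relationship among the three sequences quoted above can be verified directly: joining the novel 26-nt sequence to the downstream 17-nt sequence should yield two back-to-back copies of the 20-bp repeat. The following is a small sketch of that check; the sequences are taken from the text, while the variable and function names are illustrative.

```python
# Sequences as given in the text (5' to 3').
NOVEL_26 = "ATTACGTACCATTCAGTGAAATCACG"
DOWNSTREAM_17 = "TACCATTCAGTGAAATC"
REPEAT_20 = "ACGTACCATTCAGTGAAATC"


def find_tandem_repeat(seq, unit):
    """Return the 0-based start of the first two adjacent copies of unit in seq, or -1."""
    return seq.find(unit + unit)


# The 5' end of the truncated element: novel sequence followed by Tgm9 sequence.
junction = NOVEL_26 + DOWNSTREAM_17
start = find_tandem_repeat(junction, REPEAT_20)

if __name__ == "__main__":
    print("junction length:", len(junction))
    print("tandem repeat starts at index:", start)
    print("leading bases before the repeats:", junction[:start])
```

The 43-nt junction resolves into a 3-nt leader (ATT) followed by two 20-bp copies, so the second copy spans the last 3 nt of the novel sequence plus the 17-nt Tgm9 sequence.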
Alternative splicing generated transposase transcripts in Tgm9: Zabala and Vodkin (2008) identified 24 exons from the Tgmt* element. All these exons were found in Tgm9 (Figure 4B, exons VI-XXVII) and their expression was detected by RT-PCR in T322. The exons contained two open reading frames (ORFs), ORF1 and ORF2 (Figure 4B). By conducting rapid amplification of 5′ complementary DNA ends (5′-RACE), we were able to identify three additional exons (exons I, II, and III) at the 5′ end of the transcripts (Figure 4B). RT-PCR experiments revealed four types of transposase transcripts, t1-t4 (Figure 5).
W4 encodes DFR2: To determine whether the variegated flower phenotype is caused by excision of Tgm9 from DFR2, we scored >320 progenies of 21 families descended from a single T322 progenitor for hypocotyl and flower colors in the greenhouse (Figure 7A). Nine families carried at least some progenies that were either germinal revertants (purple hypocotyls and flowers) or somatic revertants (variegated flowers and purple sectors on hypocotyls). Six other families produced at least some progenies that showed somatic excisions. The average rates of germinal reversion and somatic excision were 4 and 25%, respectively (Figure 7B), which were comparable to earlier estimates. A larger proportion (>70%) of the progenies had only white flowers. Imprecise excision of Tgm9 leading to a truncated element (Figure 4B) in the target site could be one of the reasons for the high proportion of progenies with white flowers (Figure 7A).
We sequenced Tgm9 insertion sites of independent germinal revertants with purple flowers and observed distinct footprints among the independent germinal revertants (Figure S5). These results confirmed that excision of Tgm9 from DFR2 intron II resulted in the expression of DFR2 and, thereby, gain of the purple flower phenotype. Therefore, W4 encodes DFR2, and somatic excision of the element results in the variegated flower phenotype.
w4-dp and w4-p alleles were generated from reinsertion of Tgm9 into the DFR2 promoter: T321 (w4-dp) and T369 (w4-p) mutants were descended from T322 (w4-m). Sequencing of the Tgm9 insertion site confirmed that Tgm9 was excised from DFR2 in both mutants and left behind 4-bp and 0-bp (precise excision) footprints in T321 and T369, respectively (Figure S5). The 944-bp insertion (EU068463) in T321 was amplified using primers DFR4S and DFR4R (Table S1). It is identical to the 5′ end of Tgm9. Two nucleotides (C and T) at the 3′ end of the insertion site (−1044) were deleted (Figure 8). We failed to PCR amplify the entire insertion in T369. Its 3′ end (381 bp), PCR amplified with primers Tn3′1S and DFR4R (Table S1), was identical to the 3′ end of Tgm9 and located upstream of the −1034th nt of the DFR2 promoter.
The insertion sites in the w4-dp and w4-p alleles were only 9 bp apart (Figure 8). The promoter regions between the insertion sites and the transcription start site (TSS) were PCR amplified and sequenced from T321 (w4-dp), T369 (w4-p), and T322 (w4-m). No rearrangements in this region occurred in the mutants (Figure S6). Therefore, the region upstream of the Tgm9 insertion sites is important for full expression of DFR2. The upstream promoter regions of structural anthocyanin biosynthesis genes contain cis-regulatory elements that affect pigmentation patterns or intensity (Coen et al. 1986; Almeida et al. 1989; Lister et al. 1993). The putative cis-regulatory elements CCAAT motif (Gelinas et al. 1985) and E-box (CACGTG) (Ephrussi et al. 1985) are located upstream of the Tgm9 insertion sites in T321 (w4-dp) and T369 (w4-p) (Figure 8); they were moved away from the TSS in both mutants, presumably resulting in reduced expression of DFR2 (Figure 2C).
Tgm9 is a low copy number element: CACTA elements usually have relatively low copy numbers (<100 copies) (Kunze et al. 1997). An earlier study showed that the soybean genome contained 30-42 copies of the Tgm-like elements (Rhodes and Vodkin 1988). Genomic DNA from three near-isogenic lines (NILs), T322 (w4-m), T321 (w4-dp), and T325 (W4), was digested with EcoRI or double digested with HindIII and PstI, and DNA blots were hybridized to the 3′ end of Tgm9. More than 10 copies of the Tgm9-like sequences were detected (Figure 9). T325 was isolated as a germinal revertant with purple flowers from T322. HindIII- and PstI-digested DNA showed the excision of Tgm9 from DFR2 intron II and reinsertion into a new locus (Figure 9).
The recently available soybean genome sequence (http://www.phytozome.org) was searched for Tgm9 5′ end (400 bp), 3′ end (700 bp), GmTNP1, and GmTNP2 sequences. The 5′ end showed similarities to 32 sequences. The 3′ end and GmTNP1 showed similarities to ~100 sequences of the soybean genome. At least 1500 bp of GmTNP2 sequence showed similarity to 1000 sequences of the genome. This suggested that a TNP2-like domain could be conserved among different CACTA elements such as Tgm5 (Rhodes and Vodkin 1988) or functionally related distant proteins. Among the Tgm9-like sequences, one localized to scaffold_57 from nt 95,650 to 13,598 is 99% identical to Tgm9. We named this sequence Tgm10. Compared to Tgm9, Tgm10 is truncated for the first ~4100-bp sequence, contains a gap in its 5′ end, and carries a 1049-bp insert in exon XXIII (Figure S7). Tgmt*, Tgm9, and Tgm10 could be variants from a progenitor element or, alternatively, the highly active Tgm9 could be a progenitor of Tgmt* and Tgm10.

[...] Each primer combination produced two products. Products containing exon III are marked with stars, and those without are marked with arrowheads. Lane 1, marker. (B) Schematic representation of GmTNP1 and GmTNP2 transcripts produced by alternative splicing. Four types of transcripts (t1-t4) with or without exon III or IV were detected. Transcripts t1 and t2 carrying no exon IV, amplified by primers P1 and P4 in A, encode GmTNP1; and t3 and t4 carrying exon IV, amplified by primers P1 and P2 in A, encode GmTNP2. Two ORFs and positions of their start/stop codons and 5′-UTR are shown.

DISCUSSION
In soybean, the w4-m allele regulates variegated flower color in petals and purple sectors on stems or hypocotyls. By applying biochemical and molecular approaches, we have established that somatic excision of a CACTA-type transposable element, Tgm9, from DFR2 encoding dihydroflavonol-4-reductase results in variegated flowers in the mutable T322 line carrying the w4-m allele. Tgm9 is ~20.5 kb long and a member of the CACTA superfamily of transposons (Pereira et al. 1986; Rhodes and Vodkin 1988; Nacken et al. 1991; Inagaki et al. 1994). It generates a 3-bp target-site duplication upon insertion. Its 5′ and 3′ ends carry imperfect terminal inverted repeats (TIRs) flanking the conserved CACTA sequence. Subterminal regions are highly structured and contain multiple copies of the putative transposase binding motif (AACCGTCTTAR) (Figure 4) (Gierl et al. 1988). It excises at a high frequency (Figure 7). Excision of Tgm9 generated 8- to 5-bp footprints (Figure S5), which are comparable to the ones created by other CACTA elements such as petunia PsI (Snowden and Napoli 1998).
The excision mechanism in Tgm9 could be similar to the one considered for En/Spm (Gierl et al. 1988, 1989; Frey et al. 1990). Through alternative splicing, Tgm9 produces two distinct transposases, GmTNP1 (755 aa, Tam1 TNP1-like transposase) and GmTNP2 (1063 aa, Tam1 TNP2-like transposase) (Figures 4B, 5, and 6). Organization of GmTNP2 and GmTNP1 is comparable to the one observed for transposases TNPA and TNPD in the maize En/Spm element (Pereira et al. 1986; Masson et al. 1989). GmTNP1 is presumably a DNA-binding protein like TNPA, recognizing and binding to the short repetitive motif of the subterminal regions (Gierl et al. 1988). GmTNP2 is most likely an endonuclease, like TNPD (Gierl et al. 1989; Frey et al. 1990). It binds to GmTNP1, interacts with Tgm9 TIRs, pulls the two ends of the element together to form a loop, and excises the element from its insertion site.
Tgm9 showed high identity to the Tgmt* element (EU190440, 20,544 bp) isolated from the soybean t* allele (Zabala and Vodkin 2008) (Table S3). As shown here and earlier, Tgm9 is an active element (Palmer et al. 1989; Groose et al. 1990) (Figure 7), whereas Tgmt* at the soybean t* allele seems not to be active (Zabala and Vodkin 2003, 2008). The transposase genes were silenced in line 37609 (t*) (Zabala and Vodkin 2008). High similarity between Tgmt* and Tgm9 suggested that Tgm9 could be the progenitor element of Tgmt*. Tgmt* is comparable to the cryptic spm element from the maize a-m2-8167B allele that contained an intact spm element with no activity (Masson et al. 1987; Banks et al. 1988).

Figure 8. Characterization of the w4-dp and w4-p alleles that arose following excision of Tgm9 from DFR2 intron II. The positions of nucleotides or restriction sites relative to the TSS (+1) are shown. Solid triangles indicate the location of insertions in the w4-dp and w4-p alleles. E, EcoRI; H, HindIII; and P, PstI.

Figure 9. Organization of Tgm9 among soybean lines that vary for the W4 alleles. The probe is the 3′ end of Tgm9. Soybean lines, their W4 alleles, and restriction enzymes used are labeled above individual lanes. Lanes 1, T322 (w4-m); 2, T321 (w4-dp); and 3, T325 (W4). T321 and T325 were isolated as intermediate or full revertant lines, respectively, from T322. Polymorphic bands are shown with *. The EcoRI-specific polymorphisms among the three lines most likely arose from cytosine methylation in some of the EcoRI sites. The strong ~4.5-kb HindIII-PstI band in T325 indicated an additional copy resulting from reinsertion of Tgm9 into a new locus.
Like most CACTA elements, Tgm9 is a low copy transposable element (Rhodes and Vodkin 1988; Kunze et al. 1997). Active low copy endogenous transposable elements have been considered useful tools in gene cloning and functional genomics studies (Maes et al. 1999; Walbot 2000; Ramachandran and Sundaresan 2001). We expect that the highly active Tgm9 should facilitate functional genomics studies in soybean. Genetic data strongly suggested that mutations such as necrotic root (rn), male sterility, and female sterility (st8) (Palmer et al. 2008a,b) most likely resulted from insertion of Tgm9. Except for two mutations in fertility genes, no reversion events have been observed among the mutants presumably tagged by Tgm9 (R. Palmer, unpublished data).
Truncated Tgm10 and fractured Tgm9 in the w4-dp and w4-m alleles (EU068463 and EU068464; Figure S7) suggested the existence of fractured Tgm9 elements in the soybean genome. Fractured Ac (fAc) elements have been documented in maize (Ralston et al. 1989; Zhang and Peterson 1999). We hypothesize that the element is frequently fractured during transposition events and that truncated Tgm9 derivatives cause stable mutations. If our hypothesis is correct, then the element will be useful in creating stable mutations and cloning soybean genes through Tgm9-tagging experiments. To date, to our knowledge, no active endogenous transposable elements have been cloned from any legume species. Therefore, Tgm9 is expected to expedite genomics research in soybean and thereby contribute significantly toward our understanding of legume biology.

[...] Figure 7a and two intermediate germinal revertants, T321 (w4-dp) and T369 (w4-p), were selected for determining footprints left behind by Tgm9 in DFR2 intron II through PCR by comparison with the wild-type DFR2 (WT) from cv. Williams 82. Nucleotides representing the target site duplication are underlined. Footprint nucleotides left by Tgm9 germinal excision are in bold font.

FIGURE S7. Schematic representation of Tgm10. Tgm10 is located at the 3' end of Scaffold_57. Except for a ~4100-bp deletion in the 5' end, a gap in the 5' end, and a 1049-bp insertion flanked by a 7-bp direct repeat in exon XXIII, Tgm10 is 99% identical to the Tgm9 element.

TABLE S1
Primers used in this study
a Primers used for sequencing the element are not listed here. ANS1F and ANS1R were designed from a partial coding sequence of an ANS gene identified from soybean seed coats (AF325853). DFR1F and DFR1R were designed according to the consensus sequence of three legume DFR genes (AF167556 from G. max; AF117263 from Lotus corniculatus; and AY389346 from Medicago truncatula).

TABLE S2
Probes used in this study

Probe a             Description
ANS partial cDNA    cDNA fragment amplified from purple petals of T322 using ANS1F and ANS1R primers.
DFR3'               cDNA fragment amplified from petals of T322 using DFR3S and DFR3R primers.
Tgm9 3' end         PCR fragment amplified using primers TN3'2S and TN3'1R from a lambda clone containing the w4-m allele, isolated from the first T322 lambda genomic library.
a All the probes were labeled with α-32P-dATP using the Prime-It II random primer labeling kit (Stratagene, La Jolla, CA).