The specification of floral organ identity in the higher dicots depends on the function of a limited set of homeotic genes, many of them members of the MADS-box gene family. Two such genes, APETALA3 (AP3) and PISTILLATA (PI), are required for petal and stamen identity in Arabidopsis; their orthologs in Antirrhinum exhibit similar functions. To understand how changes in these genes may have influenced the morphological evolution of petals and stamens, we have cloned twenty-six homologs of the AP3 and PI genes from two higher eudicot and eleven lower eudicot and magnolid dicot species. The sequences of these genes reveal the presence of characteristic PI- and AP3-specific motifs. While the PI-specific motif is found in all of the PI genes characterized to date, the lower eudicot and magnolid dicot AP3 homologs contain distinctly different motifs from those seen in the higher eudicots. An analysis of all the available AP3 and PI sequences uncovers multiple duplication events within each of the two gene lineages. A major duplication event in the AP3 lineage coincides with the base of the higher eudicot radiation and may reflect the evolution of a petal-specific AP3 function in the higher eudicot lineage.
FLOWERS are a defining characteristic of the angiosperms. The typical hermaphroditic angiosperm flower contains both sterile and reproductive organs. These organs are generally organized into whorls, with a particular organ type arising from a single node on the axis of a determinate floral meristem. The flowers of the model species Arabidopsis thaliana display the typical higher eudicot floral organization. The first and second whorls of the flower contain the sterile sepals and petals, respectively. The third whorl contains the stamens, the male reproductive structures which produce pollen. The female reproductive structures, the carpels, arise in the fourth whorl and contain the ovules. Numerous variations on this basic floral architecture exist within the angiosperms. These differences in organ number, structure and phyllotaxy are critical morphological characters in the study of angiosperm systematics.
The evolution of angiosperm floral diversity has been a subject of considerable study. While the stamens and carpels are thought to have each evolved only once, it is widely accepted that the sterile organs have evolved many times within the angiosperms, although the details of these events are unresolved (Cronquist 1988; Takhtajan 1991; Drinnan et al. 1994; Endress 1994). Much of the controversy has centered on the number and the nature of petal derivation events within the various angiosperm lineages (Figure 1). Phylogenetic analyses of the angiosperms based on a large rbcL data set have identified two major monophyletic clades (Chase et al. 1993; Crane et al. 1995; Qiu et al. 1993). Both of these groups, the eudicots and the monocots, are rooted within an unresolved basal grade of magnolid dicots. The eudicot clade can be further subdivided into the lower eudicots, comprising the Ranunculidae, basal Hamamelididae and basal Rosidae, and the higher eudicots, made up of the bulk of the flowering plants, including the majority of the model species used for genetic analysis (Drinnan et al. 1994). Stamenally-derived petals, called andropetals, have evolved many times within the lower eudicots and at least once at the base of the higher eudicot clade and the monocot clade (Takhtajan 1991). A second type of petals, bracteopetals, are derived from sepals or other sterile subtending organs. The diverse magnolid dicots include species which have been characterized as possessing bracteopetals, as well as a number of species which are considered to have andropetals (Takhtajan 1991). The designations of petals as being andropetallous or bracteopetallous have been based primarily on morphological characters.
The isolation of floral homeotic mutants in Arabidopsis and Antirrhinum has provided an inroad into the dissection of the genetic mechanisms underlying floral diversity and the derivation of floral organs (Komaki et al. 1988; Bowman et al. 1989; Carpenter and Coen 1990). Genetic analysis of these mutants has led to a model where three classes of genes, known as the A, B, and C classes, interact in a combinatorial manner to specify particular organ identities (Bowman et al. 1991; Coen and Meyerowitz 1991; Meyerowitz et al. 1991). Mutations in the B group genes cause transformations of the petals into sepals and the stamens into carpels, indicating that the corresponding wild-type gene products are required for specifying petal and stamen identity (Jack et al. 1992; Goto and Meyerowitz 1994). Like many of the organ-identity genes, the Arabidopsis B group genes APETALA3 (AP3) and PISTILLATA (PI) encode products which contain a MADS domain, a highly conserved region of approximately 57 amino acids which has been shown to play a role in DNA binding and protein dimerization (Norman et al. 1988; Pollock and Treisman 1991). The plant representatives of the MADS-box family are further distinguished by the presence of a 70-amino acid region called the K domain (Figure 2; Ma et al. 1991; Davies and Schwarz-Sommer 1994). This portion of the protein is predicted to form two to three amphipathic helices which may facilitate protein-protein interactions (Pnueli et al. 1991). Separating the MADS and K domains is a short intervening region (I) of approximately thirty amino acids which, along with the K domain, has been shown to play a role in dimerization specificity (Krizek and Meyerowitz 1996; Riechmann et al. 1996a). The C-terminal portions of the proteins vary considerably in size and sequence within the family and have yet to be assigned any specific function.
While many of the currently characterized MADS domain proteins appear to function as homodimers, the AP3 and PI gene products are thought to act together as a heterodimeric transcription factor. This conclusion is supported by the finding that both the AP3 and PI gene products are required for DNA binding and nuclear localization (McGonigle et al. 1996; Riechmann et al. 1996a; Hill et al. 1998). Furthermore, the AP3 and PI proteins have been shown to bind to each other in immunoprecipitation experiments (Goto and Meyerowitz 1994; Riechmann et al. 1996a). The maintenance of AP3 and PI expression depends upon the presence of both gene products, suggesting that the AP3/PI heterodimer promotes the transcription of AP3 and PI, perhaps directly (Jack et al. 1992; Goto and Meyerowitz 1994). Interestingly, phylogenetic analysis of the entire plant MADS-box gene family has shown that the AP3 and PI lineages are the products of a duplication event which makes them more closely related to each other than to any of the other MADS-box genes (Doyle 1994; Purugganan et al. 1995; Purugganan 1997; Theissen et al. 1996).
The conservation of B group functions across the angiosperms is being addressed through studies of AP3 and PI homologs in several higher eudicot species. In general, the expression patterns of these genes are all quite similar to those seen in Arabidopsis (Theissen and Saedler 1995; Irish and Kramer 1998). The mutant phenotypes, in those species where they have been analyzed, are generally consistent with a conserved role for AP3 and PI in promoting the establishment of petal and stamen identity (Sommer et al. 1990; Trobner et al. 1992; Angenent et al. 1993; van der Krol and Chua 1993). Furthermore, the AP3 ortholog from Antirrhinum, DEFICIENS (DEF), is able to largely replace endogenous Arabidopsis AP3 function (Irish and Yamamoto 1995; Samach et al. 1997). Similarly, both DEF and the Antirrhinum PI ortholog, GLOBOSA (GLO), have been shown to promote stamen and petal identity when ectopically expressed in Nicotiana (Davies et al. 1996). Taken together, these results support the conclusion that AP3 and PI orthologs are responsible for determining petal and stamen identity within the higher eudicots.
Since the petals of the higher eudicots are thought to be homologous, it is not surprising that all the higher eudicots studied to date appear to have a conserved petal developmental pathway. In order to examine how independent petal derivation events may be reflected in the petal developmental program, it is necessary to examine species whose petals are not homologous to those of the higher eudicots. Accordingly, we have cloned AP3 and PI homologs from eleven lower eudicot and magnolid dicot species in an effort to understand the evolution both of these gene lineages and of the pathways of petal specification. We present a phylogenetic analysis of the B group genes which indicates that the path of B group gene evolution is more complex than previously thought. Our analysis suggests that there are, in fact, two paralogous AP3 lineages in the higher eudicots: one represented by the well-studied AP3 ortholog group and the other containing the tomato AP3 paralog TM6 and several related genes. The data suggest that these two lineages are the result of a gene duplication event which occurred after the divergence of the Buxaceae in the lower eudicots but before the diversification of the higher eudicots. Sequence analysis reveals that the AP3-like genes of the lower eudicots and magnolid dicots are actually more similar to the members of the TM6 lineage than they are to the higher eudicot AP3 lineage. Although we have also identified duplication events in the PI lineage, none appear to date back to the base of the higher eudicots. In addition, the PI homologs we have isolated display a greater overall conservation of sequence than do the AP3 lineage members. Based on these observations, we present a model for the evolution of the B group gene lineage and discuss how duplication and divergence in this gene lineage may have influenced the evolution of petals.
MATERIALS AND METHODS
Species sampled and sources of plant material: The species included in this analysis are given in Tables 1–3, along with family membership, general collection information and GenBank accession numbers for the gene sequences. Throughout the text we have followed the taxonomic designations of Cronquist (1981) for dicots (the only exception to this being the designation of the Ranunculidae as an independent subclass) and Dahlgren et al. (1984) for monocots. The choice of taxa was influenced by both phylogenetic position and specimen availability.
Cloning and analysis: For each species, total RNA was prepared using Trizol (GIBCO BRL, Gaithersburg, MD) from whole flower buds collected across a range of developmental stages. Poly-A mRNA was extracted from total RNA using Magnetight Oligo (dT) particles (Novagen, Madison, WI). Single-stranded cDNA was synthesized by priming with the oligonucleotide 5′-CCGGATCCTCTAGAGCGGCCGC(T)17 from 500 ng of poly-A RNA. This poly-T primer was used with a second primer with the sequence 5′-GGGGTACCAA(C/T)(A/C)GICA(A/G)GTIACITA(T/C)TCIAAG(A/C)GI(A/C)G-3′ in a polymerase chain reaction (PCR) to amplify MADS-box-containing cDNAs (primary PCR reaction). PCR analysis was performed in 100 μl of PCR buffer (10 mM Tris pH 8.3, 50 mM KCl, 1.5 mM MgCl2) containing 50 pmol and 20 pmol of 5′ and 3′ primer, respectively, 200 μmol of each deoxyribonucleotide triphosphate, and 2.5 units of AmpliTaq Gold polymerase (Perkin Elmer, Foster City, CA). Amplification began with a Taq-activation step of 12 min at 95°C, followed by 10 cycles of 20 sec denaturing at 95°, 30 sec annealing at 38° and 1 min extension at 72°. The program was completed by 30 cycles of 20 sec denaturing at 95°, 30 sec annealing at 42° and an extension time of 1 min at 72°. Amplified products were analyzed on a 1% agarose gel, revealing one or more distinct fragments of ≥0.6 kb. The reactions were directly cloned using the TA and TOPO-TA Cloning kits (Invitrogen, Carlsbad, CA). Clones were analyzed based on size and all fragments over 0.6 kb were sequenced using fluorescent sequencing methods by the Keck Foundation Biotechnology Resource Laboratory at Yale University.
In Papaver nudicaule, Dicentra eximia, Ranunculus bulbosus and Pachysandra terminalis, 3′ primers targeted to AP3- or PI-specific C-terminal motifs were used in conjunction with the degenerate 5′ primer to specifically amplify AP3 and PI homologs from the primary PCR reaction (same conditions as above). The PI-specific primer has the sequence 5′-TGIA(A/G)(A/G)TTIGGITGIA(A/T)(T/G)GGITG and the AP3-specific primer has the sequence 5′-CIAGICGIAG(A/G)TC(A/G)T. For these species, clones from both the primary PCR reaction and the AP3- and PI-specific secondary amplifications were analyzed. The complete 3′ sequence of the clones generated with the AP3- or PI-specific primers was obtained using 3′ RACE (3′ RACE primer sequences available upon request). The complete cDNA sequences of PnAP3-2 and PnPI-1 were obtained using the 5′ RACE System for Rapid Amplification of cDNA Ends, Version 2.0 (Gibco BRL, Gaithersburg, MD), in conjunction with several different oligonucleotides (sequences available upon request).
For the cloning of LeAP3 from Lycopersicon esculentum, a Solanaceae AP3-specific primer with the sequence 5′-A(A/G)IGC(A/G)AAIGTIGTIAT(A/G)TC was designed from the C-terminal consensus D(I/L)TTFAL. This 3′ primer was used in conjunction with the degenerate 5′ MADS-box primer to specifically amplify LeAP3. The complete 3′ sequence was obtained using 3′ RACE (primer sequence available upon request).
Phylogenetic analysis: The sequences of all of the published B group representatives were obtained from GenBank (see Tables 2 and 3 for accession numbers). Protein sequences were aligned using CLUSTALW and refined by hand, taking both nucleotide and amino acid sequences into consideration (see appendices 1 and 2). The alignments were further modified (for phylogenetic analysis) by encoding gaps as single characters in a supplemental data matrix. In this approach, gaps are treated as single events, thus preventing their overweighting (based on gap length) in the subsequent phylogenetic analyses.
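The gap-coding step described above can be illustrated with a minimal sketch. It assumes one simple scheme, in which every distinct contiguous gap (defined by its start and end columns) becomes one presence/absence character in a supplemental matrix; the exact coding scheme used for the published matrices is not specified here, and the sequence names and toy alignment are hypothetical.

```python
def gap_intervals(row, gap="-"):
    """Return (start, end) index pairs for each contiguous gap run in one aligned row."""
    intervals, start = [], None
    for i, ch in enumerate(row):
        if ch == gap and start is None:
            start = i                      # a gap run opens here
        elif ch != gap and start is not None:
            intervals.append((start, i))   # the run closed at column i (exclusive)
            start = None
    if start is not None:                  # gap run extends to the end of the row
        intervals.append((start, len(row)))
    return intervals


def code_gaps(alignment):
    """Build a supplemental 0/1 matrix with one column per distinct gap."""
    per_seq = {name: set(gap_intervals(row)) for name, row in alignment.items()}
    columns = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [1 if g in gaps else 0 for g in columns]
        for name, gaps in per_seq.items()
    }
    return columns, matrix


# Hypothetical toy alignment (not real B group sequences).
alignment = {
    "seqA": "MKR---LQE",
    "seqB": "MKR---LQE",
    "seqC": "MKRGSTLQE",
    "seqD": "MK-GSTLQE",
}
columns, matrix = code_gaps(alignment)
```

In this toy case the three-residue gap shared by seqA and seqB contributes a single character rather than three, which is the sense in which length-based overweighting is avoided.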
We generated parsimony trees based on the aligned protein sequences using the PAUP 4.0* package (Phylogenetic Analysis Using Parsimony, Version 61, used by permission of the author, Swofford 1993). Parsimony trees were found using the heuristic search algorithm, generating 1000 replicate runs using random stepwise addition of sequences. Multiple equal-length parsimony trees were collapsed into 50% majority rule consensus trees. Bootstrap values for all resolved nodes in the consensus tree were derived from the partition functions obtained from 1000 replicate bootstrapping runs. These runs were similarly produced using a heuristic search via random stepwise addition and under the TBR (tree-bisection-reconnection) algorithm for branch swapping.
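As an illustration of how equally parsimonious trees are collapsed into a 50% majority rule consensus, the following sketch counts how often each clade occurs across a set of trees and keeps those found in more than half of them. It is not the PAUP implementation; the representation of a tree as a set of clades (each a frozenset of taxon labels) and the taxa A to D are assumptions made for the example.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold` of the trees.

    Each tree is given as a collection of clades; each clade is a frozenset of taxa.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Three hypothetical, equally parsimonious trees over taxa A, B, C and D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_clades(trees)
```

Here the clades (A,B) and (A,B,C) each occur in two of three trees and are retained, whereas (A,B,D) and (A,C) are collapsed. The same counting applied to trees from bootstrap replicates gives the bootstrap proportion for each clade.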
Distance matrices were derived from the PI and AP3 datasets under a mean character difference criterion. Amino acid substitutions were weighted in accordance with the BLOSUM substitution matrix. Trees were subsequently generated via the Neighbor-Joining algorithm (NJ) as implemented in PAUP. The resulting trees represent the “minimum evolution” networks connecting the sequences. Bootstrap values for resolved nodes are derived from 1000 replicate runs, again using the NJ algorithm.
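A minimal sketch of an unweighted mean character difference follows: the distance between two aligned sequences is the fraction of compared sites at which they differ. Skipping sites where either sequence has a gap is an assumption of this sketch, the BLOSUM weighting and the Neighbor-Joining step are deliberately left out, and the sequences are hypothetical.

```python
def mean_character_difference(a, b, gap="-"):
    """Fraction of differing sites among aligned sites where neither sequence is gapped."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment):
    """Pairwise distances for every ordered pair of sequences in the alignment."""
    names = list(alignment)
    return {
        (p, q): mean_character_difference(alignment[p], alignment[q])
        for p in names
        for q in names
    }


# Hypothetical toy alignment (not real B group sequences).
alignment = {
    "seqA": "MKRGSTLQE",
    "seqB": "MKRGATLQE",
    "seqC": "MK-GSTLQD",
}
dist = distance_matrix(alignment)
```

A matrix of this form is the input that a distance method such as Neighbor-Joining would then convert into a tree.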
For both the parsimony and distance analyses, the AP3 trees were rooted using six PI sequences (GLO, PI, DaPI, LtPI, PhPI and OsMADS2); conversely, six AP3 sequences (AP3, DEFA, SLM3, MfAP3, PhAP3 and CpAP3) were used to root the PI trees. These choices for outgroups were based, first of all, on the fact that AP3 and PI are known to be paralogous lineages and are, therefore, each other's natural outgroup. Secondly, while the AP3 and PI orthologs can be reasonably aligned to each other throughout their entire length, unambiguous alignments cannot be generated between the B group sequences and the remaining members of the MADS-box gene family.
Similar analyses were conducted using nucleotide data sets aligned in accordance with the protein alignments (data not shown, see results).
RESULTS
Cloning of B-group genes from lower eudicots and magnolid dicots: Our strategy for cloning B group gene members from various lower eudicot and magnolid dicot species depends on the presence of a highly conserved sequence in the putative DNA-binding α-helix of the MADS-box domain (Shore and Sharrocks 1995). We designed a degenerate primer based on the decapeptide sequence NRQVTYSKR using the sequence of the published higher eudicot AP3 and PI orthologs. This region of the MADS domain is largely invariant across the predicted products of the plant MADS-box genes examined to date and includes three amino acids that are invariant across all members of the MADS-box family (Doyle 1994; Shore and Sharrocks 1995). The tyrosine residue within this domain is diagnostic for the B group genes. The use of this B-group specific 5′ primer in conjunction with the completely nonspecific poly-T 3′ primer allowed for the amplification of AP3 and PI orthologs with little bias against divergent paralogs. Since the predicted length of the eudicot B group members ranges from 180 to 230 amino acids, we examined all clones longer than approximately 600 bp.
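The relationship between the degenerate 5′ primer and the conserved peptide can be checked with a short sketch that expands each degenerate codon and translates it with the standard genetic code. Two assumptions are made here: inosine (I) is treated as pairing with any base, and the first eight nucleotides of the primer given in MATERIALS AND METHODS (GGGGTACC) are treated as a non-coding 5′ tail, so that only the following 27 nucleotides are translated.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_degenerate(primer):
    """Turn a string such as 'AA(C/T)GI' into a list of per-position base sets."""
    positions, i = [], 0
    while i < len(primer):
        ch = primer[i]
        if ch == "(":                       # explicit alternatives, e.g. (C/T)
            j = primer.index(")", i)
            positions.append(set(primer[i + 1:j].split("/")))
            i = j + 1
        elif ch == "I":                     # inosine: assumed to match any base
            positions.append(set("ACGT"))
            i += 1
        else:
            positions.append({ch})
            i += 1
    return positions


def encoded_residues(primer):
    """For each codon of the primer, return the set of amino acids it can encode."""
    pos = parse_degenerate(primer)
    if len(pos) % 3:
        raise ValueError("coding region length is not a multiple of three")
    return [
        {CODON_TABLE["".join(c)] for c in product(*pos[k:k + 3])}
        for k in range(0, len(pos), 3)
    ]


# Assumed coding portion of the degenerate 5' primer (tail and last two bases omitted).
CODING = "AA(C/T)(A/C)GICA(A/G)GTIACITA(T/C)TCIAAG(A/C)GI"
encoded = encoded_residues(CODING)
covers_target = all(aa in s for aa, s in zip("NRQVTYSKR", encoded))
```

Under these assumptions every position of NRQVTYSKR is covered by the primer; the (A/C)GI codons can also encode serine (via AGT and AGC) in addition to arginine.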
We used Syringa vulgaris (lilac), a higher eudicot, as a positive control to test the primary PCR reaction and cloning strategy. Both AP3 and PI orthologs were isolated from Syringa and their sequences showed very high similarity to that of the other Asterid B group representatives. For each lower eudicot and magnolid dicot species, we analyzed 10 to 36 clones (Table 1). It should be noted that in several species, an exhaustive survey has not been undertaken (<10 clones analyzed, Table 1). In the case of Lycopersicon, this is due to the fact that we were only interested in establishing the sequence of one gene (see below). Analyses of the Papaveraceae and Ranunculaceae were initiated with Papaver californicum and Delphinium ajacis but subsequent analyses were carried out with more convenient representative species (P. nudicaule, R. bulbosus and Caltha palustris).
A small number of the sequenced clones obtained from the primary PCR reactions proved not to be MADS-box gene representatives (5/171 total clones sequenced) and two were found to be members of the AGAMOUS-like family (one from P. nudicaule and one from Delphinium ajacis). Conceptual translation of all of the other cDNAs showed that they encode B group-type MADS-box gene products (164/171 total clones sequenced). We assigned the novel genes to the AP3 or PI classes based on the overall sequence similarity to the known AP3 and PI representatives and on the presence of specific diagnostic sites in the MADS, K and C domains. AP3 and PI-like proteins can be distinguished from one another at MADS-box residues 29, 35 and 47. Within the K box, PI homologs possess a highly conserved sequence KHExL (appendix 1, residues 88 to 92). The comparable sequence in the K box of the AP3 homologs is (H/Q)YExM (appendix 2, residues 85 to 89). We also found that the C-terminal portions of the predicted proteins contained diagnostic motifs for each lineage (see below). Each unique cDNA was named using the first letter of the genus and species from which it was isolated followed by either AP3 or PI, depending on its sequence similarity.
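The K-box criterion alone can be expressed as a simple pattern search, sketched below. This is only one of the diagnostic features described above: the sketch does not check the alignment positions of the motifs, the MADS-box residues 29, 35 and 47, or the C-terminal motifs, and the test sequences are hypothetical fragments rather than real proteins.

```python
import re

# K-box patterns taken from the text; "x" (any residue) is written as "." in regex form.
PI_KBOX = re.compile(r"KHE.L")
AP3_KBOX = re.compile(r"[HQ]YE.M")


def classify_kbox(protein):
    """Assign a protein to a lineage from its K-box motif, if exactly one pattern matches."""
    has_pi = bool(PI_KBOX.search(protein))
    has_ap3 = bool(AP3_KBOX.search(protein))
    if has_pi and not has_ap3:
        return "PI-like"
    if has_ap3 and not has_pi:
        return "AP3-like"
    return "undetermined"   # neither motif, or both, was found


# Hypothetical fragments used only to exercise the function.
calls = [classify_kbox(s) for s in ("AAKHEDLGG", "AAQYERMGG", "AAAAGG")]
```

In practice such a pattern match would be one line of evidence to be weighed together with overall sequence similarity.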
Sequence and phylogenetic analysis of PI homologs: We have identified a total of twelve new PI-like genes which have been cloned from nine species (Table 2). In most of the species surveyed, only one PI-like gene was identified, while in P. nudicaule, R. bulbosus and Piper magnificum, two distinct PI-like genes were found. The sequences of the predicted products of the new PI clones align well with the previously studied higher eudicot and monocot representatives. Particularly striking is an approximately twenty amino acid region at the C-terminal end of the predicted proteins that displays extremely high conservation (Figure 3A). This domain, which we will refer to as the PI motif, has a core consensus sequence of MPFxFRVQPxQPNLQE. Four of the positions are completely invariant and the remainder show very strong conservation of chemical characteristics. Overall, the PI motif appears to be a strongly hydrophobic domain, although several charged and polar amino acids are present within the region. The motif bears no strong similarity to any known structural elements and a BLAST search for similar sequences in GenBank yields only PI homologs.
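To illustrate how a candidate PI motif could be located in a predicted protein, the sketch below slides the core consensus along a sequence and scores each window by the number of matches at the fixed (non-x) positions. The scoring scheme is an assumption chosen for simplicity, not the procedure used to produce Figure 3A, and the test sequence is hypothetical.

```python
PI_MOTIF = "MPFxFRVQPxQPNLQE"   # core consensus from the text; "x" is any residue


def best_motif_window(protein, motif=PI_MOTIF):
    """Return (start, matches, fixed_positions) for the best-scoring window."""
    if len(protein) < len(motif):
        raise ValueError("protein is shorter than the motif")
    fixed = [(i, aa) for i, aa in enumerate(motif) if aa != "x"]
    best_score, best_start = -1, 0
    for start in range(len(protein) - len(motif) + 1):
        score = sum(protein[start + i] == aa for i, aa in fixed)
        if score > best_score:              # keep the first window with the top score
            best_score, best_start = score, start
    return best_start, best_score, len(fixed)


# Hypothetical C-terminal fragment with one mismatch (H for Q) inside the motif.
fragment = "GGSA" + "MPFAFRVQPSQPNLHE" + "NK"
start, matches, fixed_positions = best_motif_window(fragment)
```

For this fragment the best window begins at index 4 and matches 13 of the 14 fixed positions; the same function could be applied with the paleoAP3 or euAP3 consensus passed as `motif`.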
There are only two instances of marked divergence in sequence or structure among the newly isolated PI homologs. One is found in the predicted products of the D. eximia (Fumariaceae) and P. nudicaule (Papaveraceae) genes, DePI and PnPI-1, both of which contain an approximately 20 amino acid insertion upstream of the PI motif as well as a 10 to 12 amino acid addition at the C-terminal end of the protein (Figure 3A). These novel regions are characterized by stretches of A/C-rich repetitive DNA sequence. At the nucleotide level, the two cDNAs clearly align with one another through these novel regions, sharing 58% identity in the 100 bp upstream of the PI motif and 70% identity in the downstream region. It is likely that the event(s) that produced these insertions occurred before the last common ancestor of the Papaveraceae and Fumariaceae, which are sister families within the Papaverales. The second example of major sequence deviation within the PI homologs is seen in the P. nudicaule gene PnPI-2. The PnPI-2 cDNA encodes a truncated protein of only 164 amino acids, as opposed to the usual length of 180 to 210 amino acids. Alignment of the PnPI-1 and PnPI-2 nucleotide sequences reveals that the similarity between the PnPI-1 and PnPI-2 transcripts is quite high (80% identity) up to the point of the stop codon in PnPI-2, after which the similarity declines considerably.
Parsimony and distance-based phylogenies of the PI sequences were produced by analysis of the complete PI protein data set using several AP3-like sequences as the outgroup. The parsimony analysis resulted in 8 trees of equivalent length, from which a 50% majority rule consensus tree with a consistency index of 0.972 was derived, shown in Figure 4A. The topology of the parsimony tree is similar in many ways to that of the distance-based tree shown in Figure 4B. In both trees, all of the higher eudicot PI sequences are grouped into a single clade. The distance and parsimony trees do differ in the placement of the PnPI-1/PnPI-2 paralog pair relative to the other Ranunculid PI orthologs. One unexpected result is the position of the P. magnificum gene, PmPI-2, as sister to the higher eudicot clade in the parsimony analysis. This is not the expected position of a representative of a basal magnolid dicot. The other Piperaceae representatives, PmPI-1 and PhPI, are located at the base of the tree. In diverging from its paralog, PmPI-2 may have independently acquired sequence motifs characteristic of the higher eudicot PI representatives. In the distance analysis, PmPI-2 is placed at the base of the tree, closer to the other Piperaceae representatives. We note that the low bootstrap support seen for certain nodes reflects the restricted number of characters on which the node is based, rather than on the existence of alternative, better supported topologies. Analysis of a nucleotide PI data set yielded trees which displayed a significant number of unresolved polytomies due largely to the effects of saturation (data not shown). Where resolution was obtained using the nucleotide sequences, the structure of the consensus tree did not differ from that found using the protein sequences.
Sequence and phylogenetic analysis of the AP3 homologs: The higher eudicot L. esculentum (tomato) has been previously found to contain an AP3 paralog, TM6, which is considered to be orthologous to the Solanum tuberosum gene PD2 (Pnueli et al. 1991; Garcia-Maroto et al. 1993). Solanum (potato) contains another gene, STDEF, which appears to be orthologous to the AP3 lineage members of the other Asterids and higher eudicots. Lycopersicon was examined in an attempt to find an AP3-like gene which would be more similar to the other higher eudicot AP3 orthologs than the previously described TM6. We recovered several clones of a cDNA, LeAP3, whose predicted product displays 93% amino acid identity to that of STDEF from Solanum (Garcia-Maroto et al. 1993). This level of similarity is comparable to that seen between the TM6 and PD2 gene products from Lycopersicon and Solanum, respectively. This result confirms the presence of two separate AP3-related genes in both tomato and potato.
We have also cloned eleven AP3-like genes from a total of nine lower eudicot and magnolid dicot species (Table 3). P. terminalis and P. nudicaule were the only species that we found to have two AP3-like genes. The Ranunculid and magnolid dicot AP3 representatives show many differences when compared to the higher eudicot AP3 orthologs. Foremost among these are the considerable differences in the sequences of the predicted C-termini. The higher eudicot AP3 gene products display a highly conserved C-terminal motif with the consensus sequence D(L/I)TTFALLE (euAP3 motif, Figure 3B). The majority of the AP3 clones from the Ranunculidae and the magnolid dicots have a completely different predicted C-terminal sequence with the consensus YGxHDLRLA (paleoAP3 motif, Figure 3B). The predicted product of an AP3-like gene from Zea mays, SILKY-1, also displays the paleoAP3 motif with only one amino acid difference (C. Padilla and R. Schmidt, personal communication). The paleoAP3 motif is slightly divergent, but recognizable, in PtAP3-1 and PtAP3-2 from P. terminalis. Interestingly, the paleoAP3 motif aligns very well with the C-terminal end of the predicted TM6 and PD2 proteins (Figure 3B). Similarity to the paleoAP3 motif is also seen in the product of a putative B group-related gene which has been isolated from the Pteridophyte (fern) Ceratopteris (Munster et al. 1997; Figure 3B). The product of this fern gene, CRM3, has relatively low similarity to the AP3 gene products through most of its length, but the predicted C-terminal end aligns well with the paleoAP3 motif. This observation is remarkable considering the fact that the Pteridophytes diverged from the lineage leading to the seed plants roughly 400 million years ago (mya) (Stewart and Rothwell 1993). The Papaver clones PnAP3-1 and PcAP3 are unique among the Ranunculid AP3 genes in that they do not show high conservation of the paleoAP3 motif.
One other motif that we recognized in the new AP3 gene products is a region with similarity to the PI motif. The Ranunculid AP3 representatives have a region upstream of the paleoAP3 motif with the consensus sequence FxFRLQPSQPNLH. This is very similar to the core of the PI motif consensus sequence (Figure 3B). When all of the AP3-like gene products are aligned, the PI motif-derived sequence can be observed in almost all of the AP3-like proteins. While this sequence has been very highly conserved in the PI lineage, in the AP3 lineage it has diverged to differing degrees. TM6 and PD2 show a much greater conservation of the PI motif than do the other higher eudicot AP3 gene products. The PI motif is particularly divergent in RAD1 and RAD2 from Rumex, while in CMB2, an AP3 paralog from Dianthus, a truncation has eliminated most of the PI motif-derived sequence. In the Michelia and Peperomia AP3 representatives, the PI motif sequence is unrecognizable.
Figure 5, A and B, show the results of parsimony and distance analysis of the complete AP3 protein data set. The parsimony analysis produced 27 trees of equivalent length, from which we generated a 50% majority rule consensus tree with a consistency index of 0.89. The topmost clade in both the parsimony and distance trees contains a grouping of the higher eudicot Brassicaceae, Asterid and Caryophyllid AP3 orthologs (Figure 5A). The members of this clade will be referred to as the euAP3 lineage. The second main clade in both trees is made up of several other higher eudicot AP3-like genes: TM6, PD2, AsAP3 and CMB2. The orthology of PD2 and TM6 has been suggested previously (Garcia-Maroto et al. 1993). Likewise, a recent analysis of the relationships of the MADS-box genes also found an association between AsAP3 and TM6 (Purugganan 1997). CMB2 is an AP3-like gene that was isolated from Dianthus caryophyllus. The inclusion of CMB2 in the TM6 lineage is supported by several positions which are synapomorphic for this clade, including 69Met, 118Gly, 147Lys and 211Val. Our analyses define all of these genes as descendants of a lineage that we will refer to as the TM6 lineage.
In the parsimony analysis, the remaining AP3-like genes of the lower eudicots and magnolid dicots constitute a paraphyletic group at the base of the tree. The two very similar Pachysandra genes are paired together and are the most closely related to the higher eudicot clades. In contrast, the distance analysis defines a clade composed of PtAP3-1, PtAP3-2 and the Ranunculid and Magnoliaceae sequences. This result highlights the similarities between the Pachysandra and other lower eudicot sequences. In both trees, PnAP3-1 and PcAP3 from P. nudicaule and P. californicum, respectively, are paired together with very high certainty, suggesting that they define a paralogous lineage. Similar to the PI analysis, the low bootstrap support found for several nodes reflects the limited number of characters on which the node is based. All resolved nodes shown in Figure 5A were present in all equally parsimonious trees. As with PI, a nucleotide AP3 data set was analyzed, but the resulting trees were highly unresolved (data not shown). Where resolution was obtained using the nucleotide sequences, the structure did not differ from that found using the protein sequences.
If the character of each representative's predicted C-terminal motif is mapped on Figure 5A, we see that the paleoAP3 motif is present throughout the basal paraphyly and along the TM6 lineage. We will refer to the lower eudicot and magnolid dicot genes which exhibit the paleoAP3 motif as the paleoAP3 lineage. The euAP3 motif is synapomorphic for the clade defining what we have called the euAP3 lineage. Diagnostic amino acid character states from throughout the protein sequence can be mapped onto the tree as well. Although the C-termini of the predicted PtAP3-1 and PtAP3-2 proteins resemble the paleoAP3 motif, in other positions the sequence is more similar to the euAP3 genes. These euAP3-like positions include 54Leu, 55His, 72Leu, 148Asn and 150Ile. The euAP3 clade is also defined by several synapomorphic residues, aside from the C-terminal euAP3 motif. These changes include 2Gly→Ala, 7Glu→Gln, 28Ile→Leu, 115Asp→Ser/Cys, 144His→Lys and 226Phe→Leu. Interestingly, the primitive states of positions 2 and 7 in the MADS box are identical to the character state in the PI lineage members.
DISCUSSION
Morphological evolution proceeds by changes in developmental pathways due to alterations in the structure and regulation of particular developmental genes. The organ identity genes of the A, B, and C classes are logical starting points for a study of changing floral morphologies. The B group genes, in particular, appear to be a likely site of developmental flexibility in floral evolution. While A and C group members might be expected to show very high levels of constraint due to their pleiotropic roles in floral meristem identity and determinacy, as well as in organ identity, the functions of the B group genes appear to be limited to establishing organ identity. Moreover, the fact that the petals and stamens display enormous morphological plasticity within the angiosperms may be directly reflected as changes in the B group genes. These predictions are supported by substitution rate analysis, which shows that the AP3/PI group is evolving 20–40% faster than are all of the other plant MADS-box genes (Purugganan 1997). Furthermore, although the B group genes appear to have a conserved function in determining petal identity in the higher eudicots, these petals are not thought to be homologous with those of the lower eudicots, magnolid dicots and monocots. The independent derivation events that gave rise to the different petals of the angiosperms may be reflected in the diversified structure and function of the B group genes.
In this study we present the isolation of twenty-six new B group genes from a total of thirteen species distributed throughout the angiosperms. One of the most striking findings of the analysis of these genes is the high frequency of gene duplications within both the AP3 and PI lineages. We have identified several key ancestral duplications, in addition to what appear to be more recent duplication events. Pinpointing the exact time at which these duplications have occurred is difficult, however. The functional redundancy that results from having multiple gene copies can serve to release constraint on the sequence of the paralogs. Thus, although a high sequence similarity between paralogs would seem to suggest a relatively recent duplication event, low sequence similarity could reflect either an ancient duplication or the rapid divergence of a paralogous lineage. If we can identify orthologs of both duplication products in more than one species, then it is clear that the duplication event occurred before the last common ancestor of the species in question.
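The dating rule in the last sentence above can be written as a small check. The following is only a minimal sketch: the function names, the species labels and the copy labels are hypothetical placeholders, not data from this study.

```python
from typing import Dict, Set


def species_with_both(copies: Dict[str, Set[str]], lineage_a: str, lineage_b: str) -> Set[str]:
    """Return the species in which orthologs of both duplication products were found."""
    return {
        species
        for species, lineages in copies.items()
        if lineage_a in lineages and lineage_b in lineages
    }


def duplication_predates_lca(copies: Dict[str, Set[str]], lineage_a: str, lineage_b: str) -> bool:
    """True when both products occur in more than one species.

    In that case the duplication must be older than the last common
    ancestor of those species. A False result is uninformative: a copy
    may simply not have been sampled.
    """
    return len(species_with_both(copies, lineage_a, lineage_b)) >= 2


# Hypothetical survey results (placeholder names, not real observations).
toy = {
    "species_1": {"copy_A", "copy_B"},
    "species_2": {"copy_A", "copy_B"},
    "species_3": {"copy_A"},
}

print(species_with_both(toy, "copy_A", "copy_B"))
print(duplication_predates_lca(toy, "copy_A", "copy_B"))
```

Because the test relies only on presence, it sets a minimum age for a duplication and says nothing when a second copy was not recovered.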
The B group gene ancestor and the AP3/PI duplication event: The AP3 and PI genes have previously been shown to be members of closely paralogous gene lineages (Doyle 1994; Purugganan et al. 1995; Purugganan 1997; Theissen et al. 1996). We report further evidence for the paralogy of the AP3 and PI lineages, most notably the presence of the PI motif-derived sequence in the AP3 homologs. Based on the common sequence characteristics of the AP3 and PI lineages, we can reconstruct some of the characters of the B group ancestral gene lineage. The basal representatives of both lineages contain the PI motif as well as several diagnostic amino acids in the MADS domain (2Gly, 7Glu). This would suggest, based on parsimony, that the most recent B group ancestor also displayed these traits. The predicted product of the CRM3 gene from the pteridophyte Ceratopteris may reveal another aspect of the B group ancestor. If the CRM3 C-terminal domain is homologous to the paleoAP3 motif, then we must add possession of the paleoAP3 motif to the list of characteristics of the B group ancestor. The fact that the predicted product of CRM3 does not possess the PI motif could reflect an origin of this domain that postdates the split of the ferns from the land plants (Figure 6).
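The parsimony argument can be illustrated with the set-intersection step of Fitch's algorithm applied to a toy tree. This is a sketch under stated assumptions, not the analysis performed here: the tree shape (CRM3 placed outside an AP3 plus PI pair) and the state labels are simplifications chosen for illustration.

```python
def fitch(node):
    """Fitch downpass on a binary tree written as nested 2-tuples.

    Leaves are sets of observed character states. An internal node takes
    the intersection of its children's state sets when that is non-empty
    and their union otherwise.
    """
    if isinstance(node, (set, frozenset)):
        return set(node)
    left, right = (fitch(child) for child in node)
    shared = left & right
    return shared if shared else left | right


PRESENT, ABSENT = "PI motif", "no PI motif"

# Character states as described in the text; the topology is assumed.
ap3_basal = {PRESENT}  # basal AP3 lineage representatives
pi_basal = {PRESENT}   # basal PI lineage representatives
crm3 = {ABSENT}        # fern CRM3 predicted product

b_group_ancestor = fitch((ap3_basal, pi_basal))
deeper_node = fitch(((ap3_basal, pi_basal), crm3))

print(b_group_ancestor)
print(deeper_node)
```

On this toy tree the AP3/PI ancestor is assigned the PI motif unambiguously, whereas the node that also includes CRM3 remains ambiguous, which is consistent with the possibility that the motif arose after the fern lineage diverged.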
All of the species which we have thoroughly surveyed contain both an AP3 and a PI representative, suggesting that the duplication event which produced these two lineages predates the diversification of the angiosperms. Initially after the duplication event, both of the products would have exhibited the PI motif, the characteristic MADS-box residues and, possibly, the paleoAP3 motif. In the PI lineage, the PI motif and the MADS-box characters appear to have been highly conserved throughout while the paleoAP3 motif was lost, possibly by a single truncation event. In contrast, along the AP3 lineage, the PI motif diversifies dramatically, along with additional changes throughout the protein (Figure 6).
The PI lineage: The high degree of conservation of the PI motif throughout the members of the PI lineage suggests that it has a critical function. The requirement of the C terminus for in vivo function has been demonstrated for the products of several plant MADS-box genes, including AP3 (Krizek and Meyerowitz 1996; Riechmann and Meyerowitz 1997). In vitro studies have shown, however, that the C regions of the AP3 and PI proteins are not required for heterodimerization or the binding of the complex to DNA (Krizek and Meyerowitz 1996; Riechmann et al. 1996b). Furthermore, none of the PI gene mutants isolated in Arabidopsis or Antirrhinum has lesions in this region (Goto and Meyerowitz 1994; Jack et al. 1992; Sommer et al. 1990; Trobner et al. 1992). For these reasons, it is difficult to speculate on the functions of the PI motif, aside from suggesting that it is involved in protein interactions which are important for the overall function of the PI orthologs.
Many other duplication events have followed the creation of the separate AP3 and PI lineages. In the PI lineage, five different paralog pairs have been identified, in Petunia hybrida, R. bulbosus, P. nudicaule, P. magnificum and Oryza sativa. The phylogenetic analysis suggests that these are the products of independent duplication events which occurred after the last common ancestor of any of the species included in this analysis. In two of these species, P. hybrida and O. sativa, there is some evidence for functional divergence of the paralogs (Angenent et al. 1993; Chung et al. 1995). Although polyploidy is a common cause for the presence of paralogs within plant genomes, none of these species are known polyploids (Bennett and Smith 1976, 1991; Bennett and Leitch 1995). Duplications such as these, as well as insertion events like those seen in the PI orthologs of P. nudicaule and D. eximia, may be useful as synapomorphies to define taxonomic groups.
The AP3 lineage: We have defined two distinct AP3 lineages which appear to be present throughout the higher eudicots. These are clearly the products of an ancient duplication event that occurred before the diversification of the higher eudicots. We cannot rule out the possibility that this duplication event occurred very early in the angiosperms and we have simply not detected the members of the euAP3 lineage in the lower eudicots or magnolid dicots. The sequence and phylogenetic analyses do not support an extremely early duplication, however. Most notably, the Pachysandra PtAP3-1 and PtAP3-2 genes appear to represent a mosaic of paleoAP3 characters and characters of the euAP3 and TM6 lineages of the higher eudicots. Although the C-terminal regions of the predicted products of the Pachysandra genes bear more similarity to the paleoAP3 motif, other positions in the protein sequences ally them with the higher eudicot genes. Based on these observations, we propose that the duplication which produced the euAP3 and TM6 lineages from a paleoAP3 ancestor occurred after the last common ancestor of the Buxaceae and the higher eudicots but before the diversification of the major higher eudicot subclasses (Figure 7).
We see the results of additional taxon-specific duplication events in the AP3 lineages of Brassica oleracea, Rumex acetosa, and P. terminalis. The AP3 paralog pairs of these species all show relatively high levels of sequence similarity. We have also identified several more divergent paralogous AP3 lineages. One such lineage is defined by PcAP3 and PnAP3-1 from P. californicum and P. nudicaule, respectively. Other than being florally expressed, the role of these Papaver genes is unknown. Evidence for a functionally divergent AP3 paralog is seen in the Medicago sativa gene, NMH7 (Heard and Dunn 1995). The sequence of NMH7 is highly diverged from the sequence of the other euAP3 lineage members, and may reflect the fact that the function of this paralog has also diverged considerably, being involved in mediating root nodulation (Heard and Dunn 1995). Our analysis, which is based on amplifying floral cDNAs, would not have detected any paralogs such as NMH7.
Implications of the AP3 duplication event and subsequent divergence: While the PI lineage appears to be highly conserved, the AP3 lineage has experienced significant diversification, some of which clearly correlates with duplication events. If the fern CRM3 gene is, in fact, a representative of the B group ancestral lineage which possesses a paleoAP3 motif, then this small region has been conserved, to some degree, from the Pteridophytes, dating back some 400 mya, up through the lower Rosids, which first appeared approximately 80 mya (Drinnan et al. 1994; Stewart and Rothwell 1993). This highly conserved motif was lost, however, in the euAP3 lineage where we see fixation of a new conserved C-terminal motif. In addition, we see changes throughout the predicted products of the euAP3 lineage genes, including the MADS-box domain. The euAP3 lineage would, therefore, appear to have undergone considerable sequence divergence in regions that were highly conserved, some for at least 300 million years. We assume that this reflects a shift in the functional repertoire of the euAP3 lineage members relative to the ancestral paleoAP3 representatives.
There are many possible ways in which the functions of the euAP3, paleoAP3 and TM6 lineage members may differ. The euAP3 lineage members function primarily in the establishment of petal and stamen identity. The role played by euAP3 representatives in stamen identity is likely to be the ancestral function of these genes since stamens are thought to have evolved only once, before the diversification of the angiosperms. In contrast, the function of euAP3 lineage members in petal identity may be more recently acquired, reflecting a de novo evolution of petals at the base of the higher eudicot radiation. Similarly, it seems likely that the paleoAP3 lineage members also function in stamen identity, but each time that petals were derived, whether from stamens or from bracts, the paleoAP3 ortholog(s) may have been recruited to new roles. In the Ranunculidae, for example, andropetals are thought to have been derived many times, even within single families (Drinnan et al. 1994). Each of these events may have resulted in changes of the functions of the B group genes. The functions of the TM6 lineage members in the higher eudicots are not well understood. Expression patterns have only been characterized for TM6 itself, which is highly expressed in the developing petal, stamen and carpel, but a role in the formation of these organs has not been established (Pnueli et al. 1991). Our analysis reveals that the TM6 lineage members still retain sequence similarity to the ancestral paleoAP3 lineage, but the individual representatives appear to have undergone significant diversification since the euAP3/TM6 duplication event.
These results reveal a complex pattern of gene duplication and divergence in the AP3 and PI lineages. While it appears that the euAP3 motif is found in all of the higher eudicot AP3 homologs that are known to play a role in determining petal and stamen identity, it is the paleoAP3 motif that defines the AP3 homologs of the lower eudicots and magnolid dicots. To understand what changes in function may underlie the sequence divergence observed in the euAP3 lineage relative to the ancestral paleoAP3 lineage, investigation of the function of the paleoAP3 lineage members in the lower eudicots and magnolid dicots will be necessary. These and other comparative developmental approaches are critical to defining the relationships between gene duplication, gene expression and the morphological diversification that is the hallmark of the angiosperm radiation.
The authors thank the members of the Irish lab for helpful discussions. We are very grateful to C. Padilla, R. Schmidt, M. Barrier and M. Purugganan for providing unpublished sequence information. We also thank David Swofford for kindly allowing us to use Version 61 of PAUP 4.0*. This work was supported by grants from the National Science Foundation (IBN 9630916) and United States Department of Agriculture (97 01286) to V.F.I., a National Science Foundation graduate fellowship to E.M.K. and the Yale Science Development Fund to R.L.D.
Communicating editor: E. Meyerowitz
- Received December 24, 1997.
- Accepted March 2, 1998.
- Copyright © 1998 by the Genetics Society of America