Genetics, Vol. 153, 427-444, September 1999, Copyright © 1999
Maize R2R3 Myb Genes: Sequence Analysis Reveals Amplification in the Higher Plants
Pablo D. Rabinowicz1,a,
Edward L. Braun1,b,
Andrea D. Wolfec,
Ben Bowend, and
Erich Grotewolda,b
a Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724,
b Department of Plant Biology, Ecology and Organismal Biology, Ohio State University, Columbus, Ohio 43210
c Department of Evolution, Ecology and Organismal Biology, Ohio State University, Columbus, Ohio 43210
d Pioneer Hi-Bred International, Inc., Johnston, Iowa 50131
Corresponding author:
Erich Grotewold, Department of Plant Biology, Plant Biotechnology Center, 206 Rightmire Hall, 1060 Carmack Rd., Ohio State University, Columbus, OH 43210. E-mail: grotewold.1@osu.edu
Communicating editor: V. SUNDARESAN
 | ABSTRACT |
|---|
Transcription factors containing the Myb-homologous DNA-binding domain are widely found in eukaryotes. In plants, R2R3 Myb-domain proteins are involved in the control of form and metabolism. The Arabidopsis genome harbors >100 R2R3 Myb genes, but few have been found in monocots, animals, and fungi. Using RT-PCR from different maize organs, we cloned 480 fragments corresponding to a 42-44 residue-long sequence spanning the region between the conserved DNA-recognition helices (MybBRH) of R2R3 Myb domains. We determined that maize expresses >80 different R2R3 Myb genes, and evolutionary distances among maize MybBRH sequences indicate that most of the amplification of the R2R3 Myb gene family occurred after the origin of land plants but prior to the separation of monocots and dicots. In addition, evidence is provided for the very recent duplication of particular classes of R2R3 Myb genes in the grasses. Together, these findings render a novel line of evidence for the amplification of the R2R3 Myb gene family in the early history of land plants and suggest that maize provides a possible model system to examine the hypothesis that the expansion of Myb genes is associated with the regulation of novel plant cellular functions.
THE regulation of gene expression is a fundamental process in all living organisms. Transcription factors are classified in structural families according to the presence of specific DNA-recognition motifs (PABO and SAUER 1992
). One such family is constituted of proteins containing the Myb-homologous DNA-binding domain (Myb-domain), originally identified in the v-myb oncogene found in the avian myeloblastosis virus (KLEMPNAUER et al. 1982
). Proteins containing the Myb DNA-binding domain have since been found in all eukaryotes in which they have been sought (reviewed in LIPSICK 1996
). Myb domains are usually formed by two or three imperfect 51- or 52-residue repeats (R1, R2, and R3). Each repeat encodes three α-helices, with the second and third helices forming a helix-turn-helix (HTH) structure when bound to DNA, which is similar to motifs found in the λ repressor and homeodomain proteins (GABRIELSEN et al. 1991
; OGATA et al. 1994
). R2 and R3 are usually sufficient for sequence-specific DNA binding. A growing class of proteins that contain a single Myb repeat has been identified. From an evolutionary point of view, these proteins are distantly related to Myb domains containing the R2 and R3 Myb repeats (ROSINSKI and ATCHLEY 1998
), probably forming a distinct family of DNA-recognition motifs with sequence similarity to the Myb repeats. For this reason, only Myb-domain proteins containing multiple Myb repeats were considered in this study.
According to this criterion, ~10 Myb-domain proteins have been identified to date in vertebrates, including c-Myb, A-Myb, and B-Myb (e.g., LIPSICK 1996
). c-Myb plays an essential role in controlling the proliferation and differentiation of hemopoietic cells. The cellular functions of A-Myb and B-Myb are less well understood. A single c-myb-like gene has been reported to date in Drosophila, and the fungi and slime molds express only a handful of Myb-like sequences (reviewed in LIPSICK 1996
).
In sharp contrast, dicotyledonous plants express a large number of Myb-homologous proteins (reviewed in MARTIN and PAZ-ARES 1997
). The number of R2R3 Myb genes has been estimated to be at least 40 in Petunia hybrida (AVILA et al. 1993
) and >100 in Arabidopsis (ROMERO et al. 1998
). Plant Myb domains usually have two Myb repeats most similar to the R2 and R3 repeats of their animal homologs, defining the R2R3 Myb-domain protein family (LIPSICK 1996
; MARTIN and PAZ-ARES 1997
). Evolutionary studies based on the sequences of Myb domains from several organisms indicate that plant Myb ancestors may have had three Myb repeats and that the first repeat was lost (LIPSICK 1996
; ROSINSKI and ATCHLEY 1998
). While these studies did provide some clues on how vertebrate Myb genes evolved from a single ancestor, little information was obtained regarding the expansion of the plant R2R3 Myb gene family or about the evolution of the large number of plant Myb-domain proteins. The recent analysis of the sequence of >80 R2R3 Myb genes from Arabidopsis thaliana added important data on the functional and structural relationships among members of this family of transcription factors in dicots (ROMERO et al. 1998
), but the details of the early evolution of plant Myb-domain proteins remain obscure. It is clear that sequence similarity downstream of the R2R3 region of Arabidopsis Myb proteins is very low with the exception of a small number of residues conserved within specific groups of Myb sequences, which may be functionally related (KRANZ et al. 1998
).
Plant Myb-domain proteins have some of the expected characteristics of regulators that play an important role in the evolution of plant form (DOEBLEY and LUKENS 1998
). Plant Myb proteins regulate the differentiation of epidermal cells to trichomes (OPPENHEIMER et al. 1991
), the determination of cell shape (NODA et al. 1994
), and the development of leaf form (WAITES et al. 1998
). The complex control of phenylpropanoid (SABLOWSKI et al. 1994
; MOYANO et al. 1996
) and flavonoid biosynthesis (reviewed in MOL et al. 1998
) by Myb proteins indicates a key function played by this family of regulatory factors in the evolution of plant metabolic diversity. The role of plant Myb proteins in mediating the response to viral infection (YANG and KLESSIG 1996
), hormones (GUBLER et al. 1995
), and drought (URAO et al. 1993
) suggests that they play important roles in the interaction of plants with the environment. Together, these findings have led to the suggestion that the expansion of the Myb gene family occurred in conjunction with the development of new cellular functions (MARTIN and PAZ-ARES 1997
) and that the role of Myb proteins is to provide plasticity for plant metabolism and development (ROMERO et al. 1998
).
Although few genes encoding R2R3 Myb-domain proteins have so far been identified in monocots, there is little reason to believe that there are significantly fewer Myb genes in monocots than in dicots. However, sequence data on monocot Myb genes should provide key information on the patterns of evolution in the Myb gene family. We investigated the phylogenetic relationships among expressed R2R3 Myb genes in the model monocot Zea mays. Sequence analyses of the region between the conserved DNA-recognition helices of Myb domains (MybBRH) revealed that maize expresses >82 R2R3 Myb genes. The alignment of the corresponding amino acid sequences provides novel insights into plant Myb-domain sequence conservation and confirms that maize and animal Myb domains have substantial differences. Based upon the accumulation of synonymous and nonsynonymous substitutions in the maize MybBRH sequences, we found that the dramatic expansion of the R2R3 Myb gene family in higher plants occurred within the past 500 million years. Surprisingly, our data suggest that there has also been a very recent expansion of the R2R3 Myb gene family in maize, which is not evident in Arabidopsis. The recent duplications of maize Myb genes suggest that the expansion of this family of transcription factors could be directly related to plant evolution and diversity. Maize and other grasses may provide excellent model systems for understanding the evolution of novel functions after Myb gene duplication.
 | MATERIALS AND METHODS |
|---|
Materials:
The B73 maize inbred line was used for these studies. Maize tissues included the following: 18 days after pollination (dap) ears, including pericarp, cob glumes, aleurone, endosperm, and embryo tissues; roots from ~15-day-old seedlings grown in sterile soil; 15-day-old seedlings; whole tassels 24 days prior to anthesis; and nonpollinated silks immediately after emergence from husks.
The primers used for PCR amplification corresponding to the sequences encoding the DNA-recognition helices of R2R3 Myb domains (Figure 1) were:
- pMyb5', 5'-AARWSNTGYMGNYTNMGNTGG-3'
- pMyb3', 5'-CCARTARTTYTTNAYNTSRTTRTC-3'
- vMyb5', 5'-TGYMGNGARMGNTGGCAYAAYCAY-3'
- vMyb3', 5'-CCARTGRTTYTTNAYNSHRTTRTCNGT-3'
with R = A/G, W = A/T, M = A/C, H = A/T/C, Y = C/T, S = G/C, and N any base.
Figure 1.
Myb-domain structure indicating the region from which the MybBRH sequences are derived. Arrows indicate the primers used for MybBRH sequence amplification. R1, R2, and R3 indicate each of the three Myb repeats. Helices forming the double helix-turn-helix motif characteristic of R2R3 Myb domains are shown shadowed with the DNA-recognition helices in black. The numbers correspond to the residues of the P gene (GROTEWOLD et al. 1991), used as a reference here. The Myb-domain region represented by the MybBRH segment corresponds to the last five residues (approximately residues 58-62) of the R2 DNA-recognition helix, the flexible linker region that joins the R2 and R3 Myb repeats (approximately residues 63-71), and the first and second helices of the R3 Myb repeat.
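The size of each degenerate primer pool follows from the ambiguity codes defined for the primers. As an illustrative sketch only (it is not part of the original protocol, and the names `BASES_PER_SYMBOL` and `degeneracy` are hypothetical), the number of distinct oligonucleotides represented by a primer can be computed as the product of the number of bases allowed at each position:

```python
# Number of bases permitted by each symbol that appears in the primers.
# Only the ambiguity codes defined in the text (R, W, M, H, Y, S, N) are included.
BASES_PER_SYMBOL = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "W": 2, "M": 2, "Y": 2, "S": 2,
    "H": 3, "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    total = 1
    for symbol in primer.upper():
        total *= BASES_PER_SYMBOL[symbol]
    return total


# Primer sequences copied from the list in this section (5' to 3').
PRIMERS = {
    "pMyb5'": "AARWSNTGYMGNYTNMGNTGG",
    "pMyb3'": "CCARTARTTYTTNAYNTSRTTRTC",
    "vMyb5'": "TGYMGNGARMGNTGGCAYAAYCAY",
    "vMyb3'": "CCARTGRTTYTTNAYNSHRTTRTCNGT",
}

for name, sequence in PRIMERS.items():
    print(f"{name}: length {len(sequence)}, degeneracy {degeneracy(sequence)}")
```

A highly degenerate pool dilutes any single primer species, which may be one reason a comparatively large amount of primer is used in amplifications of this kind.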
RNA isolation and cDNA synthesis:
Total RNA was isolated by pulverizing frozen maize tissue in liquid nitrogen and extracting total RNA using the Trizol reagent (GIBCO, Grand Island, NY) according to the manufacturer's instructions. Messenger RNA was isolated using streptavidin-coated paramagnetic beads (PolyATract mRNA isolation system III; Promega, Madison, WI). First-strand cDNA synthesis was done by incubating 1-3 µg of poly(A)+ RNA with 0.25 µg of random hexadeoxynucleotides (Promega) in 12 µl of water at 70° for 10 min, chilling on ice, and carrying out reverse transcription using SuperscriptII (GIBCO-BRL, Gaithersburg, MD), following the manufacturer's instructions, in a 20-µl reaction.
Amplification and cloning of PCR products:
PCR was carried out using 0.5-1 µl of cDNA and 125 pmol of degenerate primers in a 25-µl reaction volume containing 200 µM of the four dNTPs, 3 mM MgCl2, and 1 unit of Taq polymerase (Perkin Elmer-Cetus, Norwalk, CT) in the manufacturer-provided buffer. Thermocycling conditions were 5 min at 95°, then 40 cycles of 1 min at 95°, 1 min at 62°, and 30 sec at 72°, and a final extension of 20 min at 72°. Using this approach, 50-250 ng of a single band of ~180-bp PCR product was obtained, which was separated on 1.5% agarose gels. Negative controls with only the primers were always included to ensure that no contamination with other Myb genes frequently used in the lab had occurred. No PCR products were detected when the mRNA was amplified prior to reverse transcription, indicating minor, if any, DNA contamination. The 180-bp band was cut out of the gel together with a small agarose region (2-4 mm) surrounding the band and extracted using a QIAquick (QIAGEN, Chatsworth, CA) spin column. Bands were cloned into the pT-Adv vector (CLONTECH, Palo Alto, CA), and QIAprep plasmid DNA samples were sequenced using the ABI PRISM dye terminator system (dRhodamine; Perkin Elmer) and an ABI automated sequencer.
Sequence analysis:
Sequences obtained were analyzed using the IntelliGenetics GeneWorks 2.5 and Oxford Molecular MacVector 6.0 programs. Homology searches with databases were done using Blast 2.0 searches of the "nr" database at NCBI (http://www.ncbi.nlm.nih.gov). Nucleotide sequences were translated into inferred amino acid sequences.
Phylogenetic analysis:
The amino acid sequences were aligned using Clustal V (HIGGINS et al. 1992
). The results were adjusted to obtain the best alignment for each region based on conserved sequences. For the large-scale analysis of Myb evolution based upon amino acid sequences, PAUP* 4.0b1 (SWOFFORD 1998
) was used for maximum-parsimony analysis, and PAUP* 4.0b1, PHYLIP 3.57 (FELSENSTEIN 1993
), and PUZZLE 4.0 (STRIMMER and VON HAESELER 1998
) were used for distance analyses. The maximum-parsimony bootstrap analysis used 2000 bootstrap replicates. Each bootstrap replicate used 100 random addition sequence replicates with no branch swapping, although this strategy (bootstrapping without branch swapping) may underestimate the actual relative nodal support of a clade (WOLFE and DEPAMPHILIS 1998
). Distance analyses of amino acid sequences were conducted using 2000 bootstrap replicates, using PAUP* 4.0b1 to perform neighbor joining of p-distances (mean character differences) or components of PHYLIP 3.57 to generate the bootstrap datasets and perform neighbor joining of distances estimated using the PAM, JTT, and JTT + Γ models of amino acid evolution (DAYHOFF et al. 1978; JONES et al. 1992). JTT distance estimates were obtained using PUZZLE 4.0. Exact ties in neighbor-joining trees were broken randomly in both PAUP* 4.0b1 and the neighbor-joining program from PHYLIP 3.57. The shape parameter of the Γ-distribution (designated α, estimated to be 0.62) used to describe site-to-site rate variation in these amino acid sequences was estimated using maximum likelihood in PUZZLE 4.0, using a four-category discrete approximation to a continuous Γ-distribution (YANG 1994). Model fit was examined using the likelihood-ratio test, with likelihoods calculated using PUZZLE 4.0, by computing a test statistic (2[ln L1 - ln L0]) that can be compared to a χ² distribution (the degrees of freedom correspond to the number of additional free parameters in the model; see HUELSENBECK et al. 1996
). Although models that are not nested (such as the JTT and PAM models of amino acid evolution) cannot be tested in this manner, the model that has the highest likelihood is preferred.
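The likelihood-ratio comparison of nested models described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library and is not the PUZZLE 4.0 workflow used in the study; the function names are hypothetical, and the log-likelihood values in the usage example are invented placeholders rather than results from this work.

```python
import math


def lrt_statistic(ln_l0: float, ln_l1: float) -> float:
    """Twice the log-likelihood difference between the richer model (L1)
    and the simpler nested model (L0)."""
    return 2.0 * (ln_l1 - ln_l0)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses the closed forms for df = 1 (erfc) and df = 2 (exp) and the
    recurrence Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1),
    where t = x / 2 and Q is the regularized upper incomplete gamma function.
    """
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical example: adding one free parameter (for instance, a rate
# heterogeneity shape parameter) to a simpler nested model.
ln_l_simple = -2450.0   # placeholder log-likelihood, simpler model
ln_l_complex = -2440.0  # placeholder log-likelihood, richer model
statistic = lrt_statistic(ln_l_simple, ln_l_complex)
p_value = chi2_sf(statistic, df=1)
print(f"statistic = {statistic:.2f}, p = {p_value:.3g}")
```

The degrees of freedom passed to `chi2_sf` equal the number of additional free parameters in the richer model, as stated in the text.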
Phylogenetic analyses of nucleotide sequences using a more limited set of aligned sequences employed PAUP* 4.0b1 (SWOFFORD 1998) to identify phylogenetic trees, using the unweighted maximum-parsimony as well as the minimum-evolution criteria using F81 + Γ distance estimates (SWOFFORD et al. 1996). Maximum-parsimony bootstrap analyses used branch-and-bound searches while the distance bootstrap analyses used neighbor-joining trees estimated from F81 + Γ distances (using α = 0.67, calculated from the nucleotide alignment). The α-parameter of the Γ-distribution was estimated using the F81 model with a four-category discrete approximation to a continuous Γ-distribution in PAUP* 4.0b1. Molecular clock analyses with complete MybBRH nucleotide sequences were conducted as described (KIMBALL et al. 1997
), except PUZZLE 4.0 was used for the likelihood calculations. To "linearize" this tree (TAKEZAKI et al. 1995
), four rapidly evolving MybBRH sequences (1C47, IM44, IF25, and IF35) were removed.
Synonymous and nonsynonymous distances (KS and KA, respectively) were calculated using MEGA 1.01 (KUMAR et al. 1993
) based upon the method proposed previously (NEI and GOJOBORI 1986
). Because of the extreme codon bias in maize MybBRH sequences, we report synonymous and nonsynonymous p-distances. An additional benefit of using p-distances is the low variance of this measure, which is important when comparing relatively short sequences. The codon bias of the MybBRH sequences was examined by extracting codon positions expected to be degenerate and nondegenerate based upon the genetic code and evaluating the composition of these positions using MacDNAsis version 2.0 (Hitachi Ltd., San Bruno, CA). Twofold and fourfold degenerate codon positions exhibit extremely similar compositions (data not shown), so they were not separated for this analysis.
Synonymous and nonsynonymous p-distances were corrected for multiple substitutions when used to compute the ratio of nonsynonymous to synonymous mutations (KA/KS), and the models of sequence evolution used to correct p-distances for multiple substitutions were used to generate saturation plots. The KA/KS ratio can also be used to compute the mean rate of nonsynonymous substitutions, which is calculated as the product of the KA/KS ratio and the rate of synonymous mutations. Nonsynonymous distances were corrected for multiple substitutions based upon Jukes-Cantor (JC) distances with a Γ-distribution to accommodate among-site rate variation (JIN and NEI 1990). Saturation for nonsynonymous mutations was calculated independently of the JC + Γ model of sequence evolution, using an empirical estimate based upon the sum of three-quarters of the proportion of nonsynonymous sites exhibiting fourfold variation, two-thirds of the proportion of nonsynonymous sites exhibiting threefold variation, and one-half of the proportion of nonsynonymous sites exhibiting twofold variation. The variation of nondegenerate codon positions was determined using MybBRH sequences from both maize and A. thaliana. Synonymous distances were assumed to accumulate according to the F81 model of sequence evolution (SWOFFORD et al. 1996), which is appropriate for sequences with biased nucleotide composition that do not exhibit a highly biased transition/transversion ratio (MUSE 1996). The expected saturation for synonymous mutations corresponds to one minus the sum of the squares of the proportion of each nucleotide (SWOFFORD et al. 1996). Unsaturated comparisons were considered to be those with synonymous p-distances ≤0.3 and nonsynonymous p-distances ≤0.1. This corresponds to a corrected synonymous distance of 0.4210 substitutions per synonymous site and a corrected nonsynonymous distance of 0.1193 substitutions per nonsynonymous site.
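The distance corrections and saturation estimates described in this paragraph can be sketched with the standard textbook formulas. This is an illustrative sketch under stated assumptions, not the MEGA or PAUP* computation used in the study: the function names are hypothetical, and the shape parameter and nucleotide frequencies in the usage example are placeholders, because the values applied to the synonymous and nonsynonymous sites are not given in this section.

```python
import math
from typing import Sequence


def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates among sites:
    d = (3/4) * alpha * ((1 - 4p/3) ** (-1/alpha) - 1)."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


def expected_saturation(freqs: Sequence[float]) -> float:
    """One minus the sum of squared nucleotide proportions."""
    return 1.0 - sum(f * f for f in freqs)


def f81_distance(p: float, freqs: Sequence[float]) -> float:
    """F81 (equal-input) distance: d = -B * ln(1 - p/B),
    where B is the expected saturation for the given composition."""
    b = expected_saturation(freqs)
    return -b * math.log(1.0 - p / b)


def empirical_nonsyn_saturation(four: float, three: float, two: float) -> float:
    """Empirical saturation from the proportions of nonsynonymous sites
    showing fourfold, threefold, and twofold variation."""
    return 0.75 * four + (2.0 / 3.0) * three + 0.5 * two


# Placeholder inputs (assumptions, not values reported for these sites).
alpha = 0.67                        # illustrative shape parameter
gc_rich = [0.10, 0.40, 0.40, 0.10]  # illustrative A, C, G, T proportions
print(f"JC + gamma distance at p = 0.1: {jc_gamma_distance(0.1, alpha):.4f}")
print(f"F81 distance at p = 0.3:        {f81_distance(0.3, gc_rich):.4f}")
print(f"Expected saturation:            {expected_saturation(gc_rich):.4f}")
```

With equal base frequencies the F81 expression reduces to the Jukes-Cantor distance, so a skewed composition lowers the saturation ceiling and enlarges the correction applied to a given p-distance.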
 | RESULTS |
|---|
Identification of R2R3 Myb genes expressed in maize:
To estimate the number of R2R3 Myb genes expressed in maize, we made use of the very high conservation of the two DNA-recognition helices of R2R3 Myb domains. DNA fragments encoding regions of Myb domains flanked by the two DNA recognition helices of R2 and R3 (MybBRH, see Figure 1) were amplified by PCR, cloned, and sequenced. RT-PCR was carried out on mRNA extracted from several maize organs (see MATERIALS AND METHODS) using the degenerate primers pMyb5' and pMyb3' that correspond to the DNA-recognition helices of all plant Myb-domain proteins aligned by Avila and coworkers (AVILA et al. 1993
). When primers corresponding to the DNA-recognition helices of animal Myb-domain proteins were used (vMyb5' and vMyb3'), no detectable PCR product was obtained (not shown). The 180-bp PCR products were extracted from agarose gels, making sure to include potentially less-represented fragments of slightly larger or smaller sizes. After cloning of the PCR products, the sequences were determined. To decide if any given clone corresponded to a possible Myb domain, we translated the coding strand (determined by the primers) in three frames and analyzed for the presence of residues that are invariant in all Myb-domain proteins identified to date. Less than 2% of all the sequences obtained showed no homology to Myb in any of the three possible reading frames and were discarded. From a total of 480 Myb sequences analyzed, 71 clones representing 30 different sequences were obtained from 18 dap ears, 120 clones representing 36 different sequences were obtained from roots, 131 clones representing 32 different sequences were obtained from seedlings, 85 clones representing 21 different sequences were obtained from silks, and 73 clones representing 28 different sequences were obtained from immature tassels (Table 1). We are confident that most, if not all, of these sequences are derived from RNA and not from contaminant genomic DNA. First, no amplification was obtained in control RNA not treated with reverse transcriptase (see MATERIALS AND METHODS). And second, the primers used for RT-PCR in this study flank the position of a conserved intron found in all maize Myb genes characterized to date. Indeed, <10% of the 80+ Arabidopsis Myb sequences characterized lack an intron in the MybBRH region (ROMERO et al. 1998
). The lack of introns in all of the sequences characterized here suggests that they are derived from RNA and not from DNA.
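The three-frame screening step described in this section can be sketched as follows. This is a minimal illustration rather than the procedure actually used (sequences were analyzed with GeneWorks and MacVector); the function names are hypothetical, and the filter shown, which simply keeps frames free of stop codons, is an assumption that stands in for the check against invariant Myb-domain residues, which are not enumerated here.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of an unambiguous DNA string ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def three_frame_translation(dna: str) -> list:
    """Translate the coding strand in each of the three forward frames."""
    return [translate(dna[offset:]) for offset in range(3)]


def open_frames(dna: str) -> list:
    """Return the frame offsets whose translation contains no stop codon."""
    frames = three_frame_translation(dna)
    return [offset for offset, protein in enumerate(frames) if "*" not in protein]


# Toy input (not a sequence from this study).
example = "GATGGCCTAA"
for offset, protein in enumerate(three_frame_translation(example)):
    print(f"frame {offset}: {protein}")
```

Because the coding strand is fixed by the primer orientation, only the three forward frames need to be examined.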
Analysis of the MybBRH region encoded by maize R2R3 Myb genes:
The MybBRH region is formed by the last five residues (residues 58-62, using the maize P gene sequence as a reference; see Figure 1) of the R2 DNA-recognition helix, the flexible linker region (approximately residues 63-71) that joins the R2 and R3 Myb repeats, and the first and second helices of the R3 Myb repeat (Figure 1). The sequence of the DNA-recognition helices of Myb domains is highly conserved, and residue changes in this region often impair DNA-binding activity (GABRIELSEN et al. 1991
; WILLIAMS and GROTEWOLD 1997
). The linker plays a fundamental role in positioning the DNA-recognition helices on the DNA (HEGVOLD and GABRIELSEN 1996
) by providing flexibility between the R2 and R3 Myb repeats (VAN AALTEN et al. 1998
). The limited sequence conservation in the region between the linker and the R3 DNA-recognition helix makes this part of the R2R3 Myb domain ideal for evolutionary studies. The proteins encoded by the C1 and Pl genes, which arose by a duplication event associated with the allotetraploidization of maize (GAUT and DOEBLEY 1997
), play identical functions in the control of maize flavonoid biosynthesis (CONE et al. 1993
). Yet the MybBRH regions of C1 and Pl differ at two amino acid positions (Figure 2).
In our analysis we identified five pairs of MybBRH sequences that have identical amino acid sequences but are encoded by different genes, as deduced from the corresponding nucleotide sequences. These are ZmMYB-IP20, which has a nucleotide sequence identical to that of the MybBRH region of the maize P gene (CHOPRA et al. 1996), and ZmMYB-IF8; ZmMYB-IF50 and ZmMYB-IP26; ZmMYB-1C18 and ZmMYB-IF55; ZmMYB-IM65 and ZmMYB-IP47; and ZmMYB-IP71 and ZmMYB-IP45 (Figure 2).
As expected from previous studies (WILLIAMS and GROTEWOLD 1997
), the residues corresponding to the R2 DNA-recognition helix (Figure 1) are very highly conserved in the 82 sequences analyzed here (Figure 2). The maize sequences show most often a Tyr residue at position 60 similar to Arabidopsis but different from the characteristic His residue present in animal Myb domains. Three maize Myb sequences (ZmMYB-IM31, ZmMYB-IP102, and ZmMYB-IP124) have His or Lys residues at this position. The conserved Arg residue at position 62 is replaced in a few maize R2R3 Myb domains by a His residue, but never by Asn as in animal Myb domains. However, some Arabidopsis sequences have Asn residues at this position (ROMERO et al. 1998
). None of the maize sequences characterized shows at this position the Ser residue found in the Arabidopsis GL1 protein, which is required for normal trichome formation (OPPENHEIMER et al. 1991
), or in several other Arabidopsis Myb domains phylogenetically unrelated to GL1 (ROMERO et al. 1998
).
Little is known about sequence conservation or requirements for the flexible linker. A conserved Pro residue is found at position 63 in both animal and plant Myb domains (WILLIAMS and GROTEWOLD 1997
). The change of Pro63 to Ala produced a dramatic reduction in the DNA-binding activity of animal Myb domains (HEGVOLD and GABRIELSEN 1996
), and this position was implicated in controlling the fluctuations of the Myb repeats with respect to each other (VAN AALTEN et al. 1998
). Consistently, the change of Pro63 to Ala would increase the flexibility of the linker. However, a subset of maize Myb domains, including that of the P protein, have Ala instead of Pro at that position (Figure 2). Despite this change, the P protein binds DNA with high affinity (WILLIAMS and GROTEWOLD 1997
). No other residues besides Pro and Ala are found at position 63 among the maize R2R3 Myb domains identified in this study. In 9 of the 10 R2R3 Myb domains where Pro63 is changed to Ala, there is also a Thr residue at position 84, which is very rarely found in other maize R2R3 Myb domains (Figure 2). Whether Thr84 helps to stabilize the interaction between the two Myb repeats in the presence of Ala63 remains to be determined, although it is unlikely given the distant positions of the residues corresponding to Pro63 and Thr84 in the deduced structure of the R2R3 Myb domain of c-Myb (OGATA et al. 1994
). All the maize Myb domains with an alanine residue at position 63 form a gene clade (Figure 3), and ZmMYB-2H67, which has only the proline 63 to alanine change, but not the Thr84 residue, is the most basal one, suggesting a two-step appearance of Ala63 and Thr84. Consistently, no Myb sequence was found in Arabidopsis containing this Ala63, although AtMYB11 and AtMYB12 have a serine residue at this position, being among a few Arabidopsis sequences with a Thr residue at position 84 (ROMERO et al. 1998
). From >200 plant R2R3 Myb sequences present in the database, a cotton gene encoding an R2R3 Myb protein (Cmy-J, accession no. AF034132) provides the only other example outside of maize with the Pro63 to Ala change. In the cotton Myb, the residue corresponding to position 84 is also a Thr.
A landmark of Myb domains is the presence of three periodic Trp residues in each Myb repeat (SAIKUMAR et al. 1990
). All plant R2R3 Myb domains analyzed to date have the first Trp residue of the second Myb repeat replaced by a hydrophobic amino acid (ROMERO et al. 1998
). Our analysis of the 82 maize Myb domains confirms this difference between plant and animal Myb domains, because most of the sequences analyzed here have either Phe, Ile, Leu, or Met residues at position 70. Exceptions are ZmMYB-HX22, ZmMYB-IF45, and ZmMYB-IP119, which have a Tyr residue instead. The change of the conserved Trp residue to either Phe, Val, or Leu has little effect on the DNA-binding activity of animal Myb domains (KANEI-ISHII et al. 1990
). However, the effect of changing the Trp residue to Tyr has not been reported. Even more dramatic is the presence of Lys residues at position 70 in ZmMYB-IP102 and ZmMYB-IP124. The effect of this nonconservative residue change on the overall structure and DNA-binding activity of Myb domains remains to be determined.
Pro97 is highly conserved in both animal and plant Myb domains (Figure 2). However, in two maize sequences, ZmMYB-2H19 and ZmMYB-IM31, this conserved Pro residue is replaced by Glu and Ser, respectively. The effect of this difference on Myb-domain structure or function is not yet known.
We expected that the MybBRH region would be highly conserved in length. The insertion of three Ala residues in the hinge region of v-Myb completely abolishes DNA binding (E. GROTEWOLD, unpublished results), and all R2R3 Myb domains characterized to date (ROMERO et al. 1998
) have the same distance between the DNA-recognition helices. Therefore, it was surprising to identify three sequences that showed an extra amino acid in the first helix of R3 (between residues 77 and 78). ZmMYB-IP102 and ZmMYB-IP124 have an additional Val at this position, whereas ZmMYB-IP50 has a Met residue inserted. Because at least one of these sequences appeared in multiple PCR reactions and all of these sequences have an insertion corresponding to exactly three nucleotides (Table 1), they are unlikely to represent PCR artifacts. ZmMYB-IP50 is identical at the nucleotide level with the MybBRH region of C1, with the exception of the ATG insertion that adds an additional Met residue. The B73 inbred line used for these studies has no C1 function, and the lesion in C1 in B73 has not been identified, raising the question of the relationship between C1 and ZmMYB-IP50. ZmMYB-IP50 was found once in seedlings, a tissue where C1 is normally not expected to be expressed, suggesting that ZmMYB-IP50 may correspond to a C1-related R2R3 Myb-domain protein, but probably not to C1 itself. The effect of a residue insertion between residues 77 and 78 on Myb-domain structure or DNA-binding activity is not known. ZmMYB-IF35 and ZmMYB-IQ68 have one residue less in the MybBRH region (position 72). Both sequences were identified once in tassels (Table 1), and they are different from each other at several positions (Figure 2); thus it is unlikely that these differences are due to PCR or cloning artifacts. Vertebrate R2 Myb repeats have a residue less than plant R2 repeats, and this difference in length is essential for DNA binding (WILLIAMS and GROTEWOLD 1997
). Perhaps small variations in the length of Myb repeats change the properties of the corresponding Myb domains, contributing to their biological specificity. Whether the Myb-domain proteins corresponding to ZmMYB-IP102, ZmMYB-IP124, ZmMYB-IP50, ZmMYB-IF35, and ZmMYB-IQ68 have unique DNA-binding activities or have lost their DNA-binding activity because of the length difference between the corresponding DNA-recognition helices remains to be determined. The studies shown here strongly support the notion that plant and animal Myb domains have extensive differences within the conserved Myb-homologous DNA-binding domain.
Sequence similarity of MybBRH sequences with other plant Myb genes:
We investigated the sequence similarity of the maize MybBRH sequences to other Myb-domain proteins in the nr database available from the National Center for Biotechnology Information. ZmMYB-IF8 showed 100% identity at the nucleotide level with P-wr (CHOPRA et al. 1996
), the allele of the P gene expressed in the B73 line used for these studies. The expression pattern of ZmMYB-IF8 coincides with the expression of P in pericarp and silk tissues, although at lower levels we also found sequences identical to ZmMYB-IF8 in the roots and seedling tissues (Table 1). The seedlings include the coleoptile, where P-regulated 3-deoxy flavonoids have been reported (STYLES and CESKA 1981
). To our knowledge, P expression in roots has never been rigorously examined.
ZmMYB-IP20 and ZmMYB-1H48 encode products identical to P at the amino acid level but with one and two differences, respectively, at the nucleotide level (not shown). While ZmMYB-1H48 was isolated just once, ZmMYB-IP20 was found three times in different PCR reactions, making the possibility of repeated errors introduced by the Taq polymerase highly unlikely. ZmMYB-IP20 was found only in silks, one of the tissues where P is also expressed. ZmMYB-IP20 could correspond to a gene encoding a protein highly similar to P, or alternatively, it could derive from one of the several repeated P copies that form the P-wr allele (CHOPRA et al. 1998
) that is present in the B73 line.
ZmMYB-IM49 is identical at the amino acid level with the maize Myb-domain protein Zm38, but these two sequences have three silent nucleotide differences. Although Zm38 was cloned from a leaf-specific cDNA library (MAROCCO et al. 1989
), its expression pattern was never investigated. Thus, we cannot rule out the possibility that ZmMYB-IM49 corresponds to Zm38 and that the nucleotide differences correspond to polymorphisms.
ZmMYB-IF33 shows a very high sequence similarity with the genes encoding the Mixta proteins from Petunia (myb.Ph3; MUR 1995) and snapdragon (Mixta; NODA et al. 1994). Mixta proteins were shown to control petal cell shape (reviewed in MARTIN and PAZ-ARES 1997). An Arabidopsis gene encoding a product very similar to the Mixta proteins (A.t. Mixta) was reported (RABINOWICZ et al. 1996), and an alignment of the Petunia, snapdragon, Arabidopsis, and maize homologous sequences shows that in the MybBRH region, ZmMYB-IF33 has an identical amino acid sequence to Myb.Ph3 and has only one amino acid difference with A.t. Mixta (not shown). ZmMYB-IF33 was identified once in silks and in immature ears, consistent with the reported expression of Mixta in snapdragon flowers (NODA et al. 1994).
Phylogenetic analysis of maize R2R3 Myb proteins:
A phylogenetic analysis of >80 Arabidopsis R2R3 Myb genes based on the sequence between the DNA-recognition helices, equivalent to the MybBRH sequences used in this study (ROMERO et al. 1998), identified a tree topology consistent with the subgroups defined by sequences present in the carboxyl-terminal regions of R2R3 Myb proteins (KRANZ et al. 1998). This suggests that the phylogenetic information content of the 44 amino acids in the MybBRH sequences used in this study is sufficiently strong to reveal historical patterns in the evolution of R2R3 Myb proteins.
To reconstruct the phylogenetic history of maize Myb genes, the MybBRH sequences obtained from maize as a part of this study were combined with a set of 18 MybBRH sequences from A. thaliana and a single MybBRH sequence from the moss Physcomitrella patens. Phylogenetic analyses of these sequences, using the animal c-MybBRH sequence (KLEMPNAUER et al. 1982) as an outgroup, yielded a tree topology that strongly supports the hypothesis that some diversification of the R2R3 Myb gene family occurred prior to the divergence of monocots and dicots (Figure 3). The bootstrap consensus trees estimated using unweighted maximum-parsimony analysis of MybBRH sequences or neighbor joining of several different distance measures (see MATERIALS AND METHODS) contained at least two gene clades containing both monocot and dicot sequences that are well supported (bootstrap proportions ≥70% in all analyses), and seven gene clades containing both monocot and dicot sequences supported by some analyses (bootstrap proportions ≥50%). The results of a large-scale analysis of Myb-domain sequences from higher plants (P. D. RABINOWICZ, E. L. BRAUN, R. T. KIMBALL, E. GROTEWOLD and A. D. WOLFE, unpublished results) also support the nesting of monocot and dicot sequences in specific gene clades, strongly supporting the hypothesis that the amplification of the R2R3 Myb gene family began prior to the divergence of monocots and dicots, ~160 mya (see GOREMYKIN et al. 1997).
The major gene clades containing both monocot and dicot sequences that are suggested by the phylogeny of MybBRH sequences are poorly supported by bootstrap analyses in many cases (bootstrap proportions <50%), suggesting either that sufficient information is not present in the limited number of informative characters present in the MybBRH sequences or that the internal branches defining major gene clades are relatively short. The second possibility could reflect a period of rapid amplification of plant Myb genes (see BRAUN et al. 1998 for a similar example in the annexin gene family of animals). To further explore this issue, specific models of amino acid evolution that differentially weight different types of substitutions were examined, because they may provide a better estimate of Myb evolution when using the MybBRH sequences. Although the PAM model has been used in some analyses of Myb evolution (KRANZ et al. 1998), the JTT model of amino acid evolution (JONES et al. 1992) exhibits a slightly better fit to the data [ln L = -3049.13 (JTT); ln L = -3068.34 (PAM)]. Although the modest differences between these models suggest that both will provide similar estimates of Myb phylogeny, incorporation of among-site rate variation to the JTT model using a discrete approximation to a Γ-distribution resulted in a significant improvement to the model [ln L = -2814.96 (JTT + Γ, α = 0.62); ln L = -3049.13 (JTT, no rate heterogeneity); 2Δ ln L = 468.34, significant at P < 0.001], suggesting that the JTT + Γ model represents the best available model of amino acid sequence evolution for the MybBRH region. When this model of sequence evolution is used to estimate genetic distances, there is bootstrap support for a branch at the base of the phylogenetic tree (Figure 3, branch
). Surprisingly, this branch does not separate higher plant MybBRH sequences from the bryophyte (P. patens) MybBRH sequence, suggesting that at least one gene duplication in this set of sequences occurred prior to the divergence of vascular plants from bryophytes, ~450 mya (KENRICK and CRANE 1997
). Taken as a whole, we conclude that the phylogenetic information content of the MybBRH region of R2R3 Myb-domain sequences used in this study is sufficiently strong to resolve some relatively ancient relationships within the Myb gene family (Figure 3).
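The model comparison above can be checked from the two log-likelihoods quoted in the text. The following is a minimal sketch, not the authors' procedure: it assumes the test statistic is twice the difference in ln L between the nested models and that it is referred to a χ² distribution with one degree of freedom (for the single added shape parameter). The function names are illustrative.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the richer model over the nested one."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    For X = Z**2 with Z standard normal, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Log-likelihoods quoted in the text
lnl_jtt = -3049.13        # JTT, no rate heterogeneity
lnl_jtt_gamma = -2814.96  # JTT + gamma-distributed rates

stat = lrt_statistic(lnl_jtt, lnl_jtt_gamma)
p_value = chi2_sf_1df(stat)  # assumption: 1 d.f. reference distribution
print(f"2*delta lnL = {stat:.2f}, P = {p_value:.3g}")
```

Under these assumptions the statistic equals the 468.34 reported above, and the tail probability is far below the 0.001 threshold.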
Substitution rates and codon bias of maize R2R3 Myb genes:
To determine whether the MybBRH region could provide a convenient sequence to evaluate the divergence between specific plant R2R3 Myb sequences, the limitations inherent to evolutionary studies using this region were assessed. A major feature of the maize MybBRH sequences was their nucleotide composition, which exhibited extreme bias in the degenerate codon positions (8.4% A, 5.1% T, 56.0% C, 30.5% G), which is similar to the bias found in other genes from monocotyledonous plants (MURRAY et al. 1989
). The rapid evolution of synonymous sites in plant nuclear genes, estimated to occur with an average rate of 6.5 × 10⁻⁹ substitutions per synonymous site per year (GAUT et al. 1996
), suggests that synonymous differences within the MybBRH region became saturated rapidly (Figure 4A). Given the high sampling variance resulting from the limited length of the MybBRH region examined in this study, the extreme codon bias of these sequences should limit the usefulness of synonymous distances to relatively recent gene duplications, such as those that occurred within the past 65 million years (Figure 4A).
Nonsynonymous changes may provide substantial information regarding ancient Myb gene duplications, because they are expected to accumulate at a much slower rate than synonymous differences. The composition of nondegenerate codon positions in the MybBRH region was found to be relatively unbiased (29.0% A, 19.7% T, 22.8% C, 28.5% G). However, using nonsynonymous differences to estimate the divergence times for paralogous Myb genes presents the challenges of accommodating the functional constraints present in the MybBRH region and the variation in the overall rate in which the MybBRH region evolved in distinct Myb paralogs. Estimates of evolutionary distances were obtained in a manner that accommodates among-site differences in evolutionary rates resulting from functional constraints (the J-C model of nucleotide substitution with rates at different sites distributed according to a
Γ-distribution with a shape parameter α = 0.62; see JIN and NEI 1990). The mean ratio of nonsynonymous to synonymous mutations (KA/KS) within the MybBRH region was estimated to be 0.2780, suggesting that nonsynonymous substitutions in this region accumulate at a mean rate of 1.8 × 10⁻⁹ substitutions per nonsynonymous site per year, assuming that the rate of synonymous evolution for these Myb genes is similar to that for other genes in the maize genome (GAUT et al. 1996).
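The two quantitative steps in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the distance function uses the standard gamma-corrected Jukes-Cantor formula, d = (3/4) α [(1 - 4p/3)^(-1/α) - 1], which we assume corresponds to the correction described; the example p-distance is hypothetical; and the nonsynonymous rate is simply the quoted KA/KS ratio multiplied by the quoted synonymous rate.

```python
def gamma_jc_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    p     : observed proportion of differing sites (must be < 0.75)
    alpha : shape parameter of the gamma distribution
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

ALPHA = 0.62       # shape parameter quoted in the text
KA_KS = 0.2780     # mean nonsynonymous/synonymous ratio quoted in the text
SYN_RATE = 6.5e-9  # synonymous substitutions per site per year (GAUT et al. 1996)

# Nonsynonymous rate implied by the mean ratio and the synonymous rate
nonsyn_rate = KA_KS * SYN_RATE
print(f"nonsynonymous rate = {nonsyn_rate:.2e} per site per year")

# Hypothetical p-distance, chosen only to show the effect of the correction
p_example = 0.10
d_example = gamma_jc_distance(p_example, ALPHA)
print(f"p = {p_example:.2f} -> corrected d = {d_example:.4f}")
```

The product of the two quoted values rounds to the 1.8 × 10⁻⁹ given in the text. The corrected distance always exceeds the raw proportion of differences, and the gap widens as α decreases, that is, as rate heterogeneity among sites increases.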
The mean ratio of nonsynonymous to synonymous distances (KA/KS) was calculated based upon a subset of pairwise comparisons that exhibit unsaturated synonymous distances. The unsaturated comparisons correspond to a total of 67 pairwise comparisons among 51 maize MybBRH sequences, representing genes that arose by duplication in the past 35 million years. In striking contrast, similar analyses using a set of 84 A. thaliana MybBRH sequences revealed only 6 pairwise comparisons that reflect gene duplications that occurred during a similar time frame (data not shown). These results suggest that many of the maize MybBRH sequences analyzed are closely related to at least one other maize sequence obtained as a part of this study and indicate that the smaller number of recent Myb gene duplications that occurred in A. thaliana will not allow a similar calibration of the rate of nonsynonymous mutations based upon sequences from this organism. The mean KA/KS ratio estimated from maize sequences has a high variance (5.812 × 10⁻²), probably reflecting both the sampling variance of individual distance estimates and differences in the rates at which nonsynonymous mutations accumulate in specific Myb genes (see LI 1985). Although the range of estimated KA/KS ratios is large, the majority (~60%) of KA/KS estimates are <0.3, and almost all (>90%) of the KA/KS estimates are <0.6. This indicates that the estimated rate of nonsynonymous mutation in the MybBRH sequences is likely to provide useful estimates of divergence times for most Myb genes.
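To put the reported variance on the same scale as the mean ratio, it can be converted to a standard deviation. The short sketch below uses only the two values quoted in the text; the variable names are illustrative, and treating the spread as a simple coefficient of variation is our simplification.

```python
import math

mean_ratio = 0.2780  # mean KA/KS quoted in the text
variance = 5.812e-2  # variance of the KA/KS estimates quoted in the text

std_dev = math.sqrt(variance)
coeff_var = std_dev / mean_ratio  # spread relative to the mean
print(f"SD = {std_dev:.3f}, coefficient of variation = {coeff_var:.2f}")
```

The standard deviation comes out at roughly 0.24, nearly as large as the mean itself, which is consistent with the wide range of KA/KS estimates described above.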
The average rate of nonsynonymous mutation in the MybBRH region suggests that these mutations are unlikely to exhibit saturation within the land plants (Figure 4B) and further suggests that direct comparisons of synonymous and nonsynonymous changes should reveal a rapid accumulation of synonymous differences followed by continued accumulation of nonsynonymous differences (Figure 4C). In fact, direct comparisons of synonymous and nonsynonymous mutations in the MybBRH region reveal the expected pattern of saturation for synonymous substitutions (Figure 5A). In contrast, nonsynonymous substitutions cluster between values of 0.2 and 0.35, much lower than expected for saturation. Based upon the predicted accumulation of nonsynonymous mutations (Figure 4B), we conclude that many of the duplications within the maize R2R3 Myb gene family appear to have occurred between 200 and 550 mya. These results are consistent with the results of phylogenetic analyses, which suggest substantial diversification of the Myb gene family prior to the divergence of monocots and dicots as well as some amplification of the Myb gene family early in the history of land plants (Figure 3).
To further explore possible recent duplications of Myb transcription factors, additional analyses were conducted using the members of the gene clade that contain the proline 63 to alanine substitution (Figure 2 and Figure 3), which includes the well-characterized P gene. Comparisons within this gene clade show substantial variation in the numbers of both synonymous and nonsynonymous divergence (Figure 5A). Surprisingly, the most divergent comparisons within this group do not involve the basal members of the proline-to-alanine gene clade. Instead, they involve a set of genes that appear to show accelerated evolution in the MybBRH region relative to other Myb genes (Figure 5B). These rapidly evolving R2R3 Myb genes are unlikely to represent pseudogenes, because they are expressed (Table 1) and they show accelerated evolution at both synonymous and nonsynonymous sites (Figure 5A).
To confirm the recent diversification of the proline-to-alanine gene clade, divergence times for the members of this group were calculated by combining a maximum-likelihood approach (KIMBALL et al. 1997) with the use of "linearized" trees, which were produced by removing rapidly evolving taxa (TAKEZAKI et al. 1995). If the divergence of the proline-to-alanine clade is calibrated using the divergence of C1 and Pl, two paralogous Myb genes that were duplicated as a result of the allotetraploid origin of the maize genome and that diverged after the reversion of maize to disomic inheritance ~11.4 mya (GAUT and DOEBLEY 1997), it is clear that the expansion of the maize proline-to-alanine gene clade began within the grasses (Figure 5C). Because all the members of this gene clade show some rate acceleration relative to C1 and Pl, the divergence time estimates reported here probably represent overestimates of the actual duplication times.
 | DISCUSSION |
|---|
In this study, we demonstrated that the maize genome harbors >80 genes encoding R2R3 Myb-domain proteins with distinct patterns of organ-specific expression. The number of maize R2R3 Myb genes identified here is similar to that found in dicots, providing an indication that the radiation of the R2R3 Myb gene family happened prior to the split between the two major groups of flowering plants. Synonymous and nonsynonymous substitution analyses support this notion and indicate that the major expansion of this family of regulatory proteins occurred after the origin of the land plants. Additional amplification of this gene family occurred fairly recently during plant evolution, clearly after the monocot-dicot separation, and probably prior to the allotetraploid origin of the maize genome. Together with the cellular functions known to be controlled by plant Myb genes, these findings suggest a fundamental role played by R2R3 Myb genes in the evolution of plant form and metabolic plasticity.
Maize expresses >80 R2R3 Myb genes:
Prior to this study, only five genes encoding R2R3 Myb-domain proteins had been identified in maize, corresponding to P, C1, Pl, Zm1, and Zm38 (PAZ-ARES et al. 1987; MAROCCO et al. 1989; GROTEWOLD et al. 1991; CONE et al. 1993). All of these gene products have been implicated directly or indirectly in the control of flavonoid biosynthesis. To date, only ~20 genes potentially encoding rice R2R3 Myb-domain proteins have been deposited in the database (Figure 6), despite the large amount of DNA sequence information available for this monocot. No candidate genes for functions controlled by Myb-domain proteins in other plants, such as cell fate or cell shape determination, have yet been identified in monocots (MARTIN and PAZ-ARES 1997). This could reflect that monocots have significantly fewer R2R3 Myb genes than dicots or that monocot Myb genes have not been actively searched for.
Using RT-PCR from RNA extracted from various plant organs, we demonstrated that maize expresses at least 82 genes encoding R2R3 Myb proteins. While this number is in the same order as reported for Arabidopsis (ROMERO et al. 1998
), maize probably encodes many more than 82 R2R3 Myb genes. First, in this study we identified a large number of sequences just once (Table 1), suggesting that the screen for expressed R2R3 Myb genes was not carried out to saturation. Second, while the degenerate primers used here recognize most R2R3 Myb genes characterized to date, the Antirrhinum PHANTASTICA gene, which has a highly unusual R3 DNA-recognition helix (WAITES et al. 1998
), provides an example of a Myb gene that would not have been amplified by these primers. Third, Mutator insertions in three Myb genes not represented in the sequences reported here were characterized (P. D. RABINOWICZ and E. GROTEWOLD, unpublished results). And last, conditions reported as inducers of Myb gene expression, including dehydration (URAO et al. 1993
), hormones (GUBLER et al. 1995
), and virus infection (YANG and KLESSIG 1996
) were not tested here. From 100 or more R2R3 Myb genes proposed to be present in the Arabidopsis genome, only 36 were identified as being expressed in the number of developmental and environmental conditions tested (KRANZ et al. 1998
). If a similar situation is true in maize, we expect that maize encodes many more than 200 R2R3 Myb genes, in agreement with the number of Myb-homologous sequences identified in a preliminary analysis of the Pioneer Hi-Bred International EST database (not shown). Nevertheless, the identification of >80 R2R3 Myb genes in maize makes this the largest family of expressed regulatory proteins identified in any given organism to date.
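The extrapolation in this paragraph is simple proportional scaling, sketched below with the numbers quoted in the text. The function name is illustrative, and the calculation assumes that the fraction of family members detected as expressed in Arabidopsis (36 of about 100) also applies to maize, which is the premise stated above and not an independent result.

```python
def extrapolate_family_size(n_expressed, detected_fraction):
    """Scale an observed count of expressed genes by the fraction detected."""
    if not 0.0 < detected_fraction <= 1.0:
        raise ValueError("detected_fraction must lie in (0, 1]")
    return n_expressed / detected_fraction

# Arabidopsis: 36 genes detected as expressed out of ~100 proposed (KRANZ et al. 1998)
arabidopsis_detected = 36 / 100

# Maize: at least 82 expressed R2R3 Myb genes identified in this study
maize_expressed = 82

estimate = extrapolate_family_size(maize_expressed, arabidopsis_detected)
print(f"extrapolated maize R2R3 Myb gene number: about {round(estimate)}")
```

Because the Arabidopsis family is described as having 100 or more members, the detected fraction is at most 0.36, so the scaled value is better read as a lower bound, in line with the statement that maize encodes more than 200 R2R3 Myb genes.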
While the RT-PCR method used in this study does not provide an accurate estimate of tissue specificity or expression levels between different Myb genes, it does render evidence of whether a particular Myb gene is expressed in any given organ. A subset of the identified Myb genes was detected in just one organ (Table 1). Most sequences were found in at least two different organs, and just a few were present in all organs studied. The frequencies at which MybBRH sequences were found in our analysis extend over a wide range (Table 1). These differences could illustrate genuine contrasts in the levels of Myb mRNA accumulation, suggesting that Myb genes can be expressed at levels over two orders of magnitude different. However, the frequency at which different MybBRH sequences were identified could reflect, in part, the dissimilar annealing rates of the degenerate population of primers used for the PCR amplification. Sequences corresponding to ubiquitously found Myb genes were not amplified from each organ at levels higher than organ-specific ones. For example, ZmMYB-IM61 was found at similar low levels in all five organs studied, whereas ZmMYB-IP21, found only in roots and silks, is represented in both tissues at very high levels (Table 1). While these results provide preliminary evidence on the expression patterns of the maize R2R3 Myb genes, they should facilitate the isolation of cDNA clones and the investigation of the cellular effects caused by Mutator insertions in maize Myb genes (P. D. RABINOWICZ and E. GROTEWOLD, unpublished results). Our results on the expression of maize Myb genes are consistent with previous findings in Arabidopsis, where Myb genes show a wide range of expression patterns (KRANZ et al. 1998
).
There was no apparent correlation between gene expression patterns and phylogenetic relationships among the maize R2R3 Myb genes, which is similar to the results found for Arabidopsis Myb genes (KRANZ et al. 1998
). This could be either because each of the plant parts used in these studies is constituted by a number of different tissues or because expression patterns and phylogenetic relationships are not associated for Myb-domain proteins.
The MybBRH region as a probe to assess R2R3 Myb gene evolution:
The ease with which the MybBRH region can be amplified and cloned by RT-PCR makes the R2R3 Myb family of regulatory genes particularly amenable to studies of gene number, expression, and evolution, as was previously demonstrated for the Arabidopsis R2R3 Myb genes (ROMERO et al. 1998). Although the MybBRH region consists of only 42–44 amino acids (126–132 nucleotides), the phylogenetic signal in this region is sufficiently strong to resolve evolutionary relationships among a large number of different Myb genes (Figure 3; ROMERO et al. 1998). The variation at the amino acid level is high enough that even the C1 and Pl genes, which have a very recent common evolutionary origin (GAUT and DOEBLEY 1997) and duplicated functions in the control of anthocyanin biosynthesis (CONE et al. 1993), encode proteins with two amino acid differences within this region (Figure 2), indicating that the MybBRH sequence may provide a convenient "fingerprint" of R2R3 Myb gene identity. Consistent with this, only five groups of sequences with identical amino acid but different nucleotide sequences were found among the 82 R2R3 Myb sequences described here (Figure 2 and Table 1).
Synonymous and nonsynonymous substitution rates furnish powerful tools to investigate the evolutionary history of gene families. While synonymous substitution rates are expected to provide the best correlation with time, the extreme codon bias present in maize and the high sampling variance of synonymous p-distances for MybBRH sequences limit their use to relatively recent divergences, such as those that occurred during the last 65 million years (Figure 4A). Furthermore, as we demonstrated for a subset of the proline-to-alanine Myb proteins (Figure 5B), there may be currently unappreciated sources of rate variation at synonymous sites. In contrast to synonymous sites, nonsynonymous sites are not predicted to exhibit saturation within the land plants (Figure 4B), which have an evolutionary origin ~450 mya (KENRICK and CRANE 1997
). However, the use of nonsynonymous mutations as a measure of time for gene families must be taken with care, because the rates at which nonsynonymous sites of specific paralogs evolve may differ in response to changes in the functional constraints (LI 1985
). In fact, a large variance for the ratio of nonsynonymous to synonymous substitutions was noted for the set of comparisons involving unsaturated synonymous distances (see RESULTS), suggesting that there is some variation in the rate at which nonsynonymous mutations occur in different Myb paralogs. Despite this rate variation, the clustering of the nonsynonymous p-distances for most comparisons does suggest that these values reflect the duplication of many Myb genes during a specific period of land plant evolution prior to the radiation of the flowering plants.
Amplification of the plant Myb gene family:
The observation that maize encodes a number of R2R3 Myb genes in the same order of magnitude as found in the Arabidopsis genome (ROMERO et al. 1998
) suggests that amplification of this gene family occurred prior to the split between monocots and dicots, the two major groups of flowering plants thought to have separated 160 mya (GOREMYKIN et al. 1997
). Indeed, this is consistent with the observed nonsynonymous p-distances among the maize MybBRH sequences (Figure 5A), which center the majority of values at a time somewhere between 250 mya and 550 mya (Figure 4B). This supports the idea that the amplification of R2R3 Myb genes is a phenomenon unique to the plant kingdom. Based on the observation that only two R2R3 Myb genes were found in the moss P. patens (LEECH et al. 1993
), the time of the major amplification of this family is likely to have happened within the past 400 million years, after the divergence of vascular plants from the bryophytes (Figure 6).
Based on the now well-substantiated hypothesis that the major amplification of Myb genes happened a long time ago, we expected to find a recent set of Myb gene duplications associated with the allotetraploid origin of the maize genome (GAUT and DOEBLEY 1997
). Several groups of maize R2R3 Myb genes appear to have originated recently, around the time at which several other grasses diverged (Figure 6) or close to the time at which the maize genome duplicated (GAUT and DOEBLEY 1997
). Among 44 recently duplicated Myb genes (BRAUN and GROTEWOLD 1999
), we found 10 groups of Myb sequences that are likely to have undergone duplication during the allotetraploid origin of the maize genome. Five of these groups have three or more member sequences, indicating the existence of recent Myb gene amplifications that do not reflect the maize genome duplication. The clade containing the proline-to-alanine change (P-to-A clade, Figure 2) particularly attracted our attention, because this residue change was not found in any of the 100 Arabidopsis Myb genes reported, although an equivalent change is present in a cotton R2R3 Myb gene (GhMyb-J). Moreover, this clade contains the P gene, which activates a subset of maize flavonoid biosynthetic genes and controls the accumulation of 3-deoxy flavonoids and the phlobaphene pigments. P was recently shown to be sufficient for the biosynthesis of C-glycosyl flavones, important compounds with insecticidal activity in maize silks (BYRNE et al. 1996
; GROTEWOLD et al. 1998
). While the evolutionary origin of P is unknown, its role as a regulator of a branch of flavonoid biosynthesis appears to be novel and probably restricted to the grasses. According to our studies, the P-to-A clade has at least 10 members (Figure 2 and Figure 3). The first duplication within this clade occurred ~30–40 mya (Figure 5C), earlier than the reported duplication of C1 and Pl. Subsequent duplications within this clade appear to have occurred in much more recent times, and it is possible that one or more of the duplications present in this clade were associated with the duplication of the maize genome (GAUT and DOEBLEY 1997
). Differences in the rates at which some of the sequences in this clade are evolving make the estimation of duplication times difficult, but our results suggest that they must have occurred within the past 20 million years.
The mechanisms by which R2R3 Myb genes amplified are not known nor can they be deduced from our studies. Local gene duplications followed by dispersion of similar genes is one possible mechanism. In Arabidopsis, R2R3 Myb genes are spread throughout the genome, not forming evident clusters (KRANZ et al. 1998
), consistent with synonymous and nonsynonymous substitution rates among Arabidopsis sequences that indicate that most Myb gene duplications in Arabidopsis are relatively ancient (E. L. BRAUN and E. GROTEWOLD, unpublished results). Little information is available on the map position of maize Myb genes. P maps to chromosome 1S, C1 to 9S, and Pl to 6L. Two additional maize Myb genes that carry Mutator insertions have been mapped to 2L and 5L, respectively (P. D. RABINOWICZ and E. GROTEWOLD, unpublished results).
An interesting case that may help to understand the processes associated with R2R3 Myb gene amplification is provided by the P-wr allele of the P gene. The P-wr allele is composed of six gene copies arranged in a tandem head-to-tail array, and the amplification may have occurred very recently, probably even after the domestication of modern maize (CHOPRA et al. 1998
). The P-wr allele may constitute an early intermediate in the amplification of R2R3 Myb genes. Dispersion caused by transposable element insertions and recombination events would be followed by functional divergence, creating additional R2R3 Myb genes with, at some point, novel regulatory roles.
Several lines of evidence indicate that members of the P-to-A clade are not derived from the tandemly repeated copies of the P gene found in the P-wr allele present in the B73 inbred line used in this study. First, the analysis of a longer cDNA sequence corresponding to ZmMYB-IM44 shows very limited homology to P outside of the Myb domain (not shown). Second, members of the P-to-A clade have very distinct expression patterns, as deduced from the RT-PCR analyses shown in Table 1, which would not be expected if they derived from the highly conserved repeats present in the P-wr allele (CHOPRA et al. 1998
). And last, the level of divergence among the members of the P-to-A clade is inconsistent with previous findings that suggest that this amplification of P-wr may have occurred within the past 200 years (CHOPRA et al. 1998
).
Myb-domain proteins and plant evolution:
Our finding that maize expresses a large number of R2R3 Myb-domain proteins, similar to Arabidopsis and other dicots (ROMERO et al. 1998
), demonstrates that R2R3 Myb genes constitute a very large gene family in higher plants. This is in sharp contrast to what has been found in animals, fungi, or slime molds (Figure 6), which appear to harbor only a handful of R2R3-related Myb-domain proteins (reviewed in LIPSICK 1996
). Only two Myb-homologous sequences were identified in the moss P. patens (LEECH et al.