We have analyzed the mitochondrial editing behavior of two Arabidopsis thaliana accessions, Landsberg erecta (Ler) and Columbia (Col). A survey of 362 C-to-U editing sites in 33 mitochondrial genes was conducted on RNA extracted from rosette leaves. We detected 67 new editing events in A. thaliana rosette leaves that had not been observed in a prior study of mitochondrial editing in suspension cultures. Furthermore, 37 of the 441 C-to-U editing events reported in A. thaliana suspension cultures were not observed in rosette leaves. Forty editing sites that are polymorphic in extent of editing were detected between Col and Ler. Silent editing sites, which do not change the encoded amino acid, were found in a large excess compared to nonsilent sites among the editing events that differed between accessions and between tissue types. Dominance relationships were assessed for 15 of the most polymorphic sites by evaluating the editing values of the reciprocal hybrids. Dominance is more common in nonsilent sites than in silent sites, while additivity was observed only in silent sites. A maternal effect was detected for 8 sites. QTL mapping with recombinant inbred lines detected 12 major QTL for 11 of the 13 editing traits analyzed, demonstrating that efficiency of editing of individual mitochondrial C targets is generally governed by a major factor.
In vascular plants, organelle transcripts are modified by C-to-U editing. While ∼30 C-to-U editing events are typically found in vascular plant chloroplast transcriptomes (Maier et al. 1995; Tillich et al. 2005), hundreds of cytosine residues are subject to editing in the mitochondria of flowering plants; 427, 456, 491, and 357 sites have been reported, respectively, in rapeseed (Brassica napus), Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sugarbeet (Beta vulgaris) (Giege and Brennicke 1999; Notsu et al. 2002; Handa 2003; Mower and Palmer 2006).
Sequence requirements for recognition and editing of C's to U's in plant organelles have been probed in chloroplasts by analysis of editing of transcripts in chloroplast transgenic plants and in chloroplast extracts (Chaudhuri and Maliga 1996; Hirose and Sugiura 2001). In plant mitochondria, similar studies have been carried out by electroporation of transcripts into isolated mitochondria or by analysis of editing of exogenous RNAs in mitochondrial extracts (Staudinger and Kempken 2003; Takenaka and Brennicke 2003). Sequences within 20–40 nt 5′ and 2–10 nt downstream of the C targets of editing in both organelles have been found to be critical for editing reactions (Choury et al. 2004; Hayes et al. 2006; Van Der Merwe et al. 2006; Verbitskiy et al. 2006; Hayes and Hanson 2007a). Another common feature of editing in both plant organelles is the bias toward having a U upstream and an A downstream of the edited C (Giege and Brennicke 1999; Tillich et al. 2006).
In neither system is there evidence for the involvement of “guide RNAs” in recognition of C targets or catalysis of the editing reaction. In contrast, biochemical and genetic analyses suggest that proteins may mediate recognition of chloroplast editing substrates. Proteins bound to particular chloroplast editing sites could be detected in chloroplast extracts capable of editing exogenously added transcripts (Miyamoto et al. 2002). Two different nuclear-encoded chloroplast proteins, CRR4 and CRR21, were found to be required for editing of two specific C targets in ndhD (Kotera et al. 2005; Okuda et al. 2007), after identification of nuclear genes needed to restore chloroplast NADH dehydrogenase activity to two different Arabidopsis mutants. Both CRR4 and CRR21 are pentatricopeptide repeat (PPR) motif proteins, encoded by members of a large gene family previously observed to be involved in RNA processing and splicing in plant organelles (e.g., Fisk et al. 1999; Meierhoff et al. 2003). Recombinant CRR4 expressed in Escherichia coli bound specifically to a sequence comprising the 25 nucleotides upstream and 10 nucleotides downstream of the ndhD-1 editing site (Okuda et al. 2006). A current hypothesis is that CRR4 and CRR21 are site recognition factors that interact with the elements near their respective C targets to recruit an editing enzyme, reminiscent of the interaction of the RNA-binding protein apobec-1 complementation factor (ACF) with a cytidine deaminase, APOBEC1, at the mammalian apoB editing site (Keegan et al. 2001).
Neither the chloroplast nor the mitochondrial editing enzyme has been identified. No mitochondrial trans-acting editing factor has yet been discovered. Biochemical purification of components of the mitochondrial editing activity using existing editing-competent mitochondrial extracts would be difficult, given the low activities reported so far, i.e., 1.5–3% in the pea lysate to a maximum of 4–7% in the cauliflower system (Neuwirt et al. 2005). Mutant hunts for mitochondrial editing factors may not succeed. While plants with defective chloroplast NADH dehydrogenase can survive (Kotera et al. 2005), plants carrying mutations affecting editing of mitochondrial genes may not be viable.
An alternative to mutant isolation for identification of genes involved in plant processes is analysis of naturally occurring genetic variation. In Arabidopsis thaliana this has become an important strategy for discovery of genes involved in quantitative trait variation, for example, in flowering time (review in Koornneef et al. 2004). The early day-length insensitive (EDI) locus was the first QTL cloned in Arabidopsis and was shown to be the blue light photoreceptor gene cryptochrome 2 (CRY2) (El-Din El-Assal et al. 2001). An appealing approach to identify mitochondrial editing factors is the genetic mapping of an editing polymorphism between two accessions followed by map-based cloning of the gene responsible for the polymorphism. In a previous study, we reported that naturally occurring genetic variation for mitochondrial editing does exist in A. thaliana (Bentolila et al. 2005). A particular site within the mitochondrial transcript ccb206 is differentially edited between two widely used laboratory accessions, Columbia (Col) and Landsberg erecta (Ler), depending on the developmental stage and particular tissue. QTL mapping performed on a population of recombinant inbred lines (RILs) generated from a cross between these two accessions allowed the identification of a major QTL on chromosome 4.
In this study, we extend the analysis of mitochondrial editing genetic variation between Col and Ler by surveying transcripts of the entire set of known mitochondrial genes in the Arabidopsis mitochondrial genome. Dominance relationships were assessed by evaluating the editing values of the polymorphic sites on the F1 hybrids obtained from reciprocal crosses between the two parental accessions. Finally, QTL mapping was performed for a subset of sites exhibiting the highest degree of polymorphism. This report is the first to describe the genetic architecture of mitochondrial editing in A. thaliana.
MATERIALS AND METHODS
Plant material and growth conditions:
Seeds of Arabidopsis (A. thaliana) accessions Col-4 and Ler-0 and of the RILs (CS1899) were obtained from the Arabidopsis Biological Resource Center (ABRC) (http://www.biosci.ohio-state.edu/∼plantbio/Facilities/abrc/abrchome.htm). The F1 seeds were produced by manually crossing the two parental accessions. At the time of the initial experiments, seeds from only one F1, Col × Ler, had been obtained. Seeds from Col, Ler, Col × Ler, and a subset of the 30 most informative RILs (i.e., having the highest recombination over the five chromosomes) were sterilized by bleach treatment and then left half-submerged in the last water wash on a shaker for 4 days at 4°. After cold treatment, ∼10 seeds/line were sown in each individual cell of a 36-cell flat containing Metromix soil. The experimental design comprised three flats, each representing a randomized block containing 33 lines: the two parental accessions, the F1, and the 30 RILs. The flats were put in a greenhouse during the summer of 2005 and shuffled every day to minimize the environmental differences within the greenhouse. After germination, all seedlings but one were removed from each cell. Forty days after germination, four rosette leaves were collected from each plant and stored at −70°. Seven plants died before the collection of their tissue (CS1900, CS1903, CS1957, CS1975, CS1951, CS1911, and CS1959). For these lines only two measurements were performed for each editing site, while the editing value was measured three times for the remaining RILs.
Seeds of the Ler × Col F1 hybrid were obtained afterward and grown in the same greenhouse conditions as the material previously described. Ler was grown at the same time as the Ler × Col hybrid and is referred to as Ler2 in the text and figures. Ler2 serves as a parental control to ensure that possible variation in growing conditions did not interfere with the assessment of the editing values of the Ler × Col hybrid.
Measurement of editing values:
Most of the techniques used to measure the editing values of mitochondrial sites are similar to the ones previously described (Bentolila et al. 2005) with some minor changes as follows. Total RNA was extracted from rosette leaves using Trizol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Reverse transcription (RT) was performed with Superscript III RT (Invitrogen) following the manufacturer's protocol, except for the amount of enzyme, which was only one-twentieth of the amount recommended. Primers used for the RT and cDNA amplification of the 33 Arabidopsis mitochondrial genes are given in supplemental Table 1. Bulk sequencing of the RT–PCR products with the same primers used for the amplification of the cDNAs was performed at the Cornell Biotechnology Resource Center facility. cDNA sequences from Ler and Col were compared with the corresponding genomic sequence using Sequencher software version 4.5 (Gene Codes, Ann Arbor, MI). Sequencher also allowed us to compare the electrophoretograms of the cDNA bulk sequences from the two accessions. Poisoned primer extension (PPE) of RT–PCR products and determination of editing values were conducted as previously described (Bentolila et al. 2005), using the PPE gene-site primers (supplemental Table 1).
Statistical analysis and QTL mapping:
Statistical analysis was conducted using Minitab (State College, PA) Statistical Software release 14. Association between qualitative variables, e.g., between the nature of a site silent (S) or nonsilent (NS) and its mono- or polymorphism, was tested by a chi-square (χ2)-test. A χ2-goodness-of-fit test was used whenever it was necessary to test the hypothesis that data were following a multinomial distribution with certain proportions. A two-way ANOVA using the GLM procedure and genotype and repetition as factors was used to determine the dominance relationship of some of the editing polymorphisms. The genotypes included in this analysis were the parental accessions, Col, Ler, and Ler2; the two reciprocal F1's, Col × Ler and Ler × Col; and the midparental value (m) defined as Col − (Col − Ler)/2. Pairwise comparison between these genotypes and m was performed by using Tukey's test; and grouping of the means was determined by a threshold of P < 0.05. In the vast majority of the cases, this grouping was unequivocal, a genotype belonging to only one group. Most importantly, Ler2, the parental control grown at the same time as Ler × Col, was always grouped with Ler, ensuring that any difference found between the two hybrids was not caused by a difference in growing conditions. Confirmation of the relationship of the hybrids to m was given by performing a Dunnett test. Broad-sense heritability was estimated from one-way ANOVA of the editing value for each polymorphic site, using the formula $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2 / r)$, where $\sigma_g^2$ is the variance between the RILs, $\sigma_e^2$ is the variance within a RIL, and r is the average number of measurements per RIL (≈3). Transgression was assessed using a Dunnett test, comparing the means of the most edited parent with those of the RILs outperforming it.
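To make the heritability estimate concrete, the following is a minimal sketch of how variance components might be derived from a one-way ANOVA of replicated RIL measurements. It assumes the line-mean form of broad-sense heritability, with the between-RIL variance estimated as (MS between − MS within)/r; the function name, the RIL labels, and the editing values are hypothetical and are not taken from the study.

```python
from statistics import mean


def broad_sense_heritability(measurements):
    """Estimate line-mean broad-sense heritability from a one-way ANOVA.

    `measurements` maps each RIL name to its list of replicate editing
    values (%). Assumed estimator: H2 = var_g / (var_g + var_e / r).
    """
    groups = [values for values in measurements.values() if values]
    k = len(groups)                          # number of RILs
    n_total = sum(len(g) for g in groups)    # total number of measurements
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    r = n_total / k                          # average measurements per RIL
    var_e = ms_within                        # within-RIL variance
    var_g = max((ms_between - ms_within) / r, 0.0)  # between-RIL variance

    denominator = var_g + var_e / r
    h2 = var_g / denominator if denominator > 0 else 0.0
    return {"var_g": var_g, "var_e": var_e, "r": r, "H2": h2}


# Hypothetical editing values (%); one RIL has only two measurements,
# as happened for the lines in which a plant died before sampling.
example = {
    "RIL_1": [40.0, 42.0, 41.0],
    "RIL_2": [60.0, 61.0, 59.0],
    "RIL_3": [50.0, 52.0],
}
result = broad_sense_heritability(example)
```

Under these assumptions the estimate reduces algebraically to 1 − MS within/MS between, which is a convenient sanity check when comparing against the output of a statistics package.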
QTL mapping was performed as described in our previous study (Bentolila et al. 2005). Briefly, three different analytical methods, single-point regression analysis (SPA), interval mapping (IM), and composite-interval mapping (CIM), were used to estimate the location and the effects of QTL. SPA and IM were performed by using the Qgene computer program version 3.07 (Nelson 1997) and the Windows QTL cartographer version 2.5 (Wang et al. 2005), while CIM is available only in the latter program. Since both programs gave similar results for the SPA and IM mapping procedures, only the results from QTL cartographer are given. CIM was implemented by using the standard model 6, specifying five cofactors to control for genetic background and a window size of 10 cM that blocked out a region of the genome on either side of the markers flanking the test site. The specific cofactors used in model 6 were obtained by a forward regression. The walking speed for both CIM and IM was set to 2 cM.
Experimentwise significance thresholds were established by permutation tests (n = 10,000) for the three mapping methods as suggested by Churchill and Doerge (1994). Permutations for SPA were done by the Qgene software while QTL cartographer was used to perform the permutations for IM and CIM.
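The logic of a permutation-based experimentwise threshold can be sketched as follows. This is an illustration only: it uses the squared correlation between a 0/1-coded marker and the trait as a stand-in for the statistics actually computed by Qgene and QTL cartographer, and the function names, marker coding, and default parameters are assumptions.

```python
import random


def marker_statistic(genotypes, phenotypes):
    """Squared Pearson correlation between a 0/1 marker and the trait."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Experimentwise threshold from shuffled phenotypes.

    For each permutation the trait values are shuffled across lines,
    breaking any marker-trait association, and the largest statistic over
    all markers is recorded. The threshold is the (1 - alpha) quantile of
    these genome-wide maxima.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(marker_statistic(m, shuffled) for m in markers))
    maxima.sort()
    index = min(int((1 - alpha) * n_perm), n_perm - 1)
    return maxima[index]
```

Because the maximum is taken over all markers within each permutation, the resulting threshold controls the error rate for the whole genome scan rather than for a single marker. The study used 10,000 permutations; the smaller default here only keeps the sketch fast.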
The markers closest to identified QTL (P < 0.01) were used in a multiple-regression analysis performed by Qgene. The adjusted R2 or proportion of phenotypic variance explained by the model was compared to the heritability of the corresponding trait to estimate the fraction of genetic variance revealed by QTL mapping.
RESULTS
Screening of the mitochondrial transcriptome for editing polymorphisms between Col and Ler:
Comparing the electrophoretograms obtained from bulk sequencing cDNAs of the rosette leaves of the two accessions revealed an editing site within ccb206 that was polymorphic with respect to editing extent (Bentolila et al. 2005). At position 406, the Col electrophoretogram showed two peaks of equal height corresponding to C and T, whereas the Ler electrophoretogram showed a predominance of C. We extended this initial screen by surveying the remaining mitochondrial genes using the same strategy.
cDNAs representing 33 mitochondrial genes were analyzed, encompassing 362 editing sites previously described in suspension cultures (Giege and Brennicke 1999), while 40 such sites were not covered by the PCR primers we used to generate the cDNAs. We found 15 single-nucleotide polymorphisms (SNPs) between some of the cDNA sequences from Col and the corresponding annotated sequences in GenBank (Table 1). All these SNPs were different from the C-to-U conversions that result from editing. Four of these SNPs can be explained by a misannotated splicing site, as the 4 SNPs occur at the junction of two exons; a shift of one nucleotide is sufficient to predict the correct nucleotide. The remaining 11 SNPs were found in 6 mitochondrial genes and could not be explained by a misplaced splicing site. We validated the existence of 9 of these SNPs by sequencing the genomic DNAs from both Col and Ler. Two of the SNPs were not examined in the genomic DNA because they belong to nad5, a gene with trans-spliced introns, thus precluding the use of the primers used for the RT–PCR to amplify the genomic DNA.
The finding of nine verified SNPs between our sequencing data and the corresponding sequences from GenBank (Unseld et al. 1997) raises the possibility that the accession used by these authors was not Col. Indeed, a previous report from the same group mentions C24 as the accession used to generate the cosmid libraries that were later used to sequence the Arabidopsis mitochondrial genome (Klein et al. 1994). Thus it is likely that the SNPs we detected are genuine genomic DNA polymorphisms between Col and C24.
Surprisingly, in our analysis of rosette leaves, we detected no editing of 37 C targets previously identified in Arabidopsis suspension cultures (Giege and Brennicke 1999) (supplemental Figure 1, supplemental Table 2). Furthermore, we identified 67 new editing sites, either totally edited for both accessions (22 sites) or partially edited for at least one accession (45 sites) (supplemental Figure 1, supplemental Table 2). There is no doubt that the new partially edited sites are genuine, as the electrophoretograms show both C and T at the corresponding positions. We sequenced the genomic DNA from Col and Ler for eight mitochondrial genes that exhibited 11 of the 22 new totally edited sites. These eight genes do not have introns, so the same primers used for the RT–PCR were used to amplify the genomic DNA. The remaining 11 sites belong to genes with introns and were not further examined because the amplification of the genomic DNA would have involved the synthesis of new primers. In all the cases examined, the genomic DNA of both Col and Ler was found to encode a C, validating the occurrence of new totally edited sites for half of the sites (11/22) reported in this study. It is thus likely that all 22 new sites we detected are genuine C-to-U events and not SNPs between C24 and Col/Ler.
Editing events can be divided into two classes, NS when editing changes the encoded amino acid or S when the amino acid is unchanged. We observed a striking difference between NS and S sites with regard to their editing status in suspension cultures vs. in rosette leaves (Table 2). There is a very significant association between the nature of a site (S or NS) and whether it is edited or not in rosette leaves (χ2 = 14.399, P < 0.001). The main contribution to the high value of χ2 comes from the observed number of S sites in the class of sites not edited in rosette leaves (n = 16), which is higher than the expected number (n = 7) in the case of independence. The same analysis was carried out with the new edited sites identified in the rosette leaves but absent from the report of suspension cell mitochondrial editing (Table 2). Again a very significant association was found between the nature of a site (S or NS) and whether it is edited or not in suspension cells (χ2 = 40.045, P < 0.001). The deviation of χ2 from independence came mostly from the high number of S sites in the class of sites not edited in suspension cells (32 vs. an expected number of 13) (Table 2).
The proportions of S sites found in the class of sites not edited either in rosette leaves or in suspension cells are very close, respectively 43% (16/37) and 48% (32/67), and differ very significantly from the proportion of S sites found in the whole population of sites edited in either tissue (20% = 84/429). However, the association between the nature of a site, S or NS, and its editing status in rosette leaves, but not suspension cells, depends strongly on the level of editing of the site. For newly detected sites that were completely edited (CE), no association could be supported by our data (χ2 = 2.664, P = 0.103). For newly detected C targets that were only partially edited (PE), the association of editing status and S vs. NS changes was the most significant of all the associations tested in this study (χ2 = 94.191, P < 0.001). The main contributor to the deviation of χ2 from independence is the high number of S sites found in the class of new PE sites (31 observed sites vs. 8 expected sites, Table 2). In PE sites seen in rosette leaves but not reported in suspension cells, a majority of sites are silent (69% of S vs. 31% of NS, Table 2), which is in marked contrast to the proportions observed in the whole population of sites edited in rosette leaves (17% of S vs. 83% of NS, Table 2). In CE sites found in rosette leaves but not in suspension cells, the proportions of S and NS sites do not differ significantly from what is observed in the whole population of sites edited in rosette leaves (respectively 5 and 95% vs. 17 and 83%, Table 2). It was not possible to perform a similar analysis with the 37 sites edited in suspension cells but not in rosette leaves because the level of editing was not reported in the suspension cell mitochondrial study of Giege and Brennicke (1999).
Inspection of the electrophoretograms obtained with Col and Ler cDNAs revealed 71 potential editing-extent polymorphisms between these two accessions. At those positions, the C and T peaks from bulk cDNAs of the two accessions exhibit different heights. No case of a site fully edited in one accession (only a T peak) and not in the other accession (only a C peak) was detected. Although differences in the electrophoretograms were reproducible, they were difficult to quantify with an acceptable level of accuracy. The bulk-sequencing assay was sufficient to show a difference of editing between the two accessions but was inadequate for quantifying the level of editing. The extent of editing at the 71 sites with possible polymorphisms was more precisely quantified using a reproducible PPE assay originally developed to analyze the editing of chloroplast transcripts in maize (Peeters and Hanson 2002). This assay is able to detect even 1% editing (Hayes and Hanson 2007b).
Sites were considered polymorphic only when the difference in editing between the two accessions was ≥5%. This threshold allowed the identification of 40 polymorphic sites (supplemental Table 2). Of the 33 mitochondrial genes examined, transcripts of 19 genes showed at least one polymorphic editing site, while transcripts of 14 genes did not show any editing-extent polymorphism (Table 3). The largest number of polymorphic sites was found for rps4; 5 of the 11 editing sites present in this gene are differentially edited between Ler and Col. The most polymorphic transcripts tend to encode ribosomal proteins (Table 3). Nad3 and rps12 are 2 of the 3 genes whose transcripts exhibit a proportion of polymorphic sites that is significantly >10% (40/392), the average of polymorphism in the population. These 2 genes colocalize on the mitochondrial genome (Unseld et al. 1997) and are cotranscribed.
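The thresholding rule described above can be expressed in a few lines. The sketch below is illustrative only: the function name, the site labels, and the editing values are hypothetical, and the real editing extents come from the PPE assay.

```python
def polymorphic_sites(editing, threshold=5.0):
    """Flag sites whose editing extent differs between accessions.

    `editing` maps a site label to a (Col, Ler) pair of editing extents
    in percent. A site is reported when the absolute difference is at
    least `threshold` percentage points; the value stored for it is the
    more edited accession and the size of the difference.
    """
    flagged = {}
    for site, (col, ler) in editing.items():
        diff = abs(col - ler)
        if diff >= threshold:
            flagged[site] = ("Col" if col > ler else "Ler", diff)
    return flagged


# Hypothetical editing extents (%), not measured values from this study.
editing = {
    "site_A": (50.0, 20.0),   # clearly polymorphic, Col more edited
    "site_B": (90.0, 87.0),   # difference below the 5% threshold
    "site_C": (30.0, 35.0),   # exactly at the threshold, Ler more edited
}
flagged = polymorphic_sites(editing)
```

Calling the same function with a larger `threshold` value gives a stricter definition of polymorphism, at the cost of a smaller set of sites.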
No apparent cofunctionality or colocalization was found for genes showing a high proportion of sites not edited in rosette leaves but reported to be edited in suspension cells or for genes showing a high proportion of sites edited in rosette leaves but not edited in suspension cells (Table 3). Ccb382 shows the highest number of sites edited in suspension cells but not in rosette leaves (8 sites, Table 3) while the gene exhibiting transcripts with the highest number of sites edited in rosette leaves but not in suspension cell is orfX (11 sites, Table 3).
The distribution of S and NS sites was studied in the population of monomorphic or polymorphic sites (Table 4). A very significant association was found between the nature of a site, S or NS, and its mono- or polymorphism (χ2 = 28.248, P < 0.001). The deviation from independence was mostly due to the high number of S sites in the population of polymorphic sites (19 observed polymorphic S sites vs. 7 expected). The proportion of S sites in the polymorphic class was 48% (19/40) compared to an overall proportion of S sites in the population of edited sites of 17% (68/392) (Table 4). Even when the threshold to declare a site polymorphic was increased to 10 or 15% (DIFF ≥ 10, DIFF ≥ 15, Table 4) the association between the nature of a site and its mono- or polymorphism remained very significant (χ2 = 24.158 and 15.703, respectively; P < 0.001 for both). The cause for deviation remained an excessive number of S sites in the class of polymorphic sites, 50% (15/30) and 53% (9/17), respectively. When the threshold to declare a site polymorphic was 20% (DIFF ≥ 20, Table 4), the χ2 was just below the significance level (χ2 = 3.673, P = 0.055). In this latter case, however, the validity of the test must be considered with caution, because the expected number of polymorphic S sites is <5.
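The test of independence for the 5% threshold can be retraced from the counts quoted in the text (19 S sites among 40 polymorphic sites; 68 S sites among 392 edited sites overall). The sketch below rebuilds the 2 × 2 table from those counts and computes the Pearson χ2 statistic without a continuity correction; the function name and the table layout are choices made for this illustration.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return chi2, expected


# Rows: polymorphic, monomorphic sites. Columns: S, NS.
polymorphic_s, polymorphic_total = 19, 40
all_s, all_total = 68, 392
monomorphic_s = all_s - polymorphic_s
monomorphic_total = all_total - polymorphic_total
table = [
    [polymorphic_s, polymorphic_total - polymorphic_s],
    [monomorphic_s, monomorphic_total - monomorphic_s],
]
chi2, expected = chi_square_independence(table)
```

With these counts the statistic comes out at roughly 28.25 and the expected number of polymorphic S sites at roughly 7, in line with the values reported above.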
Among the 40 polymorphic sites detected when the threshold is 5%, Col is more edited than Ler in 16 sites (40%) and Ler is more edited than Col in 24 sites (60%) (supplemental Table 4). These two proportions do not differ significantly from equal proportions (χ2 = 1.6, P = 0.206). The equality between these two proportions could not be rejected for any of the polymorphic thresholds (χ2 = 1.2, P = 0.273; χ2 = 0.53, P = 0.467; χ2 = 1.6, P = 0.206, respectively, for DIFF ≥ 10, 15, or 20).
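The goodness-of-fit test against equal proportions is simple enough to check by hand. The following sketch does so for the 16 vs. 24 split at the 5% threshold; it relies on the fact that, for 1 degree of freedom, the upper-tail probability of a χ2 statistic equals erfc(√(χ2/2)). The function names are chosen for this illustration.

```python
import math


def chi_square_gof(observed, proportions):
    """Pearson chi-square statistic against hypothesized proportions."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))


# 16 sites more edited in Col, 24 more edited in Ler; H0: equal proportions.
chi2 = chi_square_gof([16, 24], [0.5, 0.5])
p = p_value_df1(chi2)
```

Each class has an expected count of 20, so the statistic is 16/20 + 16/20 = 1.6, and the corresponding probability is close to the reported P = 0.206.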
Dominance relationship of the editing efficiency for a subset of polymorphic sites:
Fifteen of the C-to-U editing events exhibiting the greatest polymorphism in editing extent in leaves of the two accessions were selected for further analysis. Editing extent was measured in three plants per genotype. We observed a significant difference between the reciprocal hybrids for eight of the polymorphisms tested. The largest difference observed was for the editing of orf114-309, where the average editing extents for Col × Ler and Ler × Col are, respectively, 41 and 56% (significantly different at P < 0.001, Figure 1). The editing polymorphisms could be classified in seven categories depending on the value of the F1's. The largest category comprises ccb256-252 (S), ccb256-624 (S), rps4-226 (NS), rps4-235 (NS), and nad3-149 (S). For these sites, the Col × Ler hybrid does not differ significantly from the midparental value (P < 0.05), while Ler × Col differs significantly from the midparental value but not from the most edited parent (P < 0.05) (Figure 1).
The second most represented category is exemplified by rpl2-212 (NS) and contains also ccb203-208 (NS), rps3-1534 (NS), and nad3-212 (NS). All of the sites belonging to this category except nad3-212 exhibit significant differences in editing extent in both hybrids vs. the midparental value, but not from the most edited parent (Figure 1). In the case of nad3-212, due to the smaller range of editing variation between the parental accessions, an additive model could not be rejected for Col × Ler (P = 0.1544); nevertheless, the data fit a dominant model better (P = 0.9994).
The next category is characterized by an editing value for both hybrids not differing significantly from the midparental value. This category contains cox2-138 (S) and orfX-144 (S) (Figure 1). The remaining sites have unique dominance patterns. Rps4-175 (NS) could not be part of the dominant category represented by rpl2-212 (Figure 1), because the Col × Ler hybrid exhibited an editing value significantly higher than the midparental value (P = 0.0003) but significantly lower than that of Ler × Col, Ler, and Ler2 (Figure 1). Nad7-789 (S) and nad2-558 (S) showed similar patterns: the editing value of Col × Ler is not significantly different from the midparental value (respectively P = 0.7915 and P = 0.8286) and the editing value of Ler × Col is significantly lower than the midparental value for both sites (respectively P = 0.0002 and P = 0.0069) (Figure 1). However, the editing value for Ler × Col is not significantly different from that of the less edited parent, Ler and Ler2, for nad7-789 (S) while it is significantly different for nad2-558 (S).
The last category is represented by orf114-309 (S). At this site the Col × Ler editing value does not differ significantly from the midparental value (P = 0.4054); nevertheless it is closer to that of Col, the less edited parent (P = 0.7980). The editing value for Ler × Col differs significantly from the midparental value (P = 0.030) but does not differ significantly from that of the most edited parent, Ler and Ler2. Because the difference between the editing values of Ler and Ler2 at this site approaches significance (P = 0.0658), we repeated this analysis with a midparental variable m2 calculated with Ler2 (m2 = Col − (Col − Ler2)/2). The difference between the editing values for Ler × Col and m2 falls just short of the significance threshold (P = 0.0556); nevertheless, Ler × Col shows an editing value closer to that of the most edited accession. In other words, even if we cannot formally reject an additive model for orf114-309, the data better fit a dominant model in which the dominance of the phenotype depends on the direction of the cross.
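The midparental value used throughout this section is simply the mean of the two parental editing values, since Col − (Col − Ler)/2 = (Col + Ler)/2. The sketch below computes it and reports which reference value a hybrid lies closest to. This is a descriptive aid only and does not replace the Tukey and Dunnett tests used in the study; the function names are hypothetical, the hybrid values of 41 and 56% are those quoted for orf114-309, and the parental values are invented for illustration.

```python
def midparent(col, ler):
    """Midparental value as defined in the text: Col - (Col - Ler)/2."""
    return col - (col - ler) / 2


def nearest_reference(hybrid, col, ler):
    """Return which of Col, the midparent, or Ler is closest to the hybrid."""
    references = {"Col": col, "midparent": midparent(col, ler), "Ler": ler}
    return min(references, key=lambda name: abs(hybrid - references[name]))


# Parental editing extents (%) are hypothetical placeholders.
col_value, ler_value = 38.0, 58.0
m = midparent(col_value, ler_value)

# Hybrid values (%) quoted in the text for orf114-309.
col_x_ler = nearest_reference(41.0, col_value, ler_value)
ler_x_col = nearest_reference(56.0, col_value, ler_value)
```

With these placeholder parents, each reciprocal hybrid lies nearest to a different parent, which is the pattern of cross-direction dependence described for this site.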
Despite the apparent heterogeneity of the dominance pattern shown by the sites studied, some general observations can be drawn from this analysis. First, dominance is more common in NS sites. Complete dominance for both F1's was observed only for four NS sites, rpl2-212, ccb203-208, rps3-1534, and nad3-212. Dominance of the most edited phenotype is always observed for the NS sites even when it is restricted to only one F1. By contrast, additivity for both F1's was observed only for two S sites, cox2-138 and orfX-144. Two S sites, nad7-789 and nad2-558, were also the only ones to show a dominance of the less edited phenotype. As discussed earlier, the mode of dominance for orf114-309 could not be assigned unambiguously at the chosen threshold of significance. A maternal effect was detected for eight sites, five S sites and three NS sites. The five S sites belong to four different mitochondrial genes (ccb256-252, ccb256-624, nad7-789, orf114-309, and nad2-558) whereas the three NS sites belong to the same mitochondrial gene (rps4-175, rps4-226, and rps4-235).
Phenotypic variation and correlation among editing efficiencies of mitochondrial sites:
The editing extents of sites in transcripts of 30 RILs (Lister and Dean 1993) were assayed for 13 polymorphic sites by PPE. For this analysis we chose the 12 most polymorphic sites between the two parental accessions and orf114-309 because of its peculiar dominance pattern. Included in this analysis were the two parental accessions, which were reassessed for their editing values by PPE, providing an internal control for the accuracy of the measurement. For all traits analyzed, significant variation was observed between RILs, with broad-sense heritabilities (estimated as the proportion of variance explained by between-line differences) ranging from 0.51 for rps4-226 to 0.96 for nad2-558 (Table 5). Transgressive variation, i.e., the fact that the variation among the RILs exceeds the variation among the parental accessions, was detected for six editing traits but only in one direction (higher than that of the most edited parent), indicating that each of the two parental lines carries an allele that increases editing efficiency (Table 5).
Phenotypic correlations among the editing traits are presented in Table 6. More than half the pairwise correlations are significant at the 5% level, indicating that the editing traits might share some common genetic control. Surprisingly, around half the significant correlation coefficients are negative (Table 6). All the editing traits except orf114-309 show some highly significant correlation with at least one other editing trait. A significance threshold set at the 0.1% level allows the assembly of the editing traits into five correlative groups. Interestingly, a site belonging to a specific mitochondrial gene does not necessarily correlate the most strongly with other sites from the same gene. Hence rps4-175 shows a high correlation coefficient of 0.9 with rps3-1534. In the same group we found ccb206-406, which is negatively correlated with rps4-175 and rps3-1534 (respectively r = −0.69 and r = −0.65). A second group contains orfX-144 and nad2-558, which are negatively correlated (r = −0.8). Ccb203-208 and rpl2-212 belong to the third group (r = 0.77). In a fourth group we find ccb256-252, ccb256-624, and cox2-138. Ccb256-624 correlates more strongly with cox2-138 (r = 0.92) than with ccb256-252 (r = −0.59), reproducing what has been already observed for rps4-175. In the last group are two sites belonging to the same mitochondrial gene, rps4-226 and rps4-235 (r = 0.92). The groups defined above represent most of the correlations significant at the 0.1% level (9 of 11, Table 6). In addition, nad7-789 correlates positively with ccb256-624 but not with the other members of the same group at the 0.1% level (r = 0.61). Rps4-175 correlates positively with rps4-226 (r = 0.63). These five correlative groups are discussed again in relation to the QTL identified in the following section.
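A correlation coefficient and its significance can be checked with a short calculation. The sketch below computes Pearson's r, the associated t statistic with n − 2 degrees of freedom, and the smallest |r| that reaches a given critical t value. It assumes that the correlations were computed across the 30 RILs (n = 30); the critical value 3.674 is the standard two-tailed t-table entry for P = 0.001 with 28 degrees of freedom, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.

    Requires |r| < 1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value for sample size n."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


N_RILS = 30          # assumed sample size for each pairwise correlation
T_CRIT_0_1 = 3.674   # two-tailed t-table value, P = 0.001, 28 d.f.

r_min = critical_r(T_CRIT_0_1, N_RILS)
t_rps = t_statistic(0.9, N_RILS)   # r reported for rps4-175 vs. rps3-1534
```

Under these assumptions, any coefficient with |r| above about 0.57 is significant at the 0.1% level, which is consistent with the weakest grouped correlations reported above (r = −0.59 and r = 0.61).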
Mapping QTL controlling the editing efficiency of mitochondrial genes:
The markers for this analysis were chosen among a list available at the Nottingham Arabidopsis Stock Center website (http://arabidopsis.info/). The criteria to select markers were a full coverage of the genetic map, an even density of markers along the chromosomes, and a low number of missing values per marker. This last criterion was respected for most of the markers except for TSA1, tai224, and SGCSNP84, which were not scored for 16, 12, and 8 RILs, respectively. These markers were retained because they cover the south end of chromosomes 3 and 5 (Figure 2). The resulting map contained 70 markers, 15 markers on chromosomes 1, 4, and 5, 13 markers on chromosome 3, and 12 markers on chromosome 2. The distance between markers ranges from 1.7 cM (GLU2-RPS2, chromosome 2) to 13.1 cM (mi225-AIG2 and m435-agp27e, respectively, on chromosomes 3 and 5) (Figure 2). The average distance between markers along the map is 8.9 cM. Another important feature of the mapping material resides in the amount of genotypic data available for each segregating progeny. The number of markers scored per RIL averaged 67 of 70 or 96%; the lowest value was recorded for CS1969, which was genotyped for 61 markers or 87%.
Mapping of the editing QTL was conducted by using three analytical methods, SPA, IM, and CIM. Overall, 21 QTL were identified in this study; 12 major QTL were detected by all methods and 9 minor QTL were detected only by CIM (P < 0.05, Table 7). CIM is the most accurate of the three methods for detecting QTL since it incorporates cofactors in the model, allowing for the reduction of the genetic noise and thus enhancing the power of detection. For the same reason, the confidence intervals determined by the drop of the LOD score around the maximum LOD score tend to be smaller with CIM (Table 7).
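The confidence intervals mentioned above are defined by how far the LOD score can drop from its maximum. A minimal Python sketch of such a LOD-drop support interval is given below; the function name, the scan positions, and the LOD values are hypothetical, and the interval is reported at the outermost scanned positions that remain within the drop, which is a simplification of what QTL mapping software may do.

```python
def lod_support_interval(positions, lods, drop=1.0):
    """Return (left, peak, right) positions of a LOD-drop support interval.

    Starting from the position with the maximum LOD, the interval is extended
    in both directions while the LOD stays at or above (max LOD - drop).
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[peak], positions[right]


# Hypothetical LOD profile along one chromosome (positions in cM).
positions = [0, 5, 10, 15, 20, 25, 30]
lods = [0.5, 3.1, 4.2, 5.0, 4.4, 2.5, 0.8]

lod1 = lod_support_interval(positions, lods, drop=1.0)
lod2 = lod_support_interval(positions, lods, drop=2.0)
print("LOD1 interval:", lod1)
print("LOD2 interval:", lod2)
```

A sharper LOD peak, as expected when cofactors absorb background genetic noise, yields a narrower interval for the same drop.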
QTL were detected for all the editing traits analyzed. Moreover, a majority of the traits are under the control of major QTL. Only the editing efficiencies of rps4-235 and orf114-309 exhibit minor QTL detected by CIM alone. These two traits also show relatively low heritability compared to the other traits (Table 5). In the case of the QTL controlling orf114-309, the allele increasing the editing efficiency comes from the Col parent (Table 7), although Ler has a higher editing value than Col (Table 5). This apparent paradox could explain why the distribution of this trait shows the largest transgressive variation (Table 5). Indeed, the editing of orf114-309 for one of the RILs shows an increase of 42% compared to Ler. Editing extent of ccb256-252 is the only other trait that shows a QTL whose allelic effect is opposed to the parental values. As for orf114-309, this result is not surprising given that the phenotypic distribution of this trait shows transgression (Table 5). Furthermore, editing efficiency of ccb256-252 shows the highest number of transgressive lines of all the traits studied (data not shown). The largest effect attached to a QTL was detected for editing extent of nad2-558 (Table 7). The QTL controlling this trait colocalizes with the marker m246 on chromosome 2; its effect could explain 95% of the phenotypic variation. In that specific case, a monogenic control of an editing trait has been established, supporting the pattern of phenotypic distribution observed in the RILs (data not shown).
Editing extent of rps3-1534 was the only other trait whose phenotypic distribution in the RILs was characteristic of single-gene segregation (data not shown). Editing of rps3-1534 is under the control of a major QTL colocalizing with marker C18a on chromosome 4, whose effect could explain 75% of the phenotypic variation (Table 7). The effect of this QTL could explain 97% of the genetic variation, thus also supporting a monogenic control for editing extent of rps3-1534. In Table 7 we also include the results from our previous study on QTL controlling the editing of ccb206-406 (Bentolila et al. 2005). However, the position and effects of the QTL for editing efficiency of ccb206-406 were reestimated with the new map to be able to compare the position of the QTL with the ones controlling other editing sites.
We repeated the same analysis for every trait analyzed by first establishing how much of the phenotypic variation could be explained by the QTL detected at P < 0.01 in a multiple-regression model. The fraction of the genetic variation uncovered by the QTL was then estimated by dividing the corresponding R2 by the heritability of the trait. This ratio ranged from 0.29 for orf114-309 to 0.99 for nad2-558 with an average of 0.74.
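The ratio described above can be checked with a short calculation. The Python sketch below reads it as the R2 of the QTL model divided by the broad-sense heritability, since that reading reproduces the value of 0.99 reported for nad2-558; it uses the 95% of phenotypic variation and the heritability of 0.96 quoted in the text for that site. The function name is hypothetical, and treating the single-QTL effect as the model R2 is an assumption.

```python
def fraction_of_genetic_variation(r_squared, heritability):
    """Share of the genetic variation accounted for by the detected QTL.

    r_squared: proportion of phenotypic variation explained by the QTL model.
    heritability: proportion of phenotypic variation that is genetic.
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return r_squared / heritability


# nad2-558: values quoted in the text (R2 taken as the single-QTL effect).
nad2_558 = fraction_of_genetic_variation(0.95, 0.96)
print(f"nad2-558: {nad2_558:.2f} of the genetic variation explained")
```

A ratio close to 1 indicates that the detected QTL account for nearly all of the heritable variation, as expected under monogenic control.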
The editing QTL are not evenly distributed along the genetic map: 11, 4, 2, 5, and 1 QTL lie, respectively, on chromosomes 1, 2, 3, 4, and 5 (Figure 2). This distribution differs significantly from a random distribution based on the length of each chromosome (χ2 = 9.88895, P = 0.042). The deviation from a random distribution is due to an overrepresentation of QTL on chromosome 1 (6 expected QTL) and a low number of QTL on chromosome 5 (5 expected QTL). The same results are obtained when the expected number of QTL is based on the number of genes carried by each chromosome (χ2 = 10.1086, P = 0.039). Furthermore, the distribution of the QTL along the chromosome does not seem to be random since the vast majority of them colocalize with other QTL (Figure 2). Five colocalization areas could be discriminated along the genetic map, two on chromosome 1 and one on each remaining chromosome except for chromosome 5 (Figure 2).
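The P values quoted above can be recovered from the reported χ2 statistics, assuming a goodness-of-fit test over five chromosomes and therefore 4 degrees of freedom. The Python sketch below uses the closed-form upper-tail probability of the chi-square distribution for an even number of degrees of freedom; the function names are hypothetical, and the expected counts themselves are not recomputed because the chromosome lengths and gene numbers used are not given in this section.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability of a chi-square variate with even df.

    For df = 2m: P = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


DF = 4  # assumption: five chromosome categories minus one
p_length = chi_square_p_value_even_df(9.88895, DF)  # expectation by length
p_genes = chi_square_p_value_even_df(10.1086, DF)   # expectation by gene number
print(f"P (chromosome length): {p_length:.3f}")
print(f"P (gene number):       {p_genes:.3f}")
```

With expected counts in hand, `chi_square_statistic` would give the statistic directly from the observed QTL counts per chromosome.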
There is a very good fit between the five correlative groups discussed in the previous section and the presence of the corresponding major QTL in the five colocalization groups presented in Figure 2. For instance, the major QTL detected for ccb256-252, ccb256-624, and cox2-138 are all members of group 1 (Figure 2); the corresponding traits show a significant correlation at the 0.1% level. Moreover, the parental allelic effect of the QTL predicts the sense of the correlation (Figure 2). For the major QTL controlling ccb256-624 and cox2-138, the Col allele increases the editing efficiency and the corresponding traits are positively correlated, while the Ler allele increases the editing efficiency of ccb256-252. This latter trait is negatively correlated with ccb256-624 and cox2-138. Most of the correlative groups could be traced back from the colocalization groups: orfX-144 and nad2-558 from group 3, rpl2-212 and ccb203-208 from group 4, and ccb206-406, rps4-175, and rps3-1534 from group 5 (Figure 2). Group 2 contains the major QTL for rps4-226 and the minor QTL for rps4-235 in accordance with their strong correlation. In addition to these QTL, group 2 also contains the major QTL for nad7-789 even though nad7-789 shows a stronger correlation with ccb256-624 (Table 6).
Tissue-specific variation of mitochondrial editing:
Although the accession used in the sequencing of the mitochondrial genome was identified as Columbia in the report by Unseld et al. (1997), it seems likely that it was actually C24. The accession used in the subsequent report on mitochondrial editing in Arabidopsis suspension cultures (Giege and Brennicke 1999) was Columbia (P. Giege, personal communication). Therefore we can compare the suspension culture editing data to our data on rosette leaves. Sixty-seven rosette-leaf-specific sites were discovered in this study, resulting in a total number of 508 editing sites in Arabidopsis. The magnitude of this finding was quite unexpected given that only one rosette-specific site was found by us when we analyzed editing of ccb206-406 in rosette leaves (Bentolila et al. 2005). Conversely, 37 sites previously reported were not edited in rosette leaves and can be considered suspension-cell specific. According to Giege and Brennicke (1999), almost 6.6% of 6627 total cytosines present in protein-coding regions are modified by editing in suspension cultures; i.e., 1 of 15 C's is edited in the Arabidopsis mitochondrial genome. The proportion of modified C's in mitochondria is actually even higher, given the high number of new editing sites we identified by analyzing only one additional tissue type in this study. Tissue specificity of editing has previously been reported both in tobacco and maize chloroplasts (Peeters and Hanson 2002; Chateigner-Boutin and Hanson 2003; Bentolila et al. 2005) and in maize mitochondria (Grosskopf and Mulligan 1996).
When the level of editing is considered, the variation of editing according to tissue specificity depends greatly on the nature of the site. Among the 67 sites that are rosette leaf specific, 22 were totally edited in both accessions and almost all of these completely edited sites are nonsilent. By contrast, partial editing of 45 rosette leaf-specific sites affected a majority of silent sites (69%), an observation similar to that of the excess of S sites that are partially edited in B. vulgaris (Mower and Palmer 2006). Whether S sites among the 37 suspension cell-specific sites are more likely to be partially rather than fully edited could not be determined from the published report (Giege and Brennicke 1999) because data on extent of editing were not described.
Editing is developmentally regulated in maize and tobacco chloroplasts (Peeters and Hanson 2002; Chateigner-Boutin and Hanson 2003). Furthermore, tobacco chloroplast sites carrying similar cis-sequences covary in editing extent according to the tissue analyzed (Chateigner-Boutin and Hanson 2003). We have suggested that these observations could be explained by the existence of developmentally regulated trans-factors that recognize specific cis-elements in the vicinity of the editing sites. The existence of 5′ cis-elements has been documented both in mitochondrial and in chloroplast editing (e.g., Neuwirt et al. 2005; Hayes et al. 2006). Our data support and extend a model in which expression of a trans-factor varies, resulting in limiting amounts in certain tissues, therefore impeding the editing extent of the target sites. In rosette leaves, trans-factors might be present that are absent in suspension cells and vice versa, allowing the editing tissue specificity of their cognate targets. We propose that the primary targets of these hypothetical trans-factors are the NS sites, and S sites are secondary targets that happen to share some similarity in cis-sequence with one or more NS sites. We suggest that the affinity of the trans-factors for the cis-sequences around the NS sites is higher than for the S sites, resulting in more efficient editing of NS sites. This model explains both the tissue specificity and the level of editing of mitochondrial sites according to their nature, S or NS. Surveying mitochondrial editing for tissues other than suspension cells and rosette leaves would allow further testing of this model.
Accession-specific variation of mitochondrial editing:
Forty polymorphic sites were detected between Col and Ler when the minimal difference of editing between the two accessions was set to 5%, a reliable threshold given the accuracy of the measurement method used. Ten of these sites show a difference in editing of ≥20% between Col and Ler. No difference in the genomic sequences of the 33 mitochondrial genes surveyed was detected between Col and Ler, an expected fact given the very low mutation rate in the plant mitochondrion (Lynch et al. 2006). Thus the trans-factor model discussed in the previous section could explain the occurrence of editing polymorphism between the two Arabidopsis accessions. It is possible that Col and Ler have evolved trans-factors with slightly different affinity for the target sequences, resulting in different editing extent of the targeted C, but the presence of a C that requires editing to U for expression of a proper protein makes at least some editing mandatory. This model explains why no qualitative difference in mitochondrial editing was observed between Col and Ler; i.e., we never detected an editing site in one accession that was absent in the other. Alternatively, the amount of trans-factor might be a limiting factor, a fact supported by reduced editing upon higher concentrations of substrate in vitro (Takenaka et al. 2004). Perhaps a trans-factor is more abundant in one accession than in the other, resulting in more efficient editing.
There is a significant excess of silent sites in the population of polymorphic sites (∼50%, depending on the threshold used to record a polymorphic site). Nevertheless, half of the polymorphism affects nonsilent sites. Six of the 10 most polymorphic sites are nonsilent. However, editing extents of nonsilent sites in the less edited accession are all >40% (rps4-175, Table 5), which presumably results in sufficient edited transcript that enough proper protein is synthesized so that the fitness of the plant is not highly impaired. Because partially edited RNAs are intermediates of RNA editing in plant mitochondria (Verbitskiy et al. 2006), at least some of the partially edited transcripts may eventually become edited. By comparison, editing extent of the less edited accession is lower in silent sites (17% for nad2-558, Table 5) than in nonsilent sites. Because editing of S sites does not change the encoded protein, the presence of partially edited transcripts is less relevant to protein expression.
Maternal effect on mitochondrial editing:
More than half of the sites inspected for the editing extent values of the reciprocal hybrids show a significant difference depending on the direction of the cross. One possible explanation resides in the fact that the hybrids were not grown at the same time in the greenhouse; thus, fluctuations in editing efficiency might have been caused by differences in growing conditions. To test for this possible artifact, Ler was grown as a control with the hybrid Ler × Col. Editing values of Ler were very similar between the two sets of experiments (Figure 1), thus allowing comparison of the results obtained for the reciprocal hybrids. There still remains the possibility of a genotype × environment interaction between the two growing periods, resulting in the modification of editing for the hybrid only. Alternatively, a maternal effect could account for the observed differences in editing between the two hybrids. However, no nucleotide difference was found between the two accessions for the 33 mitochondrial genes analyzed. For orf114-309, the site showing the largest difference in editing extent between the reciprocal hybrids, the 3′ end of the reverse primer is complementary to nucleotide 312, 3 nucleotides away from the editing site. In that particular case, we cannot ascertain that no differences between Col and Ler exist downstream of that position. In every other case, the nucleotide identity of the two accessions extends over 50 nucleotides around the edited site. Therefore a difference in cis-element of the mitochondrial substrate cannot be held responsible for the difference observed between the F1's. Since recognition of cis-sequences by a nuclear-encoded trans-factor could not be involved in the maternal effect of editing of those mitochondrial genes that are identical in sequence in both accessions, a polymorphism outside the mitochondrial gene sequence has to be postulated as the cause for maternal effect. 
Because the complete sequence of the Ler mitochondrial genome is lacking, we could not search for putative polymorphisms between the two accessions. There are some known polymorphisms in chloroplast DNAs of different Arabidopsis accessions (Azhagiri and Maliga 2007).
Possible molecular identity and role of editing QTL:
In this study the three measurements per RIL were done on three plants, allowing for an estimation of the heritability of the editing traits that ranged from 0.51 to 0.96 with an average of 0.79. This amount of genetic variation indicated that QTL mapping was likely to reveal major QTL for most traits if they existed. The discovery of 12 major QTL for 11 of the 13 editing traits analyzed demonstrated that mitochondrial editing is generally under the governance of a major factor. Some of these factors might control all of the genetic variation for certain sites, such as for nad2-558 and rps3-1534. In other cases additional minor QTL might contribute to the genetic variation, e.g., ccb256-252.
It is unlikely that any of these QTL will encode the enzymatic activity necessary to achieve editing since a polymorphism in the activity itself is rather unexpected: hundreds of sites are fully edited in both accessions. Another possibility is that the QTL encode factors not directly involved in editing but rather in stability of the transcripts. In Petunia nad3 transcripts the editing extent of a site was shown to cosegregate with the transcript abundance in a BC1 population (Lu and Hanson 1992). A stability factor model would predict that, for a given gene, the same parental accession would be the more highly edited at all of the polymorphic sites carried on that gene's transcripts. However, we observe that many transcripts exhibiting more than one polymorphic site have one parental accession exhibiting higher editing at one C target while the other parental accession exhibits higher editing at a second C target (supplemental Table 2). For instance, rps-12 has three polymorphic sites where Ler is more edited than Col and one site where Col is more edited than Ler. Our data point to a specific role of the QTL in the editing of the corresponding site. It is tempting to speculate that at least some of the QTL identified in this study might encode the trans-factors involved in the recognition of the cis-elements found in the vicinity of the editable C.
Because two factors required for editing of two different chloroplast sites have been shown to be PPR-motif-containing proteins (Kotera et al. 2005; Okuda et al. 2007), possibly some of these QTL will correspond to genes encoding members of the PPR family, one of the largest plant gene families first described in Arabidopsis (Small and Peeters 2000). Although the PPR-containing genes are ubiquitous in eukaryotes, the family is much larger in plants than in other eukaryotes; 441 PPR-containing genes have been reported in Arabidopsis (Lurin et al. 2004). The combination of a rather low power of resolution of our QTL mapping experiment and the prevalence of PPR-containing genes in the Arabidopsis genome does not allow us to test this hypothesis. A LOD1 confidence interval was 1.9 Mb long on average and contained ∼500 genes. Every LOD1 interval contained PPR genes except orfX-144 on chromosome 2. Nevertheless the LOD2 support interval for orfX-144 did contain 3 genes for PPR proteins. Further fine mapping and/or positional cloning of the editing QTL is thus necessary to find out whether any of them correspond to PPR-containing genes.
Rather than factors recognizing cis-elements, editing QTL could encode transcription factors involved in the regulation of expression of trans-factors. Most of the models discussed in this article refer to either a difference in affinity of the trans-factor for its cis-element or a difference in abundance of the trans-factor depending on the accession. Indeed, of the 10 QTL thus far cloned from plants, at least one-half are transcription factors or the phenotype has been ascribed to altered gene expression levels (Paran and Zamir 2003). The transcription factor and PPR-containing gene hypotheses are not mutually exclusive. Some editing QTL might encode transcription factors while others might encode PPRs. Once again, precisely determining the genetic sources of variation in editing efficiency will require the cloning of mapped QTL.
The colocalization of the QTL and the subsequent covariation of the corresponding traits (Figure 2 and Table 6) are quite striking. There is precedent for developmental covariation of chloroplast sites sharing cis-elements or co-inhibition when one member is overexpressed (Chateigner-Boutin and Hanson 2002, 2003). In the present study, the covariation is not developmental but instead is genetic. Because the mapping resolution is low, we do not know whether the colocalization is due to a single gene with a pleiotropic effect on several sites or whether different linked genes control different sites. In the case of the single-gene model, we would expect correlated sites to carry common cis-elements. A positive correlation between two editing traits is expected if their targets share similar cis-recognition sequences: a single trans-factor will recognize the common cis-elements and control the editing extent of both sites. A negative correlation might still be explained if the trans-factor is in limited supply and sequestered by the site showing the highest affinity, resulting in depletion of the trans-factor at the site with the lower affinity and reduced editing. Conversely, the control of correlated editing traits by linked genes does not require the occurrence of common cis-elements.
We decided to inspect the sequences around the sites for which the editing extent shows a strong correlation (Figure 3). In some cases, such as rps4-175 and rps3-1534, the sequences around the editable C show a reasonable amount of similarity, supporting the possibility of common cis-elements for these two sites and a single gene controlling the editing extent of both sites (Figure 3). In other cases, such as ccb203-208 and rpl2-212, the sequence similarity around the editable C is rather weak (Figure 3), suggesting that the editing QTL controlling these traits are more likely to be different linked genes.
In this study, we adopted a genetic approach to understand a biological phenomenon that has been studied mostly with biochemical techniques. Our data strengthen and complement the model of a trans-factor recognizing a cis-sequence around the editable C. Our work demonstrates that it is possible to map major QTL controlling mitochondrial editing for many traits if the parental accessions show some polymorphism. We believe that our strategy will prove to be useful in identifying components of the mitochondrial editing machinery.
We thank Françoise Vermeylen from the Cornell Statistical Consulting Unit for her help with the statistical analyses. L.E.E. was a participant of the Cornell/Boyce Thompson Institute National Science Foundation (NSF) Research Education for Undergraduates Plant Genome Summer Undergraduate Program. This work was supported by NSF Molecular and Cellular Biosciences grant 0344007 to M.R.H.
- Received March 19, 2007.
- Accepted May 23, 2007.
- Copyright © 2008 by the Genetics Society of America