- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Supplemental Data
-
All Versions of this Article:
genetics.105.049353v1
172/2/1179 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Kliebenstein, D. J.
- Articles by St. Clair, D. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Kliebenstein, D. J.
- Articles by St. Clair, D. A.
Originally published as Genetics Published Articles Ahead of Print on October 3, 2005.
Genetics, Vol. 172, 1179-1189, February 2006, Copyright © 2006
doi:10.1534/genetics.105.049353
Genomic Survey of Gene Expression Diversity in Arabidopsis thaliana
Daniel J. Kliebenstein*,1,
Marilyn A. L. West*,1,
Hans van Leeuwen*,
Kyunga Kim
,
R. W. Doerge
,
Richard W. Michelmore*,
and
Dina A. St. Clair*,2
* Department of Plant Sciences, University of California, Davis, California 95616-8780,
Department of Statistics, Purdue University, West Lafayette, Indiana 47907-2067 and
The Genome Center, University of California, Davis, California 95616
2 Corresponding author: Department of Plant Sciences, University of California, Mail Stop 3, 1 Shields Ave., Davis, CA 95616-8780.
E-mail: dastclair{at}ucdavis.edu
Differential gene expression controls variation in numerous plant traits, such as flowering time and plant/pest interactions, but little is known about the genomic distribution of the determinants of transcript levels and their associated variation. Affymetrix ATH1 GeneChip microarrays representing 22,810 genes were used to survey the transcriptome of seven Arabidopsis thaliana accessions in the presence and absence of exogenously applied salicylic acid (SA). These accessions encompassed
80% of the moderate- to high-frequency nucleotide polymorphisms in Arabidopsis. A factorial design, consisting of three biological replicates per accession for the two treatments at three time points (4, 28, and 52 hr post-treatment), and a total of 126 microarrays were used. Between any pair of Arabidopsis accessions, we detected on average 2234 genes (ranging from 1428 to 3334) that were significantly differentially expressed under the conditions of this experiment, using a split-plot analysis of variance. Upward of 6433 genes were differentially expressed between at least one pair of accessions. These results suggest that analysis of additional genetic, developmental, and environmental conditions may show that a significant fraction of the Arabidopsis genome is differentially expressed. Examination of sequence diversity demonstrated a significant positive association with diversity in gene expression.
NATURAL phenotypic variation is often quantitatively distributed and occurs in both simple and complex traits. DNA sequence polymorphisms can be a primary genetic cause of phenotypic variation in populations. Some recent studies have assessed genomewide DNA sequence variation within and between different species (GIBBS et al. 2003; SCHMID et al. 2003, 2005; STEIN et al. 2003; BENTLEY and PARKHILL 2004; NORDBORG et al., 2005). These studies identified variation throughout the genome, including single-nucleotide and INDEL polymorphisms in all gene components: promoter, UTRs, ORFs, and introns. However, cataloging of sequence variation is only the first step in understanding how DNA variation contributes to phenotypic variation. Recent advances in technology have enabled comparative studies that focus on the variation present in the transcriptome, metabolome, and proteome. Recent work comparing sequence divergence to expression divergence between two Drosophila species found little evidence for a correlation between the two measures of divergence (NUZHDIN et al. 2004). Understanding how variation in these intermediate processes is correlated will aid in linking DNA sequence polymorphism to its phenotypic consequences.
Differences in gene expression, termed expression level polymorphisms (ELPs) (DOERGE 2002), have been shown to control inter- and intraspecific phenotypic variation in various organisms (WANG et al. 1999; CARROL 2000; BREM et al. 2002). In plants, naturally occurring ELPs have been associated with changes during maize domestication (WANG et al. 1999), flowering-time control in Arabidopsis (JOHANSON et al. 2000; CAICEDO et al. 2004; LEMPE et al. 2005; WERNER et al. 2005), pathogen resistance (GRANT et al. 1995; GASSMANN et al. 1999), and Arabidopsis insect resistance and secondary metabolism (KLIEBENSTEIN et al. 2001, 2002; LAMBRIX et al. 2001). ELPs can be caused by various types of DNA sequence polymorphisms, including trans-acting factors (CAICEDO et al. 2004), cis-acting promoter polymorphisms (FRARY et al. 2000; CONG et al. 2002), cis-acting processing variants, such as splice-site polymorphisms (LAMBRIX et al. 2001), and whole or partial gene deletions that functionally act to nullify the gene's expression (KLIEBENSTEIN et al. 2001; LAMBRIX et al. 2001). In contrast, the more drastic mRNA processing and INDEL polymorphisms may be overlooked even though they can be the basis of variation in plant/environment interactions (LAMBRIX et al. 2001; KLIEBENSTEIN et al. 2002). To fully study the sequence level polymorphisms that alter gene expression patterns, we include the full range of polymorphism types in our definition of ELPs.
In spite of the potential importance of ELPs in controlling phenotypic variation, little is known about the extent of ELPs within a species or the distribution of ELPs within the genome. ELP variation within a species should be described at several levels. First, which genes are differentially expressed among accessions or genotypes? Such genes may be under selective pressure for altered expression patterns or the expression difference may be the result of neutral sequence variation. Second, how does variation in gene expression relate to the underlying nucleotide polymorphism level? Comparing the nucleotide polymorphism level between individuals to their ELP divergence can help assess if these two types of genetic diversity are related. By comparing the pattern of nucleotide polymorphism to ELP variation, it may be possible to determine if nucleotide polymorphisms and expression level polymorphism show similar variation patterns, suggesting comparable levels of neutrality. Third, how does gene expression variation differ in response to changes in the environment and/or treatments? These questions, as well as many others, can be addressed by quantitative analyses of natural variation in gene expression within a species.
In this study, we assay natural variation in Arabidopsis gene expression on a genome-wide basis in terms of which genes exhibit ELPs, how frequently these genes vary in their expression patterns between pairs of accessions, and the consequences of varying a treatment on ELP detection. The Affymetrix ATH1 GeneChip microarray representing 22,810 genes was employed to query gene expression in seven Arabidopsis thaliana accessions. These accessions represent
80% of the moderate- to high-frequency polymorphisms within Arabidopsis (NORDBORG et al. 2005). All statistically significant ELPs in 5-week-old foliar tissue were used to obtain an overview of species-wide variation. Our analysis revealed between 1428 and 3344 differentially expressed genes for any pair of the Arabidopsis accessions surveyed. The genes exhibiting differential expression between at least one pair of these seven accessions composed more than one-third of the Arabidopsis genome. The frequency of differential gene expression between accessions was significantly positively correlated with the genomewide sequence divergence among the same accessions. Furthermore, this sequence/expression divergence correlation was also evident in terms of chromosome organization/physical localization. This study provides a first description of a species-level sampling of gene expression variation within Arabidopsis, as well as an assessment of how ELP variation relates to nucleotide polymorphism levels.
Plant material and experimental conditions:
Seeds of seven Arabidopsis thaliana accessions (Col-0, Cvi-1, Est, Kin-0, Mt-0, Tsu-1, and Van-0) were provided by J. Chory, Salk Institute, San Diego. These accessions can be obtained from The European Arabidopsis Stock Centre (http://arabidopsis.info) and The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org). To identify a level of exogenous salicylic acid (SA) application that caused gene induction without phytotoxicity in all accessions, a preliminary experiment was conducted. Multiple plants of each accession were grown in a growth chamber under short-day conditions (8 hr light at 100120 µEi, 20° day/20° night) to maintain plants in the vegetative phase for 56 weeks. The accessions were treated with SA at approximately the same developmental stage (as defined by rosette diameter and leaf number). Plants were selected for treatment when they either reached a rosette diameter of 5060 mm or had
1520 leaves. Subsequently, the plants were sprayed with a series of SA (Sigma-Aldrich, St. Louis) concentrations [0, 0.1, 0.2, 0.3, and 1 mM, all in 0.02% Silwet L77, a surfactant (Lehle Seeds, Round Rock, TX)] to runoff and then harvested at four time points (4, 28, 52, and 76 hr) post-SA treatment and quick-frozen in liquid nitrogen. Three plants per accession were bulked for each concentration/time point to generate sufficient quantities of RNA. RNA was extracted from the bulked plants using TRIzol as per the manufacturer's protocol (Invitrogen, Carlsbad, CA) and applied to slot blots as described (AUSUBEL et al. 2004). Genes that had been reported previously as responsive to SA induction were used as probes on the slot blots (PR1, PR2, PR5, NPR1, EDS1, EDS5, and PAD4) (GLAZEBROOK 2001), and ß-tubulin served as a negative control. DNA sequences were obtained from GenBank (www.ncbi.nlm.nih.gov) and TAIR (www.arabidopsis.org) to design primers for PCR amplification and cloning. PR1, the most informative marker gene, indicated that across the seven accessions, plants responded to 0.30 mM SA during the first three time points (4, 28, and 52 hr) without exhibiting phytotoxicity (data not shown). Therefore, two treatments, 0.30 mM SA plus a zero SA control, and sampling at three time points, 4, 28, and 52 hr post-SA application, were chosen to conduct global assays for ELPs among these seven accessions.
A factorial experiment was conducted with the seven accessions, using three biological replications for each of the two treatments and the three time points. Due to space limitations in the growth chambers, these replications were performed on different dates. The accessions were seeded out on four dates (10/8/02, 10/29/02, 12/19/02, and 2/6/03). If plants of an accession did not grow properly in one of the first three planting dates, all samples from that accession for the affected date were discarded. Subsequently, plants from the fourth planting date for that accession were used as a replacement third replicate to maintain a balanced factorial consisting of three biological replications per accession. For each biological replicate, multiple plants (
30) per accession were grown in a growth chamber under short-day conditions (8 hr light at 100120 µEi, 20° day/20° night) to maintain plants in the vegetative phase for 67 weeks. Plants with 2530 rosette leaves (rosette diameter 6070 mm) were selected and sprayed with 0.30 mM SA in 0.02% Silwet L77 to runoff, and control plants were sprayed with 0.02% Silwet L77 alone. Plants from each combination of accession and treatment were harvested at 4, 28, and 52 hr post-treatment and quick-frozen in liquid nitrogen for subsequent RNA extractions. Each plant was harvested only once. To avoid diurnal variation in gene expression, the time points were 24 hr apart and harvesting was always performed over a 30-min period at the same time of day (1:00 PM).
RNA isolation and microarray analysis of transcriptome:
RNA was purified from the 126 samples from the factorial experiment using the TRIzol extraction procedure as above, followed by purification on RNeasy columns (QIAGEN, Valencia, CA). Labeled cRNA was prepared and hybridized to Affymetrix ATH1 GeneChips, containing 22,810 Arabidopsis genes, according to the manufacturer's guidelines (Affymetrix, Santa Clara, CA). The GeneChips were scanned with an Affymetrix GeneArray 2500 scanner and data acquired via the Microarray Suite software MAS 5.0. The MAS 5.0 algorithm with default scaling was used to obtain gene expression levels for analysis.
Quality control:
A preliminary analysis was conducted with GeneSpring v. 4.2 software (Agilent Technologies, Palo Alto, CA) for the purpose of selecting diagnostic genes that could be used to identify the accession hybridized to each GeneChip. Thirty-one genes were chosen whose expression level allowed each accession to be uniquely identified, regardless of treatment or time point. Using these 31 diagnostic genes, we fingerprinted all the chips. This procedure detected 11 GeneChips that did not fit the expected expression profile for a given accession, but matched the profile of a different accession, suggesting that an error had occurred during sample processing. These chips had all been run on the same 2 days, suggesting a single procedural error. When the 11 original RNA samples were reanalyzed on replacement GeneChips, the expression profiles fit their expected accession profile. This confirmed that a sample switch had occurred during initial chip processing, so the data from the incorrect GeneChip were removed and replaced with the correct replacement GeneChip.
Statistical analysis:
We next proceeded to conduct a statistical analysis with the following goals: to detect differential gene expression among accessions, which may represent potential ELPs; to identify accessions with gene expression changes resulting from SA application; and to account for array (chip-to-chip) variation. A mixed linear model analysis of variance (ANOVA) was used to analyze the GeneChip data. This approach is more powerful than nonparametric methods to detect differentially expressed genes and allows partitioning of the sources of variation in gene expression (e.g., accession, treatment, time points, array, gene, and their various interaction terms) that in turn improves accuracy and enables experimental interpretation (CRAIG et al. 2003). The mixed linear model also takes into account the array (chip-to-chip) variation, which enables better estimation of the "true" underlying gene effects, especially when the array variation is large. The mixed model, as employed here, does not require data "normalization" procedures since the components of variation are incorporated in the model. Furthermore, this model permits analysis of expression level data from all genes represented on the GeneChip without relying on Affymetrix software to call the presence or absence of expression.
For each pair of accessions (21 in total), an ANOVA was performed using a split-plot mixed linear model with a random array effect. In this model yijkgr denotes the expression level of gene g, measured from the accession i under SA treatment j at the time point k for the chip replication r. The ANOVA model for the log-transformed expression levels is
![]() |
;
ijkgr represents the subplot error and is assumed to be normally distributed with mean 0 and variance
. Two null hypotheses were employed to test for gene expression differences both within and between accessions. The first null hypothesis was used to statistically identify genes whose expression level significantly changed between treatments within an accession. For each gene of each accession at each time point, H01: Sj + (SG)jg = Sj' + (SG)j'g was tested. A second hypothesis was tested to detect potential ELPs. Specifically, for each gene under each treatment condition at each time point, there is no difference in expression levels between two accessions: H02: Pi + (PG)ig = Pi' + (PG)i'g. For each of 21 accession pairs, t-tests were used with the type I error adjusted for multiple comparisons, Holm's correction (HOLM 1979), and false discovery rate (HOLM 1979; BENJAMINI and HOCHBERG 1995). The accession analysis was based on the total number of genes identified as statistically significant by at least one of the two hypotheses H01 and H02, the number of potential ELPs detected by t-tests for H02, and the number of genes with possible ELPs related to SA induction as identified by t-tests for both H01 and H02. The numbers of such genes for all 21 parental pairs were compared to determine the maximum and minimum number of differences.
Quantitative RTPCR:
To validate gene expression data obtained from the GeneChips, we chose 15 single-copy genes for quantitative RTPCR analysis (QRTPCR). These genes were scattered throughout the genome and represented the range of ELP frequencies: 5 genes polymorphic in >50% of pairwise comparisons, 5 genes polymorphic in
50% of pairwise comparisons, and 5 genes polymorphic in <50% of pairwise comparisons. Total RNA from the same samples used for hybridization to the GeneChips was treated with deoxyribonuclease I (Sigma, St. Louis) according to the manufacturer's instructions. The DNAse-treated RNA was then used for first-strand cDNA synthesis with the Advantage RT-for-PCR kit (Clontech, Mountain View, CA), according to the manufacturer's instructions. The PCR primers were designed with Primer3 software (ROZEN and SKALETSKY 2000) to amplify fragments of
140 bp (supplemental Table 1 at http://www.genetics.org/supplemental/). The Actin7 gene, At5g09810, was used as a control gene for QRTPCR since it exhibited invariant expression among all seven accessions. The optimal annealing temperature was determined with gradient PCR using the Robocycler Gradient 96 (Stratagene, La Jolla, CA). Real-time quantitative PCR was performed with the AB7300 Real Time PCR system, using the SYBR Green PCR Master Mix, the RTPCR kit, and 96-well optical reaction plates [Applied Biosystems (Foster City, CA) 7300 Real Time PCR system]. PCR cycling conditions were 10 min at 95°, 40 cycles of 15 sec at 95°, and 1 min at 57°. Forward and reverse primer concentrations of 50, 300, and 900 nM were used to determine optimal conditions for each gene. The quality of the amplification was determined using the dissociation curve analysis of the AB7300 Real Time PCR system. Next, real-time PCR was performed with the control gene plus two of our chosen genes per plate, with seven cDNA samples and two no-template controls, each with two biological replicates and two technical replicates. For genes without an intron in the genomic DNA corresponding to the amplicon, control cDNA samples were used for which the reverse transcriptase had been omitted during cDNA synthesis. Threshold cycle (CT) values reported by the AB7300 Real Time PCR system were used for data analysis. Three genes did not amplify under any condition utilized and were not considered in the analysis.
|
Single-feature polymorphism analysis:
The Affymetrix ATH1 GeneChip contains oligomer probes designed from the DNA sequence of accession Col-0. When these arrays are hybridized with other genotypes, differences in hybridization signal may be due to true ELPs or due to sequence polymorphisms that result in altered probe affinity. The latter can be detected as single-feature polymorphisms (SFPs) between genotypes (BOREVITZ et al. 2003; WINZELER et al. 2003). Most Arabidopsis genes on the ATH1 GeneChip that contain a SFP are due to single-nucleotide polymorphisms (BOREVITZ et al. 2003). In total, 11 probe pairs represent each gene on the ATH1 GeneChip. To test whether SFPs had a major impact on perceived expression differences, we utilized the individual probe data from the ATH1 GeneChips to identify SFPs between Col-0 and Cvi-1. These two accessions were used as they showed the largest number of potential ELPs. Every probe from 2000 genes was tested for an SFP using the following algorithm: (probeCvi-1/probeset averageCvi-1)/(probeCol-0/probeset averageCol-0). A value of 0.40 was used as the cutoff for declaring a significant SFP as this was the empirically determined 0.05 threshold derived from a histogram of all
22,000 tests: 5% of the tests were between the values of 0.1 and 0.4, and 95% of the tests were between 0.4 and 1. The genes were classified by the number of SFPs that were identified and whether the gene showed differential expression in our specific pairwise comparisons of accessions. We randomly selected 400 genes that had one or more SFPs per gene and calculated the average expression level for each gene from each GeneChip in two ways: using all 11 perfect-match (PM) probes that included one or more with an SFP and using only the probes lacking any SFPs. An ANOVA was performed for each gene, using a split-split-plot mixed linear model with a random array effect, identical to the previous model above except that the sub-subplot was defined as the probes within a gene and the subplot was the gene. By comparing the per-gene expression values with and without SFP-containing probes, we could test whether SFPs were a significant source of variation in a gene's overall expression value.
ELP sliding-window analysis:
All 22,810 genes represented on the ATH1 GeneChip and all of the 875 sequenced loci (NORDBORG et al. 2005) (walnut.usc.edu/2010/) were positioned on the Arabidopsis physical map (TAIR website; www.arabidopsis.org), using the ATG start codon as the start of each gene or the first base pair as the start of each sequenced locus. Each gene was classified according to whether it showed differential expression between any of our six accessions that were also a part of the published sequence diversity study (Col-0, Cvi-1, Kin-0, Mt-0, Tsu-1, and Van-0) (NORDBORG et al. 2005). Microsoft Excel was used to determine the frequency of gene expression divergence along each chromosome, in a 500-gene sliding window. The values were expressed as the ELP/gene ratio by dividing the number of ELPs per 500-gene sliding window by 500 to place these values on a similar scale as the nucleotide polymorphism values for ease of comparison. This measure of divergence was plotted at the physical position for the median of the 500 genes within the window. The window was moved to the next adjacent gene, and the process was repeated until the end of the chromosome was reached. We utilized a fixed 500-gene window to minimize the effects of variation in gene density upon the estimate. Windows of 100 and 250 genes provided similar results, although the traces were less smooth (data not shown).To estimate the threshold limit at a significance level of 0.05 for the frequency of differential gene expression, the positions of the differentially expressed genes were randomly permuted across the genome 500 times. The sliding-window analysis was repeated for each permutation, and the maximal and minimal values of differentially expressed genes per window per randomly permuted genome were obtained. The upper and lower P = 0.05 limits were then empirically set, using these distributions from the permutations (0.056 ELP/gene for the lower bound and 0.132 ELP/gene for the upper bound). The extreme lower and upper values identified in the permutation analysis were 0.048 ELP/gene and 0.150 ELP/gene, respectively.
Sequence divergence analysis:
Sequence alignments for 96 A. thaliana accessions at 875 loci were downloaded from the public website (NORDBORG et al. 2005) (walnut.usc.edu/2010/). Six of the seven accessions (Col-0, Cvi-1, Kin-0, Mt-0, Tsu-1, and Van-0) in our study (Est was not included) were represented in this sequence collection. Alignments were then filtered to identify and count the polymorphisms across all 96 accessions, across the 6 accessions in common between the two studies, as well as for each possible pairwise combination among our 6 accessions. This information was utilized to measure the frequency of each polymorphism across all 96 accessions and to classify whether it was polymorphic in our 6 accessions. The polymorphism information from all possible pairwise alignments of the 6 accessions was used to calculate the per-locus nucleotide diversity (
) for each pairwise alignment (TAJIMA 1983; NEI 1987). A linear regression analysis was utilized to determine the relationship between sequence polymorphism and level of ELPs. The number of ELPs per accession pair was regressed upon the average
-measure for the same accession pair. Each accession pair was given a categorical variable indicating whether that pair contained the Col-0 accession or not. This variable was included in the analysis to test if the linear relationship was dependent upon the presence of Col-0; the presence of Col-0 did affect the ELP estimate, but it did not alter the positive association between ELP number and the average
-measure. A 5-Mbp sliding window of the average
-measure for the same 6 accessions using all available 875 sequenced loci was generated in Excel as described previously. The average value was plotted at the median physical position. A 5-Mbp window size was selected to allow for sufficient averaging across the sequenced loci.
GO annotation analysis:
The complete genome gene ontology (GO) biological classification list was obtained and used to classify each gene (www.arabidopsis.org). The frequency of each first-level classification was obtained for the complete genome as well as for each list of ELPs from the pairwise accession ANOVAs. Each ELP list was tested for significant deviations from the expected frequencies for the complete genome using
2-analysis (CHEN et al. 2005). Diversity sampling:
Six of our 7 accessions are included in a set of 96 Arabidopsis accessions from which 875 loci have been sequenced (NORDBORG et al. 2005) (walnut.usc.edu/2010/), allowing us to estimate the amount of species-wide DNA polymorphism contained in our sample of accessions. We identified the polymorphisms present in 875 loci distributed across the genome and scored the frequency of each polymorphism among the 96 accessions and if that polymorphism was variable in our 6 accessions. Our 6 accessions contained
80% of the species-wide medium- to high-frequency polymorphisms (minor allele frequency >10%) (Figure 1). In the rare- to low-frequency class, our accessions contained only 2030% of the species diversity (Figure 1). Therefore, our accessions were representative of most of the medium- to high-frequency sequence variation in Arabidopsis.
|
ANOVA of microarray data for differentially expressed genes:
The split-plot ANOVA revealed that four effects were significant for all 21 pairs of accessions: array, gene, interaction of gene with accession, and gene with time point (data not shown and Table 1). Since expression levels vary across different genes, a significant gene effect was expected. The significant gene x accession interaction effect indicates that gene expression behaved differently in the various accessions. Since these differences may be due to potential ELPs or sequence polymorphisms, the statistical analysis cannot distinguish between these two alternatives. A significant gene x time point interaction effect indicates that different genes exhibit distinct expression patterns over time. The significant array effect shows that there is chip-to-chip variation, but the split-plot model properly accounted for this source of variation. Over the 21 pairs of accessions, there were no significant effects for SA treatment, interaction of accession x treatment, and accession x treatment x gene (three-way interaction), suggesting that the SA treatment did not strongly affect the overall expression levels across all 22,810 genes, which is expected given that only a subset of genes is induced by SA. However, the regulation of expression over time was definitely affected by SA treatment, as evidenced by significant effects for time points and all time-related interactions.The pairwise ANOVA identified on average 2234 genes showing differential expression signal (potential ELPs) between any two accessions across treatments, using false discovery rate (FDR) multiple-testing correction (BENJAMINI and HOCHBERG 1995), and 331 genes when using Holm's multiple testing correction (HOLM 1979) (Figure 2). Accession Cvi-1 had more potential ELPs in all of its pairwise comparisons than did any of the other accessions (Figure 2). To assess the variability of potential ELPs, we added the number of pairwise comparisons for which any given gene showed a differential expression signal. The resulting distribution was heavy tailed and skewed toward low- to moderate-frequency ELPs and was similar for gene lists obtained using either the FDR or Holm's procedure (Figure 3). Interestingly, several hundred genes showed differential expression signals in over half of the pairwise tests, suggesting that these genes have multiple alleles that can confer distinct expression levels.
|
|
We included a SA treatment as we are interested in the natural variation in responses to SA application. The use of two treatments (SA and control) provided an opportunity to examine the consequences of experimental treatment variation on the set of genes showing differential expression. SA is a major signaling molecule involved in plant defense responses (GAFFNEY et al. 1993; DELANEY et al. 1995). We utilized a 0.30-mM SA treatment, as higher SA levels caused phytotoxicity in some accessions (data not shown). Statistical analysis identified an average of 660 genes with FDR and 25 genes with Holm's procedure showing differential SA response between at least one pair of accessions. These numbers represent
1020% of the genes showing a difference in expression among accessions that did not exhibit a treatment interaction. This suggests there is considerable variation in responses to this key signaling molecule, but how this relates to variation in disease resistance is currently unknown.
Control genes:
To test the ability of our ANOVA to detect genes that had previously been reported as varying between accessions, we studied a set of genes that were known to display natural variation in DNA sequence, mRNA accumulation, and phenotype. This comparison included genes involved in the production of the plant-defense-related glucosinolates (ESP, AOP2, AOP3, GS-OH, Mam1, and MamL) (KLIEBENSTEIN et al. 2001; LAMBRIX et al. 2001; KROYMANN et al. 2003), flowering time (FLC, FRI, FLM, and HUA2) (JOHANSON et al. 2000; CAICEDO et al. 2004; DOYLE et al. 2005; LEMPE et al. 2005; WERNER et al. 2005), hypocotyl light responses (PHYA and CRY2) (EL-ASSAL et al. 2001; MALOOF et al. 2001), and elicitation of resistance (RPM1, RPS2, RPS4, RPS5, and RPP5) (GRANT et al. 1995; PARKER et al. 1997; GASSMANN et al. 1999; HENK et al. 1999; MAURICIO et al. 2003). These genes were classified on the basis of the nature of the molecular polymorphisms (Table 2). Any polymorphism (INDEL, splice site polymorphism, or observed mRNA difference) that might cause a change in observed gene expression was listed as a potential ELP. Our statistical analysis identified 12 of the 13 genes previously known or predicted as being differentially expressed (Table 2). One gene (FLM) that we did not identify as polymorphic has an INDEL in an accession that was not included in our study (WERNER et al. 2005). Genes with nonsense or missense polymorphisms served as controls and, as expected, were not detected as differentially expressed. These results suggest that our experiment has a low false negative rate for detecting ELPs.
|
To check if the above analysis was biased by using genes known to cause phenotypic polymorphisms, we used the 12 genes showing differential expression in the seven accessions for QRTPCR analysis for differential gene expression. The average correlation for the expression values of these 12 genes between QRTPCR and ATH1 GeneChip values was 0.69, indicating that the microarray analysis accurately detects genes with ELPs.
SFPs and differential expression:
To test if the presence of SFPs significantly alters estimates of gene expression, we calculated the apparent expression levels for 400 genes that contained SFPs between Col-0 and Cvi-1 with and without signal from the SFP-containing probes and employed a split-split-plot ANOVA. Of the 400 SFP-containing genes, the SFP was responsible for
0.13% of the variance in signal among these genes. Furthermore, inclusion of the probes with SFPs in the estimation of a gene's expression value led to only a 7% decrease in the estimate. Therefore, SFPs are not a significant source of variation in gene expression estimates and the differences among accessions were predominantly due to actual ELPs.
Gene function bias:
Previous work had suggested a bias in the functional classification of genes that show differential expression among Arabidopsis accessions (CHEN et al. 2005). When the GO biological annotations were used to classify the genes, the differentially expressed genes were significantly enriched for genes classified as controlling biotic and abiotic responses, stress responses, and signal transduction (Figure 4). These differences in classification frequencies were significant by
2-tests in all pairwise comparisons per category as well as for the average of the pairwise comparisons after adjusting for multiple comparisons with Bonferroni correction (data not shown). In addition, there was a significant decrease in genes classified as being involved in transport (Figure 4). This result was similar to those previously reported by CHEN et al. (2005). A similar comparison using the GO molecular annotations identified a significant difference from the whole-genome expectation (in over half of the pairwise comparisons) only for genes classified as receptor binding or activity. No classes in GO cellular annotations were significant in the pairwise comparisons (data not shown).
|
Relationship of pairwise expression differences to sequence divergence:
If there is no major filtering mechanism, such as differential selection pressures upon sequence and expression polymorphisms or massive trans-acting effects, to bias the impact of DNA sequence variation on gene expression variation, a genome-wide relationship between DNA sequence variation and ELP variation is expected. To assess this relationship, we determined the pairwise
, average pairwise nucleotide divergence (TAJIMA 1983; NEI 1987) between the six accessions for which there were sequence data, and then compared
with the number of genes we detected as differentially expressed between the same accession pair (as determined by FDR). There was a significant positive relationship between these values (Figure 5). This linear regression analysis accounted for whether the pair included Col-0, from which the oligomer probes on the ATH1 GeneChip were designed (N = 15, P < 0.0001, r = 0.83; Col-0 presence in accession pair, d.f. = 1, P = 0.0483) (Figure 5). The estimates of gene expression divergence provided by Holm's multiple testing correction generated a nearly identical relationship between sequence and gene expression divergence (N = 15, P < 0.0001, r = 0.86; Col-0 presence in accession pair, d.f. = 1, P = 0.0097). This association was also significant when comparing
against the number of genes showing an expression difference dependent upon an accession x treatment interaction (N = 15, P < 0.0001, r = 0.95; Col-0 presence in accession pair, d.f. = 1, P = 0.001). These genes were not likely to have been affected by hybridization differences, such as those caused by INDEL polymorphisms, since these genes exhibited nonvariable gene expression in one treatment across two accessions but showed differential gene expression under the other treatment. This relationship between
and the genes significant for accession x treatment interactions further supports the hypothesis that SFPs did not significantly influence estimates of gene expression variation and that we indeed measured ELPs among accessions.
|
Genomic distribution of expression differences and sequence divergence:
The relationship between
and gene expression variation (previous section) is a genome-wide average comparison. To assess if there was a relationship between local DNA sequence variation and gene expression variation, we conducted a sliding-window analysis along the Arabidopsis chromosomes (Figure 6). A distinct relationship in local sequence variation and gene expression variation was detected, as the two traces were very similar across the chromosomes (Figure 6). Local peaks of DNA sequence divergence among the accessions typically colocated with local enrichments in the number of genes showing polymorphic gene expression. In particular, there is a region near the middle of chromosome III that has a significant elevation in gene expression polymorphism and is also enriched in DNA sequence variation (Figure 6). Two regions with a lack of local correlation between sequence diversity and polymorphic gene expression can be explained by technical or biological reasons. The flat region for differential gene expression at the top of chromosome IV is an area of low gene density due to the presence of the nucleolar organizing region. The
10-Mbp region on chromosome V that had low estimates of
also has a paucity of sequenced loci and may represent a local sampling bias.
|
The ELPs detected in our study showed a bias in their categorization by the GO biological annotations, with an overrepresentation of genes involved in signaling and plant/stress responses compared to the base distribution from the Col-0 sequence annotation (Figure 4). These biases were the same as those found by CHEN et al. (2005) in their analysis of variable gene expression among accessions sampled at different developmental stages. As suggested by these categorical differences, the greatest opportunities for the evolution of differences in gene expression levels in Arabidopsis are in responses to the environment. This idea is consistent with the number of flowering-time and light-response quantitative trait loci (QTL) that have been mapped to known signaling loci (JOHANSON et al. 2000; EL-ASSAL et al. 2001; MALOOF et al. 2001; CAICEDO et al. 2004; DOYLE et al. 2005; WERNER et al. 2005). However, it is also possible that even though the plants were grown under carefully controlled conditions, there were physiological differences between the accessions that led to differential stress responses.
We identified a positive relationship between ELPs and DNA sequence divergence at both the whole-genome and local subchromosomal levels (Figures 5 and 6). This relationship implies that there is a direct association between DNA sequence polymorphism and ELP. Such a relationship at the local level suggests that there is a significant impact of cis- (or linked trans-) DNA sequence polymorphism on gene expression. If the majority of expression differences were due to unlinked trans-polymorphisms, the global estimates of sequence and expression polymorphism would still be correlated but at the local level there would not be a relationship between ELP and DNA sequence polymorphism. However, there was heterogeneity in the local relationship between sequence and gene expression variation. For example, chromosome I appeared to be split, with the top half of the chromosome showing an association while the bottom half of chromosome I showed a decrease in
in relation to gene expression variation (Figure 6). Thus, there seemed to be an enhanced frequency of ELPs at the distal end of chromosome I. It will require genetic analysis of a segregating population to provide more definitive data on the relative importance of cis- vs. trans-acting DNA polymorphisms in regulating ELPs in Arabidopsis.
This work provides a baseline description of the level and patterns of ELP polymorphism within Arabidopsis. As a number of Arabidopsis QTL have been shown to be controlled by an ELP within a causative gene, our data may enhance QTL cloning by providing rapid candidate gene identification. Furthermore, these data provide a foundation from which to explore the impacts of different environmental conditions and developmental stages upon ELPs. Additionally, ELPs can be mapped in a segregating population as expression QTL to characterize how cis- and trans-acting loci coordinate to generate ELP patterns within the species (BREM et al. 2002; SCHADT et al. 2003; YVERT et al. 2003; BREM and KRUGLYAK 2005).
AUSUBEL, F. M., R. BRENT, R. E. KINGSTON, D. D. MOORE, J. G. SEIDMAN et al., 2004 Current Protocols in Molecular Biology. John Wiley & Sons, New York.
BENJAMINI, Y., and Y. HOCHBERG, 1995 Controlling the false discovery ratea practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57: 289300.
BENTLEY, S. D., and J. PARKHILL, 2004 Comparative genomic structure of prokaryotes. Annu. Rev. Genet. 38: 771792.[CrossRef][Medline]
BOREVITZ, J. O., D. LIANG, D. PLOUFFE, H. S. CHANG, T. ZHU et al., 2003 Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 13: 513523.
BREM, R. B., and L. KRUGLYAK, 2005 The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. USA 102: 15721577.
BREM, R. B., G. YVERT, R. CLINTON and L. KRUGLYAK, 2002 Genetic dissection of transcriptional regulation in budding yeast. Science 296: 752755.
CAICEDO, A. L., J. R. STINCHCOMBE, K. M. OLSEN, J. SCHMITT and M. D. PURUGGANAN, 2004 Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proc. Natl. Acad. Sci. USA 101: 1567015675.
CARROL, S. B., 2000 Endless forms: the evolution of gene regulation and morphological diversity. Cell 101: 577580.[CrossRef][Medline]
CHEN, W. J., S. H. CHANG, M. E. HUDSON, W.-K. KWAN, J. LI et al., 2005 Contribution of transcriptional regulation to natural variations in Arabidopsis. Genome Biol. 6: R32.[CrossRef][Medline]
CONG, B., J. P. LIU and S. D. TANKSLEY, 2002 Natural alleles at a tomato fruit size quantitative trait locus differ by heterochronic regulatory mutations. Proc. Natl. Acad. Sci. USA 99: 1360613611.
CRAIG, B. A., M. A. BLACK and R. W. DOERGE, 2003 Gene expression data: the technology and statistical analysis. J. Agric. Biol. Environ. Stat. 8: 128.[CrossRef]
DELANEY, T. P., L. FRIEDRICH and J. A. RYALS, 1995 Arabidopsis signal transduction mutant defective in chemically and biologically induced disease resistance. Proc. Natl. Acad. Sci. USA 92: 66026606.
DOERGE, R. W., 2002 Mapping and analysis of quantitative trait loci in experimental populations. Nat. Rev. Genet. 3: 4352.[CrossRef][Medline]
DOYLE, M. R., C. M. BIZZELL, M. R. KELLER, S. D. MICHAELS, J. D. SONG et al., 2005 HUA2 is required for the expression of floral repressors in Arabidopsis thaliana. Plant J. 41: 376385.[CrossRef][Medline]
EL-ASSAL, S. E. D., C. ALONSO-BLANCO, A. J. M. PEETERS, V. RAZ and M. KOORNNEEF, 2001 A QTL for flowering time in Arabidopsis reveals a novel allele of CRY2. Nat. Genet. 29: 435440.[CrossRef][Medline]
FRARY, A., T. C. NESBITT, A. FRARY, S. GRANDILLO, E. VAN DER KNAAP et al., 2000 fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289: 8588.
GAFFNEY, T., L. FRIEDRICH, B. VERNOOIJ, D. NEGROTTO, G. NYE et al., 1993 Requirement of salicylic acid for the induction of systemic acquired-resistance. Science 261: 754756.
GASSMANN, W., M. E. HINSCH and B. J. STASKAWICZ, 1999 The Arabidopsis RPS4 bacterial-resistance gene is a member of the TIR-NBS-LRR family of disease-resistance genes. Plant J. 20: 265277.[Medline]
GIBBS, R. A., J. W. BELMONT, P. HARDENBOL, T. D. WILLIS, F. L. YU et al., 2003 The International HapMap Project. Nature 426: 789796.[CrossRef][Medline]
GLAZEBROOK, J., 2001 Genes controlling expression of defense responses in Arabidopsis2001 status. Curr. Opin. Plant Biol. 4: 301308.[CrossRef][Medline]
GRANT, M. R., L. GODIARD, E. STRAUBE, T. ASHFIELD, J. LEWALD et al., 1995 Structure of the Arabidopsis Rpm1 gene enabling dual-specificity disease resistance. Science 269: 843846.
HENK, A. D., R. F. WARREN and R. W. INNES, 1999 A new Ac-like transposon of Arabidopsis is associated with a deletion of the RPS5 disease resistance gene. Genetics 151: 15811589.
HOLM, S., 1979 A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6: 6570.
JOHANSON, U., J. WEST, C. LISTER, S. MICHAELS, R. AMASINO et al., 2000 Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science 290: 344347.
KLIEBENSTEIN, D., V. LAMBRIX, M. REICHELT, J. GERSHENZON and T. MITCHELL-OLDS, 2001 Gene duplication and the diversification of secondary metabolism: side chain modification of glucosinolates in Arabidopsis thaliana. Plant Cell 13: 681693.
KLIEBENSTEIN, D. J., D. PEDERSEN and T. MITCHELL-OLDS, 2002 Comparative analysis of insect resistance QTL and QTL controlling the myrosinase/glucosinolate system in Arabidopsis thaliana. Genetics 161: 325332.
KROYMANN, J., S. DONNERHACKE, D. SCHNABELRAUCH and T. MITCHELL-OLDS, 2003 Evolutionary dynamics of an Arabidopsis insect resistance quantitative trait locus. Proc. Natl. Acad. Sci. USA 100: 1458714592.
LAMBRIX, V., M. REICHELT, T. MITCHELL-OLDS, D. KLIEBENSTEIN and J. GERSHENZON, 2001 The Arabidopsis epithiospecifier protein promotes the hydrolysis of glucosinolates to nitriles and influences Trichoplusia ni herbivory. Plant Cell 13: 27932807.
LEMPE, J., S. BALASUBRAMANIAN, S. SURESHKUMAR, A. SINGH, M. SCHMID et al., 2005 Diversity of flowering responses in wild Arabidopsis thaliana strains. PLoS Genet. 1: 109118.[Medline]
MALOOF, J. N., J. O. BOREVITZ, T. DABI, J. LUTES, R. B. NEHRING et al., 2001 Natural variation in light sensitivity of Arabidopsis. Nat. Genet. 29: 441446.[CrossRef][Medline]
MAURICIO, R., E. A. STAHL, T. KORVES, D. C. TIAN, M. KREITMAN et al., 2003 Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics 163: 735746.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NORDBORG, M., T. T. HU, Y. ISHINO, J. JHAVERI, C. TOOMAJIAN et al., 2005 The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196.[CrossRef][Medline]
NUZHDIN, S. V., M. L. WAYNE, K. L. HARMON and L. M. MCINTYRE, 2004 Common pattern of evolution of gene expression level and protein sequence in Drosophila. Mol. Biol. Evol. 21: 13081317.
PARKER, J. E., M. J. COLEMAN, V. SZABO, L. N. FROST, R. SCHMIDT et al., 1997 The Arabidopsis downy mildew resistance gene RPP5 shares similarity to the toll and interleukin-1 receptors with N and L6. Plant Cell 9: 879894.
ROZEN, S., and H. SKALETSKY, 2000 Primer3 on the WWW for general users and for biologist programmers, pp. 365386 in Bioinformatics Methods and Protocols: Methods in Molecular Biology, edited by S. KRAWETZ and S. MISENER. Humana Press, Totowa, NJ.
SCHADT, E. E., S. A. MONKS, T. A. DRAKE, A. J. LUSIS, N. CHE et al., 2003 Genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297302.[CrossRef][Medline]
SCHMID, K. J., T. R. SORENSEN, R. STRACKE, O. TORJEK, T. ALTMANN et al., 2003 Large-scale identification and analysis of genome-wide single-nucleotide polymorphisms for mapping in Arabidopsis thaliana. Genome Res. 13: 12501257.
SCHMID, K. J., S. RAMOS-ONSINS, H. RINGYS-BECKSTEIN, B. WEISSHAAR and T. MITCHELL-OLDS, 2005 A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics 169: 16011615.
STEIN, L. D., Z. R. BAO, D. BLASIAR, T. BLUMENTHAL, M. R. BRENT et al., 2003 The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PloS Biol. 1: 166.[CrossRef]
TAJIMA, F., 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437460.
WANG, R. L., A. STEC, J. HEY, L. LUKENS and J. DOEBLEY, 1999 The limits of selection during maize domestication. Nature 398: 236239.[CrossRef][Medline]
WERNER, J. D., J. O. BOREVITZ, N. WARTHMANN, G. T. TRAINER, J. R. ECKER et al., 2005 Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc. Natl. Acad. Sci. USA 102: 24602465.
WINZELER, E. A., C. I. CASTILLO-DAVIS, G. OSHIRO, D. LIANG, D. R. RICHARDS et al., 2003 Genetic diversity in yeast assessed with whole-genome oligonucleotide arrays. Genetics 163: 7989.
YVERT, G., R. B. BREM, J. WHITTLE, J. M. AKEY, E. FOSS et al., 2003 Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35: 5764.[Medline]
Communicating editor: J. A. BIRCHLER
This article has been cited by other articles:
![]() |
J. Wu, C. Hettenhausen, M. C. Schuman, and I. T. Baldwin A Comparison of Two Nicotiana attenuata Accessions Reveals Large Differences in Signaling Induced by Oral Secretions of the Specialist Herbivore Manduca sexta Plant Physiology, March 1, 2008; 146(3): 927 - 939. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. M.P. Van Poecke, M. Sato, L. Lenarz-Wyatt, S. Weisberg, and F. Katagiri Natural Variation in RPS2-Mediated Resistance among Arabidopsis Accessions: Correlation between Gene Expression Profiles and Phenotypic Responses PLANT CELL, December 1, 2007; 19(12): 4046 - 4060. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Slotte, K. Holm, L. M. McIntyre, U. Lagercrantz, and M. Lascoux Differential Expression of Genes Important for Adaptation in Capsella bursa-pastoris (Brassicaceae) Plant Physiology, September 1, 2007; 145(1): 160 - 173. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. van Leeuwen, D. J. Kliebenstein, M. A.L. West, K. Kim, R. van Poecke, F. Katagiri, R. W. Michelmore, R. W. Doerge, and D. A. St.Clair Natural Variation among Arabidopsis thaliana Accessions for Transcriptome Response to Exogenous Salicylic Acid PLANT CELL, July 1, 2007; 19(7): 2099 - 2110. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Krinke, E. Ruelland, O. Valentova, C. Vergnolle, J.-P. Renou, L. Taconnat, M. Flemr, L. Burketova, and A. Zachowski Phosphatidylinositol 4-Kinase Activation Is an Early Response to Salicylic Acid in Arabidopsis Suspension Cells Plant Physiology, July 1, 2007; 144(3): 1347 - 1359. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. A. L. West, K. Kim, D. J. Kliebenstein, H. van Leeuwen, R. W. Michelmore, R. W. Doerge, and D. A. St. Clair Global eQTL Mapping Reveals the Complex Genetic Architecture of Transcript-Level Variation in Arabidopsis Genetics, March 1, 2007; 175(3): 1441 - 1450. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Masclaux-Daubresse, S. Purdy, T. Lemaitre, N. Pourtau, L. Taconnat, J.-P. Renou, and A. Wingler Genetic Variation Suggests Interaction between Cold Acclimation and Metabolic Regulation of Leaf Senescence Plant Physiology, January 1, 2007; 143(1): 434 - 446. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. de Meaux, A. Pop, and T. Mitchell-Olds Cis-regulatory Evolution of Chalcone-Synthase Expression in the Genus Arabidopsis Genetics, December 1, 2006; 174(4): 2181 - 2202. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. A.L. West, H. van Leeuwen, A. Kozik, D. J. Kliebenstein, R.W. Doerge, D. A. St. Clair, and R. W. Michelmore High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis Genome Res., June 1, 2006; 16(6): 787 - 795. [Abstract] [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Supplemental Data
-
All Versions of this Article:
genetics.105.049353v1
172/2/1179 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Kliebenstein, D. J.
- Articles by St. Clair, D. A.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Kliebenstein, D. J.
- Articles by St. Clair, D. A.










