The genetic architecture of transcript-level variation is largely unknown. The genetic determinants of transcript-level variation were characterized in a recombinant inbred line (RIL) population (n = 211) of Arabidopsis thaliana using whole-genome microarray analysis and expression quantitative trait loci (eQTL) mapping of transcript levels as expression traits (e-traits). Genetic control of transcription was highly complex: one-third of the quantitatively controlled transcripts/e-traits were regulated by cis-eQTL, and many trans-eQTL mapped to hotspots that regulated hundreds to thousands of e-traits. Several thousand eQTL of large phenotypic effect were detected, but almost all (93%) of the 36,871 eQTL were associated with small phenotypic effects (R2 < 0.3). Many transcripts/e-traits were controlled by multiple eQTL with opposite allelic effects and exhibited higher heritability in the RILs than their parents, suggesting nonadditive genetic variation. To our knowledge, this is the first large-scale global eQTL study in a relatively large plant mapping population. It reveals that the genetic control of transcript level is highly variable and multifaceted and that this complexity may be a general characteristic of eukaryotes.
TRANSCRIPT levels, when assessed in an experimental or mapping population, can be considered as quantitative traits and their variation used to map expression quantitative trait loci (eQTL) (Jansen and Nap 2001; Doerge 2002; Schadt et al. 2003). Recent studies in a limited number of organisms indicate that the levels of transcripts are heritable and can be under multigenic control (Brem et al. 2002; Schadt et al. 2003; Yvert et al. 2003; Kirst et al. 2004; Brem and Kruglyak 2005; Bystrykh et al. 2005; Chesler et al. 2005; Hubner et al. 2005; DeCook et al. 2006). Global eQTL analyses in yeast, mice, and humans have detected significant levels of cis polymorphism controlling individual genes, as well as evidence for clustered trans-eQTL that simultaneously regulate a large fraction of the transcriptome (Brem et al. 2002; Schadt et al. 2003; Morley et al. 2004). In yeast, the complex inheritance of transcript levels (Brem and Kruglyak 2005) has been revealed by detecting significant levels of nonadditive genetic variance, epistatic interactions, and transgressive segregation. However, comparable studies in plants have yet to be reported. Furthermore, the relationship between transcript-level variation and downstream phenotypic trait variation is not well understood (Mackay 2001; Gibson and Weir 2005).
While the identification of cis-eQTL has led to successful QTL cloning in plants (Kliebenstein et al. 2001; Lambrix et al. 2001; Kroymann et al. 2003; Caicedo et al. 2004; Werner et al. 2005; Zhang et al. 2006), genomewide eQTL analyses in large mapping populations of plants have been lacking. To our knowledge, the study that we describe here is among the first large global eQTL mapping studies in plants. Our overarching goal was to assess and report on the genetic architecture of transcript-level variation in a higher plant species. To address this goal, we employed a sample of 211 recombinant inbred lines (RILs) from a population derived from a cross between two inbred Arabidopsis thaliana accessions, Bayreuth-0 (Bay-0) and Shahdara (Sha), that were grown in a biologically replicated experiment to quantify genomewide transcript levels using Affymetrix whole-genome microarrays. The 211 RILs represent a sample of individuals from the population of all possible individuals who represent this particular cross. Transcript-level variation in the 211 RILs was measured and used as quantitative expression traits (e-traits) to map eQTL using a framework map of single feature polymorphism (SFP) markers (West et al. 2006) for the 211 RILs. We report on the genomic distributions for eQTL, as well as on the phenotypic effects and numbers of cis- and trans-eQTL. We also estimate the broad-sense heritability of transcript levels/e-traits in the RILs and parent lines and assess transgressive segregation and nonadditive genetic variation.
MATERIALS AND METHODS
Plant material and experimental conditions:
Seeds for A. thaliana accessions Bay-0 and Sha and the F8 generation of a Bay-0 × Sha RIL population (Loudet et al. 2002) were obtained from The Arabidopsis Information Resource (TAIR; stock no. CS57920; http://www.arabidopsis.org). The advanced generation RILs were created by single-seed descent from the F2 generation (Loudet et al. 2002). The RIL (F8) plants and parental accessions were grown in a single growth chamber at the University of California at Davis and allowed to self-pollinate; seed was harvested from individual plants to produce sufficient seed for each homozygous F9 line for our replicated experiments.
For the Bay-0 × Sha RIL microarray experiment, five plants/biological replicate for each of 211 RILs, plus parental controls, Bay-0 and Sha, were grown as previously described (West et al. 2006). At 6 weeks postgermination, the plants were treated with 0.02% Silwet L77, a surfactant (Lehle Seeds, Round Rock, TX), and harvested 28 hr post-treatment, as previously described (West et al. 2006).
Microarray hybridization and quality control:
Total RNA was extracted, converted to cDNA, and labeled with biotin as recommended by the manufacturer (Affymetrix, Santa Clara, CA; http://www.affymetrix.com). Affymetrix ATH1 GeneChip microarrays were hybridized, washed, scanned, and checked for quality as reported previously (West et al. 2006). The RIL haplotypes obtained using SFP markers derived from the same GeneChips were consistent with the haplotypes determined previously by microsatellite analysis of genomic DNA, indicating that each GeneChip data set was derived from the designated RIL (Loudet et al. 2002; West et al. 2006).
Raw image data from the RIL GeneChips were converted to numeric data via Bioconductor software (http://www.bioconductor.org). Due to the nonbiological variation that is present in every microarray experiment, we normalized across all arrays for the purpose of addressing any nonlinear relationships between arrays. Quantile normalization reduces nonbiological variation that is largely due to the technology itself and, when applied at the probe level, it has been shown to outperform other normalization methods that are based on what is referred to as a “base-line array” (Bolstad et al. 2003). After the quantile normalization, raw intensity values were log2 transformed. For this analysis of gene e-traits, transcript levels of each gene were averaged over the two microarray replicates per RIL to give a single transcript measurement per RIL.
A genetic map was estimated using JoinMap (Kyazma B.V., Wageningen, The Netherlands); 540 SFP markers were obtained as described previously (West et al. 2006), and 38 reference SSR markers (Loudet et al. 2002) were scored on the 211 RILs. In previous work (West et al. 2006), we used a subset of 148 RILs to create a high-density SFP map. When the microarray data for the remaining 63 RILs became available, we reestimated the genetic map on the basis of a total sample of 211 RILs. The resulting map had an average marker density of 0.78 cM. Composite interval mapping (CIM) analysis (Zeng et al. 1999) was employed in conjunction with a 5-cM framework map (93 SFP and 2 SSR markers; supplemental Table 1 at http://www.genetics.org/supplemental/). The “zmapqtl” CIM module of QTL Cartographer Version 1.17 (Basten et al. 1999) with a walking speed of 1 cM and a window size of 10 cM was employed to analyze each e-trait. To obtain a threshold criterion for declaring statistically significant eQTL, a global permutation threshold (GPT) was obtained by permuting the e-traits while maintaining the genetic information. Because it was computationally not feasible (i.e., at least 1000 permutations for each of the 22,794 e-traits) to compute an empirical threshold value (Churchill and Doerge 1994) for each e-trait that we modeled with composite interval mapping, we adopted a GPT approach as follows. For each of 100 randomly selected transcripts or e-traits, the null distribution of the maximum likelihood-ratio test (LRT) statistic was empirically estimated using permutation thresholds based on 1000 permutations (Churchill and Doerge 1994). Inspection of the means and standard deviations for the 100 null distributions from the permuted transcript values revealed little variation, and the average 95-percentile permutation threshold was 12.0329, with a sample standard deviation of 0.3478. Permutation thresholds were then computed on the basis of each of the 100 representative null distributions. A representative null distribution based on 100,000 maximum LRT statistics (1000 permutations × 100 randomly selected e-traits) was employed for all 22,794 noncontrol transcripts represented on the ATH1 microarray. The GPT was calculated as the 95% upper bound of the representative null distribution and equaled 12.0583.
Identification of eQTL location:
After conducting composite interval mapping for each of the 22,794 e-traits, it was necessary to summarize the resulting eQTL with a certain level of resolution to the known location of the gene features on the map. For this we utilized the Eqtl module of QTL Cartographer to count, summarize, and locate potential eQTL (Basten et al. 1999). Since experimental limitations (e.g., genetic map density and the sample size of the mapping population), statistical issues (e.g., limitations of the statistical method), and genetic factors (e.g., ghost QTL, false-positive QTL, linked QTL, etc.) come into play when locating QTL, we attempted to summarize our eQTL results in a general manner by using an “exclusionary window” that is available in the Eqtl module to select and count eQTL. However, we first investigated the effect of varying the exclusionary window size (5, 10, and 20 cM) on the eQTL summary statistics. Changing the exclusionary window size had no effect on the number of transcripts/e-traits with detectable eQTL, but increasing it decreased the number of eQTL that were counted for each transcript (supplemental Figure 1 at http://www.genetics.org/supplemental/). The average number of eQTL counted per transcript decreased from 3.7 to 2.3 when results were compared between windows of 5 and 20 cM (supplemental Table 2 at http://www.genetics.org/supplemental/), but this did not affect the genetic positions of the large trans-effect peaks (data not shown). These results suggest that the smaller exclusionary window sizes fail to accommodate closely linked eQTL, ghost QTL (Haley and Knott 1992), and other artifacts that are the result of either experimental limitations or the limitations of composite interval mapping to locate eQTL. After deciding on an exclusionary window of 20 cM to select and count eQTL, the following protocol was used in turn for each of the 22,794 transcripts whose levels were measured and analyzed as e-traits. First, all potential eQTL were sorted by their LRT statistics. The eQTL with the largest LRT statistic was selected, and all other potential eQTL within the exclusionary window (i.e., within 10 cM on either side of the chosen eQTL) were disregarded. The next eQTL was considered and the process iterated until all eQTL for the particular e-trait were exhausted.
Our observation that a 5-cM exclusionary window (Eqtl module of QTL Cartographer) is most likely affected by additional linked eQTL is supported by the realization that increasing the window size to 20 cM shifted the distribution to more transcripts/e-traits with fewer eQTL (one to three) (supplemental Figure 1 at http://www.genetics.org/supplemental/). To investigate this phenomenon further, we examined four genes that were map-based cloned and shown to be controlled by single cis-eQTL in Arabidopsis (Kliebenstein et al. 2001; Lambrix et al. 2001). These four genes showed large LRT statistics in cis to the physical/genetic location of each gene, as expected (supplemental Figure 2 at http://www.genetics.org/supplemental/). When a 5-cM exclusionary window was used for identifying eQTL, three to five eQTL were incorrectly associated with each of the four cloned loci instead of a single cis-eQTL for each locus. Increasing the window size to 20 cM eliminated these extra eQTL peaks in each test case. Consequently, we used a conservative 20-cM exclusionary window for the global eQTL analysis reported here.
Estimation of heritability:
We calculated estimates of broad-sense heritability (H) for each e-trait as , where is the estimated e-trait (transcript level) genetic variance among different genotypes in this sample of 211 RILs and is the estimated phenotypic variance for an e-trait (Falconer and Mackay 1996). Heritability was estimated for all 22,746 noncontrol e-traits in two microarray data sets independently. The first data set included two GeneChips (biological replicates) for each of the 211 RILs, permitting estimation of the heritability for each e-trait within the RIL population. The estimates of (variance due to replicate) and (experimental error variance) were determined for each e-trait in this data set. The second data set included eight GeneChips per parent (Bay-0 and Sha), representing 16 different biological replicates, 4 replicates of each parent grown concurrently with the two RIL replications. The parental data allowed for the estimation of the broad-sense heritability for the levels of each transcript, which were then compared with the estimated heritabilities of e-traits in the 211 RILs.
Cis- vs. trans-eQTL identification:
The SFP markers (West et al. 2006), which are derived from genes with known physical positions, were used to anchor the genetic map to the genome. The CIM eQTL results for each transcript were examined to investigate if there was an eQTL within 3.5 cM of each gene's physical location. Expanding this exclusionary window to 5 cM had minimal effect on the numbers of cis-eQTL identified (data not shown; further discussion below). To test the impact of cis-eQTL on the relationship between the parental and RIL heritability estimates, we limited this analysis to only those 4933 transcripts represented by a unique probe set to ensure that the probe set was specific to only one genomic location.
Transgressive segregation and eQTL of opposite allelic effect:
An ad-hoc approach was devised to test for transgressive segregation (Brem and Kruglyak 2005). For each e-trait, we counted the number of RILs whose average transcript-level accumulation was 1 pooled standard deviation above or below the minimum or maximum parental mean value. The Bay-0 and Sha accessions may be fixed for groups of alleles with opposite effects at two eQTL (e.g., Bay-0 as + −/+ − and Sha as − +/− +), which would either cancel out each other or generate progeny with a greater phenotypic distribution than the parents due to the segregation of recombinant individuals with + +/+ + and − −/− − genotypes for the two eQTL (Lynch and Walsh 1998). For the 10,084 e-traits with two or more significant eQTL, the additive effect of each eQTL was queried to determine if the transcript was associated with eQTL having opposite allelic effects. An e-trait was defined as exhibiting an opposite allelic effect if it had at least one eQTL with a positive Bay-0 allelic effect and at least one eQTL with a positive Sha allelic effect. If all of the eQTL for an e-trait had only positive Bay-0 or only positive Sha allelic effects, the transcript was classified as not showing transgressive segregation.
eQTL hotspot significance threshold:
To determine whether a genetic location associated with multiple eQTL was a significant cluster or hotspot, we estimated a significance threshold using permutation. The positions of the 36,871 eQTL at the marker intervals were permuted across the genome 1000 times, and the maximal number of eQTL per genetic position per permutation was obtained. Using the distribution of the maximum number of eQTL, the criterion for declaration of a significant eQTL hotspot was 133 eQTL/genetic position at α = 0.05.
Gene ontology analysis of trans-eQTL hotspots:
The gene ontology (GO) analysis was performed using GOslim terms (a high-level GO term for functional categorization). Analysis was performed using the GO annotation (Berardini et al. 2004) from the TAIR website (download 20051119; http://www.arabidopsis.org/). The frequency of each classification was obtained for the completely sequenced genome and for the gene lists for each of the 17 trans-eQTL hotspots (see results). Each gene list was tested for significant deviation (P ≤ 0.001) from the expected frequencies for the complete genome using χ2 analysis (Chen et al. 2005).
Numbers of eQTL detected:
The majority of genes within the Arabidopsis genome exhibited heritable transcriptional variation that is controlled by eQTL in a Bay-0 × Sha RIL population (Table 1). Of the 22,746 Arabidopsis transcripts represented on the Affymetrix ATH1 microarray and whose levels were measured as e-traits in the 211 RILs, 69% were associated with one or more significant eQTL (LRT statistics > 12.0583, the 95% global permutation threshold; see materials and methods). A total of 36,871 distinct eQTL were detected and counted using the Eqtl module of QTL Cartographer and an exclusionary window size of 20 cM (for details, see materials and methods). The number of eQTL detected per transcript/e-trait varied from 0 to 11. The transcripts/e-traits associated with eQTL had an average of 2.34 and a median of two eQTL/transcript. Additionally, eQTL were identified for non-nuclear genes: 69% of the 155 organelle-encoded transcripts identified nuclear-encoded eQTL (Table 1).
eQTL genomic distribution:
One-third of the e-traits with detectable eQTL had a cis-eQTL (i.e., coincident with the physical position of the gene whose transcript level varied) (Table 1). This proportion was strikingly uniform across the five Arabidopsis chromosomes, as evidenced by the diagonal of significant LRT statistics across each chromosome (Figure 1A). Furthermore, there were approximately equal contributions of parental alleles that increased transcript abundance (Figure 1A).
The remaining (trans-)eQTL were nonuniformly distributed across the chromosomes, with the majority of eQTL localized on chromosome II (Figure 1, Table 1). eQTL were clustered into trans-eQTL hotspots that are visible as vertical bands of eQTL (Figure 1A) and as peaks of transcript numbers with common eQTL map positions (Figure 1B). Seventeen significant trans-eQTL hotspots were detected across the genome, with a minimum of one/chromosome. Of these 17 hotspots, 10 contained eQTL associated with >200 different transcripts; the largest hotspot was associated with 2528 transcripts (Figure 1B; Table 2). Chromosome II had the greatest number of hotspots (5), as well as the hotspots associated with the greatest number of transcripts (Figure 1; Table 2). A GO analysis of the transcripts associated with the 17 trans-eQTL hotspots did not reveal any significant over- or underrepresentation of any functional or biological process GO categories (data not shown). In addition, our previous network eQTL analysis in this RIL population using 18 a priori-defined gene networks indicated that the chromosome II hotspots did not control the majority of these networks (Kliebenstein et al. 2006a).
A strong directional bias was evident for each of the trans-eQTL hotspots (Figure 1A; Table 2), such that the same parental allele upregulated most of the transcripts associated with a hotspot. At 8 trans-eQTL hotspots, the Sha allele increased transcript levels (positive effect); the other 9 hotspots were associated with the opposite allelic effect (i.e., the Sha allele decreased accumulation). A global average of 54.7% of the 36,871 eQTL had a positive Sha allelic effect and 45.3% had a positive Bay-0 allelic effect. In contrast, the effects of each trans-eQTL hotspot significantly deviated from these global averages: for 15 of the 17 hotspots, >80% of the transcripts associated with each trans-eQTL hotspot, were affected positively by the same parental allele (Table 2).
Proportion of phenotypic variation per eQTL:
Most individual eQTL accounted for only a small proportion of the associated transcript/e-trait's estimated phenotypic variation (R2): 89% of eQTL accounted for <20% of the R2 for each transcript (Figure 2). The cis-eQTL typically controlled more of an e-trait's phenotypic variation than did the trans-eQTL (Figure 2). This difference in the R2 distributions between cis- and trans-eQTL was not due to the presence of SFPs because >91% of all genes with a cis-eQTL did not have a SFP (supplemental Table 3 at http://www.genetics.org/supplemental/). Furthermore, the genes with a SFP and a cis-eQTL had an R2 distribution similar to genes with a cis-eQTL that lacked a SFP (data not shown).
eQTL allelic effects and transcript heritabilities:
The opposing directionality of allelic effects in the trans-eQTL hotspots suggests the potential for nonadditive genetic variation in transcript levels between Bay-0 and Sha. The presence of nonadditive variation is also supported by the observation that while the parental microarray data showed a significant difference in transcript levels for only 3351 of the 22,746 transcripts (false discovery rate of 0.05) (data not shown), the RIL microarray data allowed detection of eQTL for 69% of the e-traits, indicating that the levels of those transcripts are most likely genetically controlled. To further test for nonadditive genetic variation in this population, we estimated and compared the broad-sense heritability (H) of 22,746 transcripts/e-traits with data from parental microarrays and from replicated microarrays for the 211 RILs. The vast majority of the e-traits showed greater heritability in the RILs than in the parents (Figure 3A), suggesting nonadditive genetic variation in the RILs. Therefore, for the majority of e-traits, the observed variation of transcript levels in the parents was not predictive of the variation present in their progeny.
One potential source of nonadditive genetic variation is transgressive segregation whereby the progeny have average e-trait values that exceed the range of the parent values. Comparing transcript accumulation for 22,746 transcripts between the parents and RILs identified 14,258 transcripts for which at least 10% of the RILs were 1 standard deviation beyond the parental range (data not shown). Since transgressive segregation can be due to the presence of eQTL with opposing parental allelic effects, our investigation revealed that of the 10,084 e-traits associated with two or more eQTL, 6911 exhibited eQTL with opposing additive effects, suggesting that such eQTL are a significant component of the nonadditive genetic variation within the 211 Bay-0 × Sha RILs. Of these 6911 e-traits, 4871 show evidence of transgressive segregation as described above, indicating that 1 standard deviation above and below the parental range is a conservative assessment of nonadditive genetic variation. Our statistical analyses and these summaries do not, however, indicate whether epistasis is a component of the nonadditive genetic variation that is prevalent in this population. Due to the complexity of the required statistical analyses involving both a large number of tests and a large number of eQTL, epistasis will be investigated in a subsequent effort.
Transcript heritabilities in RILs vs. parents:
While the majority of e-trait estimated heritabilities showed little correlation between the parents and the RILs, some did exhibit a positive relationship (Figure 3, B and C). We found that the transcripts/e-traits associated with a cis-eQTL are most likely to exhibit a linear heritability relationship in the parents and RILs, because cis-eQTL tend to control more of the e-trait's variation (R2, Figure 2). In fact, the 5127 e-traits with a cis-eQTL exhibited a stronger linear relationship of heritability between the parents and the RILs (Figure 3C) than did the 10,644 e-traits with only trans-eQTL (i.e., no cis-linkages) (Figure 3D). Even though R2 values are not additive, if used as a relative measure of phenotypic variation for a specific transcript, the totaled R2 for transcripts with a cis-eQTL explains more of the relative phenotypic variation than the corresponding totaled R2 for transcripts with only trans-eQTL (supplemental Figure 3 at http://www.genetics.org/supplemental/). A small proportion of e-traits had high heritability (H > 0.7) in the RILs, but no significant eQTL, suggesting that these effects are too small to detect in a sample of 211 individuals.
The genetic architecture of transcript-level variation in the A. thaliana Bay-0 × Sha RIL population was surprisingly highly variable and complex. Single eQTL were detected for a large proportion (69%) of the 22,746 transcripts/e-traits (Table 1). Many e-traits were also associated with multiple eQTL. We realize that this experiment and these data represent only a single sample of the life-history variation for this species; however, the expression levels of the majority of transcripts showed quantitative variation. These transcript levels are probably quantitatively controlled, given the fact that this RIL population, derived from two inbred accessions, represents only a fraction of the variation present in this diverse species (Nordborg et al. 2005; Kliebenstein et al. 2006b). The complexity of this control also implies that transcriptional regulation in other higher plant species is likely to be as multifaceted. Our findings in Arabidopsis, in conjunction with studies in yeast (Yvert et al. 2003; Brem and Kruglyak 2005), mice (Schadt et al. 2003; Chesler et al. 2005), and humans (Morley et al. 2004), support the working hypothesis that global complexity of transcriptional regulation may be a general feature of eukaryotes.
The transcript levels of most genes in this Arabidopsis RIL population were controlled by multiple eQTL with small R2 values (Figure 2). The RIL sample size, the observable number of recombinants, and the statistical analysis that we employed can all be considered limiting factors of this study that affect our ability to detect eQTL of small phenotypic effect (Beavis 1998; Mackay 2001; Doerge 2002; Kim et al. 2005). The genetic variance associated with an eQTL (i.e., the proportion of the phenotypic variance for which it accounts) is larger in a RIL population than in F2 or backcross populations (Beavis 1998; Flint et al. 2005). Therefore, our use of a RIL mapping population structure in conjunction with 211 biologically replicated RILs greatly facilitated the detection of smaller-effect eQTL due to the relatively large number of observed recombination events that, in turn, allowed more precise estimation of phenotypic variances. Nevertheless, it is likely that additional small-effect eQTL remain undetected in this population. The observed skewed (e.g., long right-hand tail) R2 histogram (Figure 2) suggests that an increase in population size (n > 211) and additional biological replication per line would most likely increase detection of eQTL with R2 values <0.1 (Kim et al. 2005).
This study revealed that small-effect eQTL were prevalent in the 211 RILs, with <6% of the 36,871 eQTL accounting for a large proportion of the estimated phenotypic variation (R2 > 0.3). These results contrast to studies that were based on smaller population sizes in which large phenotypic effect cis-eQTL were prevalent (Schadt et al. 2003; DeCook et al. 2006). Studies based on small sizes in combination with population structures that do not maximize recombination events reduce QTL resolution and lead to sampling bias when detecting eQTL/QTL of larger phenotypic effect (Beavis 1998; Bogdan and Doerge 2005; de Koning and Haley 2005; Gibson and Weir 2005).
Our observation that cis- and trans-eQTL are associated with different distributions of R2 (Figure 2) was also found in Drosophila (Meiklejohn et al. 2003; Wayne et al. 2004; Hughes et al. 2006). A potential explanation for this observation is that the transcript abundance of most genes is regulated by multiple signal transduction networks at multiple levels (e.g., transcription and transcript stability); thus a polymorphism in any one of these regulatory levels may be functionally limited to only a small change in transcript accumulation for those genes controlled in trans by that polymorphism. However, induced mutations in transcription factors can generate large expression differences in genes regulated in trans by the transcription factor. Alternatively, the polymorphism underlying a trans-eQTL hotspot has greater potential to be pleiotropic than a cis-eQTL with no corresponding hotspot. Large-effect mutations in pleiotropic genes are more likely to be deleterious than mutations in less interconnected genes (Wright 1977; Turelli 1988; Wagner 2000; Jeong et al. 2001; Fraser et al. 2002; Yu et al. 2004). There may be an evolutionary fitness limitation on the potential genetic effect of polymorphisms that generate trans-eQTL hotspots. Additional populations and species need to be evaluated to determine if cis-eQTL effects are larger than trans-eQTL effects in general.
The majority of transcripts/e-traits exhibited higher estimated broad-sense heritabilities (H) within the RILs than in the parents (Figure 3), indicating that the transcript-level variation present in these two homozygous inbred Arabidopsis accessions does not accurately predict transcript variation in their RIL progeny. Furthermore, our results indicate that if parental transcript variation is used to select a restricted set of genes for subsequent study in the progeny, many genes would be overlooked. The difference in estimated transcript-level heritabilities for parents and RILs also suggests that the majority of the genetic variance influencing the levels of most transcripts is nonadditive (i.e., includes dominance and epistatic interaction components; Lynch and Walsh 1998). Indeed, ∼70% of the transcripts with two or more eQTL showed opposite allelic effects. Since the parents and RILs were grown together in the same environment, it is unlikely that genotype × environment interactions were a major source of differences in phenotypic variance estimates between parents and RILs, supporting the conclusion that nonadditive genetic variance was the main contributor to differences in e-trait H between parents and RILs. Our results are consistent with a yeast global transcriptome study for which transgressive segregation and nonadditive genetic variation were also prevalent (Brem and Kruglyak 2005). The predominance of nonadditive genetic variation in our study also suggests that surveys of transcript-level variation in parental accessions, at least within Arabidopsis, will significantly underestimate the actual transcript-level variation in progeny populations derived from intercrossing these accessions. However, the underestimation will primarily affect trans-eQTL as the cis-eQTL typically exhibit larger R2 values (Figure 2) and a positive correlation of transcript-level heritability between the parents and their RILs (Figure 3C). This latter observation supports our previous suggestion that correlation between transcript-level variation and sequence polymorphism within seven Arabidopsis accessions was due to cis control of the majority of the detected transcript-level variation (Kliebenstein et al. 2006b). Interestingly, a small number of transcripts (41) exhibited high heritability, but no detectable eQTL. This could be due to control by multiple loci with small additive effects or control by loci that interact nonadditively (Brem and Kruglyak 2005). Neither of these issues is investigated here, which serves as motivation for further work in this area.
While cis-eQTL accounted for a greater proportion of the phenotypic variance, the majority of eQTL within this RIL population are situated in trans to the genes whose transcript levels varied. Furthermore, most trans-eQTL clustered into regions defined as trans-eQTL hotspots that control the levels of a larger number of transcripts but only a small fraction of the variation for each transcript. Trans-eQTL hotspots have been reported in other organisms (Brem et al. 2002; Schadt et al. 2003). Interestingly, in our study the e-traits affected by a trans-eQTL hotspot also showed directionality such that most transcripts were either up- or downregulated by the same parental allele at the trans-eQTL (Figure 1A, Table 2). However, examination of their genomic positions and GO annotation analysis did not identify any functional categories significantly associated with any of the 17 trans-eQTL hotspots (data not shown); thus the biological function of the trans-eQTL hotspots remains unknown. Finally, our comparison of the trans-eQTL hotspots to known phenotypic QTL locations did not identify any significant enrichment of phenotypic QTL in these regions (Kliebenstein et al. 2006a; data not shown).
Clearly, further genetic dissection of each trans-eQTL hotspot is required to identify the genes that influence so many other genes in trans and to understand the biological function of such hotspots. One approach to finding candidate causal genes is to identify cis-eQTL underlying a trans-eQTL hotspot; our analysis identified ∼100 cis-eQTL/hotspot. However, due to the potential genetic complexity of e-QTL regions, it may not be a straightforward task to identify the specific causal gene or genes. QTL and e-QTL detected as single genetic effects may be composed of multiple, physically linked loci, each associated with a small portion of the phenotypic variation (Flint et al. 2005). Understanding how closely linked loci of small effect interact to control a quantitative phenotype would require a very large mapping population (i.e., many recombinants) and appropriate follow-up studies to separate and isolate each of the QTL involved (Kim et al. 2005).
An important question that remains to be addressed is how transcript-level variation is related to and controls downstream phenotypic variation (Gibson and Weir 2005). Studies in plants have correlated specific phenotypic QTL with transcript-level variation controlled in cis (Kliebenstein et al. 2001; Lambrix et al. 2001; Zhang et al. 2006). Several of these studies identified cis-regulated transcript-level variation in signal transduction genes controlling flowering time or circadian rhythm variation (Johanson et al. 2000; Caicedo et al. 2004; Werner et al. 2005). This cis variation in a signal transduction component potentially controls small changes in transcript accumulation for numerous genes in trans. As mentioned previously, we hypothesize that cis-eQTL variation may underlie some trans-eQTL hotspots whereby the genes affected in trans are the mechanism through which the cis-affected gene influences the downstream phenotype. Testing this hypothesis will require high-throughput analysis of a RIL population for both transcript level and downstream phenotypic variation and subsequent functional analysis of the causal genes.
A fundamental issue in quantitative genetics is how the genotype determines the quantitative trait phenotype (Mackay 2001). A study of transcriptional variation is merely one component of this question. Determining the actual biological relationship between transcript-level variation and other phenotypes, such as protein and metabolite levels, that in turn influence trait phenotypes farther downstream, is a highly complex task (Jansen et al. 2002). For example, post-transcriptional regulation can diminish the correlation between transcript levels and protein levels. Furthermore, inaccuracy in measurements of global transcript accumulation can reduce the ability to detect the relationship between mRNA levels and downstream phenotypes. Reducing experimental error via biological replication of large segregating populations representing many recombination events (Jansen and Nap 2001) can improve the ability to detect meaningful biological associations of transcriptomic variation with proteomic and metabolomic variation. The Bay-0 × Sha RIL population, consisting of hundreds of genetically stable inbred lines, and the global eQTL analysis described here provide a basis from which to study how transcript-level variation may influence downstream phenotypic trait variation in higher plants. Such studies will enable determination of how the genotype determines quantitative trait phenotypes and contribute toward a deeper understanding of the genotype–phenotype relationship in eukaryotes.
We thank Rebecca Walker and Tanya Tang for technical assistance and Steve Edberg for assistance with data management and dissemination. This research was supported by the National Science Foundation 2010 Project, grant no. MCB-0115109 to D.A.S., R.W.M., and R.W.D.
Communicating editor: J. F. Doebley
- Received August 17, 2006.
- Accepted December 1, 2006.
- Copyright © 2007 by the Genetics Society of America