Species diversity may have evolved by differential regulation of a similar set of genes. To analyze and compare the genetic architecture of transcript regulation in different genetic backgrounds of Eucalyptus, microarrays were used to examine variation in mRNA abundance in the differentiating xylem of a E. grandis pseudobackcross population [E. grandis × F1 hybrid (E. grandis × E. globulus)]. Least-squares mean estimates of transcript levels were generated for 2608 genes in 91 interspecific backcross progeny. The quantitative measurements of variation in transcript abundance for specific genes were mapped as expression QTL (eQTL) in two single-tree genetic linkage maps (F1 hybrid paternal and E. grandis maternal). EQTL were identified for 1067 genes in the two maps, of which 811 were located in the F1 hybrid paternal map, and 451 in the E. grandis maternal map. EQTL for 195 genes mapped to both parental maps, the majority of which localized to nonhomologous linkage groups, suggesting trans-regulation by different loci in the two genetic backgrounds. For 821 genes, a single eQTL that explained up to 70% of the transcript-level variation was identified. Hotspots with colocalized eQTL were identified in both maps and typically contained genes associated with specific metabolic and regulatory pathways, suggesting coordinated genetic regulation.
COMPARATIVE analysis of genome sequences of yeast, animal, and plant species indicates that the morphological and developmental diversity of eukaryotes arose primarily from the differential regulation of a core set of genes, rather than by extensive acquisition or creation of new genes (King and Wilson 1975; Baltimore 2001; Levine and Tjian 2003). The evolution of gene expression variation may arise from the expansion of gene families regulating transcription differentially or from an increase in complexity of regulatory sequences and protein complexes involved in transcriptional regulation. These mechanisms modify mRNA levels and cell type specificity while maintaining the structure and functionality of proteins. Phenotypic variation at the species level is also created more often by changes in gene expression regulation than through variant forms of proteins. DNA sequence variation is more frequent in regulatory sequences than in coding regions in humans (Rockman and Wray 2002).
Despite the contribution of gene expression variation to diversity and evolution, variation normally has been studied either at the DNA sequence level or at the phenotype level. The evolution of species can now be studied at the transcriptional level using genomic tools that assess genome-wide variation in gene expression (Schena et al. 1995; Velculescu et al. 1995). The genetic control and heritability of transcript-level variation have been demonstrated in yeast, mice, and flies (Dumas et al. 2000; Karp et al. 2000; Brem et al. 2002; Wayne and McIntyre 2002; Schadt et al. 2003; Yvert et al. 2003). Transcriptional regulation may also be affected by nongenetic sources of variation (Gibson 2003), much like a typical quantitative trait. Therefore, transcript levels measured by microarrays may be considered molecular quantitative phenotypes with genetic, environmental, and developmental sources of variation, which contribute to a transcript-level phenotype.
The nature of transcript abundance variation at the individual, population, and species levels has only recently been investigated. Oleksiak et al. (2002) found that 18% of the genes examined showed differences in transcript abundance among 15 individuals from three populations of fish in the genus Fundulus. However, interpopulation variation was minor, affecting the transcript levels of only 15 genes. A comparison of transcript variation in humans, chimpanzees, and orangutans identified species-specific patterns of gene transcript levels, particularly in the brain, which may reflect differences among these taxonomic groups (Enard et al. 2002). A similar analysis identified transcript abundance patterns that are conserved in humans, flies, worms, and yeast and in others that are specific for different animals (Stuart et al. 2003).
Microarrays can be used to study the genetic regulation of gene expression on a genome-wide scale by measuring transcript levels in a segregating progeny. Gene-specific transcript levels estimated from microarrays can be individually analyzed as quantitative traits and mapped using established quantitative genetic methods (Mackay 2001). This strategy has recently been demonstrated in yeast, mice, humans, and maize (Brem et al. 2002; Schadt et al. 2003; Yvert et al. 2003), providing a genome-wide overview of the genetic architecture of gene transcript regulation and the relative extent of cis- and trans-acting factors involved in the control of metabolic and regulatory pathways.
The genetic architecture of variation in transcript abundance represents a complex genetic regulatory network. Understanding the number and nature of the interactions involved is one of the major problems of genetics. For instance, how does the genetic architecture of transcript abundance regulation differ among individuals in a species or among taxonomically related species? Are the genomic regions involved in the regulation of orthologous gene transcripts conserved in different genetic backgrounds, or does variation arise from divergence at many different regulatory loci? These questions were addressed here by analyzing the genetics of transcript variation in wood-forming tissues (differentiating xylem) in a wide interspecific cross of an F1 hybrid of Eucalyptis grandis and E. globulus ssp. globulus, backcrossed to a different E. grandis parent (Figure 1).
E. grandis and E. globulus belong to the same taxonomic subgenus, Symphyomyrtus, but they are members of different sections (Latoangulatae and Maidenaria, respectively) that diverged in the late Miocene-Pliocene (5–10 MYA) (Pryor and Johnson 1971; Eldridge et al. 1993; Ladiges et al. 2003). E. grandis occurs naturally along the subtropical regions of Australia's east coast and E. globulus is found in the temperate climates of Tasmania and the extreme south of the Australian continent (Eldridge et al. 1993). The two species have contrasting wood properties and growth. E. globulus has high wood density and relatively slow growth. E. grandis has superior growth and lower wood density. These phenotypic differences may be due to fixed alleles within the species. Eucalyptus hybrids have been created and propagated widely because of the unique properties of specific hybrid genotypes. Crosses between E. grandis and E. globulus have resulted in progeny populations with wide genetic and phenotypic segregation. The hybrids are particularly suited to describe the genetic architecture of quantitative variation in wood quality and growth traits (Myburg et al. 2003). Therefore, such a hybrid provides an ideal pedigree to study the genetic architecture of transcript abundance variation.
This report extends a previous study (Myburg et al. 2003) in which two separate single-tree genetic maps were generated; a paternal map described the allelic segregation of the F1 hybrid (tree BBT01058/H1A1) and a maternal map described the E. grandis backcross parent (tree 678.2.1). QTL analysis of progeny from the F1 hybrid reflects the effects associated with the presence of the E. globulus parent (genotype not described) or the E. grandis (tree G50) alleles in backcross individuals (Figure 1), revealing species differentiation in the regulation of transcript abundance in wood-forming tissues. QTL reported for the E. grandis (678.2.1) backcross parent reflect differences in genetic regulation in the pure species. Synteny between the F1 hybrid maternal and the E. grandis paternal maps allows the comparison of the genetic architecture of transcript abundance variation of specific genes in different genetic backgrounds. The transcript-level variation measured in the 91 segregating progeny was mapped as expression QTL (eQTL) in the two paternal maps to identify the cis- or trans-acting loci that regulate the transcript-level variation. This approach identified eQTL for 1067 genes and suggests that the genetic control of transcript levels is modulated by variation at different regulatory loci in different genetic backgrounds.
MATERIALS AND METHODS
Transcript-level QTL were mapped onto two single-tree genetic maps [F1 hybrid (genotype H1A1) and E. grandis (genotype 678.2.1)] generated previously by genotyping 331 individuals from the E. grandis backcross population with 803 polymorphic AFLP fragments (Myburg et al. 2003). Framework markers were selected to obtain an average spacing of ∼10 cM and comprise 96 AFLP fragments mapped to 12 major linkage groups in the E. grandis (678.2.1) maternal map and 122 fragments mapped to 11 linkage groups in the F1 hybrid (H1A1) paternal map. The two maps were aligned and synteny was established among linkage groups on the basis of dominant intercross markers, i.e., markers inherited as heterozygotes from both the F1 hybrid (H1A1) and the E. grandis (678.2.1) parents, segregating in a 3:1 ratio in the progeny.
Plant material and tissue collection:
The 91 individuals used in this study, from the E. grandis backcross mapping population (Figure 1), were used to construct the F1 hybrid [E. grandis (genotype G50) × E. globulus (unknown parent)] paternal map and E. grandis (genotype 678.2.1) maternal map (Myburg et al. 2003). Stump sprouts were collected from the E. grandis backcross progeny population when the trees were 6 years old and rooted. The rooted cuttings were planted on a single field plot near Paysandu (Uruguay) by Forestal Oriental S.A. When the trees were 20 months old, differentiating xylem, including cells in the stages of cambial cell division, cell expansion, and secondary-wall formation, was collected from the first two meters of the entire stem circumference of one ramet of each clone. Tissue collection occurred at the peak of the growth season and was restricted to 2 consecutive days to minimize environmental variation.
RNA preparation, labeling, and hybridization:
The tissue was stored in RNAlater solution (Ambion, Austin, TX) until RNA was extracted (Chang et al. 1993) and purified on RNAeasy plant mini kit (QIAGEN, Chatsworth, CA) columns. Labeling was carried out with Cy3 and Cy5 dyes according to the aminoallyl method (Hegde et al. 2000) after first-strand cDNA synthesis with SuperScript II (Life Technologies). After hybridization (20 hr at 42°) and high-stringency washes (Hegde et al. 2000), the slides were scanned with a ScanArray 4000 microarray analysis system scanner (Packard Bioscience). Images were processed using QuantArray software (Packard Bioscience). The raw intensity microarray data were deposited in the Gene Expression Omnibus database under the accession nos. GPL348 (platform), GSM7637–GSM7727 (sample), and GSE502 (series).
The cDNA microarray comprised 2608 cDNAs selected from a set of 14,000 ESTs from E. grandis (cDNA libraries from differentiating xylem, juvenile and adult leaf, petiole, and root) and from E. tereticornis (two cDNA libraries from flower), as described previously (Kirst et al. 2004). After PCR amplification and purification in multiscreen 96-well filtration plates (Millipore, Bedford, MA), clones were screened for quality in agarose gels and printed in duplicate on aminosilane-coated glass slides (Corning, Corning, NY) using an Affymetrix 417 spotter. Sequences of the cDNAs used in the microarray were deposited in GenBank (CB967505–CB968059, CD667988–CD670002, CD670004, CD670097, CD670101–CD670112, CD670114–CD670137).
Experimental design and statistical analysis:
A loop design (Churchill 2002) was used to maximize the number of sampled meioses, while technically replicating each sample (Cy3 and Cy5). The loop design allows for the evaluation of twice as many genotypes (meiotic events) for a given number of slides as does a reference design. Therefore, while each individual from the segregating population was technically replicated using Cy3 and Cy5 labeling, the loci tested for presence of an eQTL were biologically replicated in each individual carrying the alternative parental alleles for that locus in the population. Raw signal intensity values were transformed (log2) and normalized using the mixed analysis of variance (ANOVA) model, yijk = μ + Ai + Dj + Pk + (A × D)ij + (A × P)ik + (D × P)jk + (A × D × P)ijk + εijk, as suggested previously (Jin et al. 2001; Wolfinger et al. 2001). This model accounts for systematic (experiment-wide) sources of variation associated with array (Ai, d.f. = 90, random effect), dye (Dj, d.f. = 1, fixed effect), and pin effects (Pk, d.f. = 3, fixed effect), and interactions. Residuals, representing normalized values, were used to estimate individual tree genotype effects for each tree and each cDNA in the mixed ANOVA model rilm = μ + Ai + N(A)l(i) + Tm + εilm. In this model, Tm (d.f. = 90, fixed effect) represents the effect of the individual tree, or genotype, on the transcript level of every gene, from which least-squares means estimates were calculated. Array (Ai, d.f. = 90, random effect) was included in the model to control for spot effect (Jin et al. 2001; Wolfinger et al. 2001) and N(A)l(i) (d.f. = 91, random effect) accounts for the spot replication within slides. Significant differences in transcript abundance in the segregating population were inferred from the type 3 tests of fixed effects for genotype, carried out on a gene-by-gene basis in the mixed ANOVA model.
Least-squares means estimates of transcript levels, calculated for each individual and cDNA, and the framework marker data of the F1 hybrid paternal and E. grandis maternal maps (Myburg et al. 2003) were used for separate, genome-wide QTL detection scans using QTL Cartographer (Basten et al. 2003). Composite interval mapping (Zeng 1993, 1994), i.e., model 6 of the Zmapqtl module of QTL Cartographer, was used for QTL detection. Likelihood-ratio (LR) profiles [−2 ln(L0/L1)], representing the ratio of the likelihood of the null hypothesis (L0, no QTL in the marker interval) to the alternative hypothesis (L1, presence of a QTL in the marker interval) were generated for each gene at every 2-cM interval of the parental map. A window size of 10 cM was used, and cofactors were chosen by stepwise regression (SRmapqtl). Epistatic interactions were evaluated using multiple interval mapping (Kao et al. 1999) in a model with a maximum number of three QTL. Empirical LR thresholds were determined by randomly permuting the trait values among marker genotypes and recording the highest LR peak produced in each random data set using the same QTL detection parameters described above (Churchill and Doerge 1994; Doerge and Churchill 1996). The empirical threshold for experiment-wise type I error rates of 0.01, 0.05, and 0.1 was determined by recording the 100th-, 500th-, and 1000th-ranked LR of 10,000 random permutations for 35 cDNAs. The most conservative LR threshold found was used for all the genes.
Transcript abundance variation for 2608 cDNAs was assessed in 91 genotypes from a pseudobackcross population (Figure 1). Each individual from the progeny was technically replicated using two dyes, while the parental alleles were biologically replicated in approximately half of the segregating population. Of the 2608 elements represented in the cDNA microarray, 1373 (53%) identified a differential transcript abundance in the backcross progeny after a Bonferroni correction for 2608 tests (experiment-wise α = 0.05), reflecting the wide segregation of the mapping population. Every gene (cDNA) was evaluated for eQTL, even when there was no significant difference in expression among the backcross progeny, following the suggestions of Brem et al. (2002) and Schadt et al. (2003). Threshold log-likelihood ratios of 11, 13, and 16 were used to obtain an experiment-wise α of 10, 5, and 1%, respectively, on the basis of permutation tests (Churchill and Doerge 1994; Doerge and Churchill 1996). Support intervals were evaluated previously for a set of lignin-related genes and typically extend over 20–40 cM (Kirst et al. 2004). Results reported here use a LR threshold of 11, unless stated otherwise, and are fully described in supplementary Tables 1 and 2 at http://genetics.org/supplemental/).
Number and distribution of eQTL:
A total of 1655 eQTL were detected in both maps for 1067 genes (41%). Of these, 608 had been identified with differentially abundant transcripts in the progeny. EQTL were identified for 811 genes (31%) in the F1 hybrid paternal map (maximum LR = 103) and for 451 genes (17%) in the E. grandis maternal genetic map (maximum LR = 90), using the least-stringent criterium for QTL identification in this study (LR > 11). For 195 genes, eQTL were detected in both maps. At the highest stringency (LR > 16), the number of eQTL identified decreased to 193 (7.4%) and 120 (3.3%) for the F1 hybrid paternal and E. grandis maternal maps, respectively. The maximum LR detected for each of the 2608 genes has a skewed distribution (Figure 2) with a median and upper quartile of 9.3 and 11.8 for the F1 hybrid paternal and 7.8 and 10.0 for the E. grandis maternal mapping set (Figure 2A). As might be expected on the basis of previous QTL studies of this pedigree (Myburg 2001), eQTL were more readily identified in the highly heterogeneous genetic background of the F1 hybrid, relative to the pure species (E. grandis).
A total of 821 genes displayed a simple genetic architecture for transcript-level variation, with a single eQTL being detected in 73% of the cases where they could be identified in the F1 hybrid paternal and in 81% of the cases in the E. grandis maternal map (Figure 2B). For both maps, more complex patterns were detected in 246 genes. For example, four eQTL were detected for a putative serine carboxypeptidase cDNA sequence (CD668599) in the E. grandis map. Three cDNAs displayed five eQTL in the hybrid map, for transcripts encoding an unknown protein (CD668546), a putative protein (CB967788), and a translation elongation factor eEF-1 α-chain (CB967966).
Additive effects, direction, and proportion of the phenotypic variation explained:
The marker data in the F1 hybrid paternal map were recoded so that the genotype at each marker was associated with the presence of the E. globulus (unknown parent) or the absence of E. grandis (G50) alleles (Myburg 2001). Therefore, the direction of QTL effects indicates the effect on transcript abundance associated with the substitution of the E. grandis QTL allele (negative sign) with the E. globulus (positive sign) QTL allele in the E. grandis backcross population. Estimates of transcript-level additive effects reported by QTL Cartographer for the F1 hybrid paternal map ranged from a 1.74-fold change for a phosphatase (CD668452) (negative effect) to a 2.5-fold change for a gene encoding a low-temperature and salt-responsive protein LTI6B (CD669389) (positive effect) (Figure 2C). The eQTL for LTI6B explains 70% of the transcript abundance variation in the progeny, the largest proportion explained for any gene. Analysis of the direction of QTL effects in the E. grandis maternal map is arbitrary for each linkage group and, therefore, no inferences can be made about the grandparent origin of the QTL alleles. For the E. grandis maternal map, the maximum additive effect was 2.4-fold change for a putative protein (CD668860). The eQTL explained 68% of the phenotypic variation in transcript-level variation measured in the progeny.
Homology of eQTL detected in the parental maps:
eQTL were identified in both parental maps for 195 genes (LR > 11). Synteny between the linkage groups from the two parental maps was inferred from intercross AFLP markers shared between the parents of the F1 hybrid and the E. grandis (678.2.1) trees (Myburg et al. 2003) and was used to evaluate the genomic locations of eQTL in both maps. From the 195 genes for which eQTL could be identified in the two parental maps, only 13 had significant eQTL localized in homologous linkage groups and did overlap to a certain extent. This suggests that most eQTL are trans-acting. If we raised the minimum LR of 13, the number of genes with eQTL in both maps was 62, of which only 6 had eQTL in homologous linkage groups. Variation in recombination rates throughout the genome of both parental trees does not allow a definitive comparison of eQTL genomic locations, but suggests that for this subset of colocal-ized eQTL, transcript abundance regulation is associated with the same genomic regions, although they could be regulated by different closely linked loci.
Epistatic interaction and proportion of the phenotypic variation explained:
Transcription is initiated by the interaction of an RNA polymerase with a gene's promoter, normally enhanced by the cooperation of various transcription factors that interact epistatically. Epistasis, evaluated using multiple interval mapping (Kao et al. 1999) and identified for 310 genes in the F1 hybrid paternal map and for 285 genes in the E. grandis maternal map, explained up to 62% (F1 hybrid) and 69% (E. grandis) of the transcript-level variation in the backcross population (supplemental Tables 3 and 4 at http://genetics.org/supplemental/). For 19 genes in the F1 hybrid paternal map and 35 in the E. grandis maternal map, the epistatic interactions explained a higher proportion of the transcript-level variation than the estimated additive effect of the interacting eQTL.
Genomic distribution of eQTL:
The total length of the E. grandis and the F1 hybrid (E. grandis × E. globulus) maps were estimated to be 1335 and 1448 cM, respectively (Myburg 2001). If eQTL were evenly distributed, they would be detected on average every 2–3 cM in both maps. Instead, eQTL were clustered in certain genomic regions of both the F1 hybrid paternal map and the E. grandis map (Figure 3). Although significant segregation distortion was observed in this mapping population previously (Myburg et al. 2003), these regions did not colocalize with an eQTL hotspot. Hotspots containing eQTL for 10 or more genes were identified in 18 genomic regions of the F1 hybrid paternal map, dispersed throughout most linkage groups (LG), with the exception of LG2, LG3, and LG11. A major cluster containing eight hotspots was identified on LG8, and the largest hotspot, with eQTL for 116 genes, was located on LG9. eQTL hotspots were identified at a lower frequency in the E. grandis maternal map, where only five genomic regions located in LG1, LG3, and LG12 contained QTL for transcript levels of 10 genes or more.
Direction of effects in eQTL hotspot:
The general direction of effects in eQTL hotspots was evaluated in the F1 hybrid paternal map. Hotspots containing eQTL with effects predominantly in one direction suggest a genetic locus that affects a large number of genes in the same way and may be associated with differences in wood or growth characteristics between E. grandis and E. globulus. The proportions of genes with positive and negative eQTL effects at each hotspot were contrasted using a chi-square test. Nine hotspots had a significantly higher proportion of eQTL with either positive or negative effect, at a level of 0.01 (Figure 3), of which five were mostly of positive and four of negative effect. Six of the eight eQTL hotspots clustered in LG8 had effects in both directions. Hotspots could be due to gene-rich regions or to single or a few genes regulating transcript levels of a large number of genes (such as general transcription regulators). Many hotspots included a number of genes coding for enzymes of metabolic pathways, suggesting that some loci regulate the pathway as a whole (Kirst et al. 2004).
The quantitative control of expression of 2608 genes was analyzed in an E. grandis backcross progeny, generated by mating an F1 hybrid of E. grandis and E. globulus, to an unrelated E. grandis tree. Transcript levels were estimated for each of 91 genotypes with a technical replication using two dyes (Cy3 and Cy5), while the presence of a QTL was evaluated by partition of the progeny set into two populations, carrying alternative parental alleles for each locus. Therefore, when testing for the presence of a QTL, each allele was biologically replicated in ∼45 individuals from the segregating population. A small progeny population was used in this study (91 individuals), implying that some eQTL may have been missed or that their effect was overestimated (Beavis 1997). The high environmental variation in tree plantations may also lower the power of detection of eQTL. Despite these limitations, it was possible to identify eQTL for >40% of the genes represented in the cDNA microarray in both genetic maps. Distribution of eQTL was not random, as would have been expected by chance. Instead, many eQTL were clustered in specific genomic regions. The high genetic diversity and the contrasting phenotypic characteristics of wood (differentiating xylem) from E. globulus and E. grandis (Myburg 2001) translated into wide segregation of transcript abundance in the xylem of the progeny, and significant differences were identified for more than half of the genes in the cDNA microarray.
For 195 genes, detecting QTL for transcript-level variation in two genetic maps identified genetic loci that regulate quantitative variation in mRNA abundance in both the F1 hybrid (E. grandis × E. globulus) and the E. grandis genetic backgrounds. Because synteny had been previously established (Myburg et al. 2003), it was possible to evaluate whether homologous genomic regions were involved in regulation of expression of the same genes. EQTL for only 13 of 195 genes could be identified in both parental homologs. The proportion of genes with eQTL in homologous linkage groups was similar when a higher likelihood ratio (LR > 13) was considered as a threshold for eQTL detection. Lack of conservation of the genetic architecture of transcript abundance regulation in different genetic backgrounds suggests that many different genetic loci could be involved in modulation of transcription of these genes and that there is a complex and variable network of gene expression control.
Mapping of eQTL can indicate cis-acting or trans-acting regulation. The variation in transcript level can be determined by variation among alleles, that is, some property of the functional gene itself (cis-acting). Alternatively, variation may occur in a factor contributing to transcription rate, specificity, or stability of the transcript (trans-acting). In most cases, these alternatives may be distinguished by genetic or physical mapping of the transcribed gene. In the case of cis-regulation, the genomic location of the eQTL will colocalize with the gene. If variation is trans-regulated, the site controlling the variation may be located elsewhere. The strength of inference in distinguishing these alternatives lies in concluding that the locations of the gene and the eQTL are significantly different. The location of eQTL is rarely precise, and the precision of genetic map or physical location, although typically greater, may also have a significant error of estimation. In such experiments, on the basis of the limits of an eQTL interval, two eQTL may appear to overlap, although this could be due to effects of different genes. However, mapping of eQTL to nonhomologous chromosomes in different genetic backgrounds suggests that variation in the regulation of transcript abundance is predominantly in trans-acting loci. The complete genome sequence of Eucalyptus is not available and therefore only limited inferences can be made about whether the eQTL identified correspond to the physical location of the gene (cis-regulation) or to the genetic loci of its trans-regulator. Some of the genes represented in the cDNA set have been genetically mapped in the E. grandis backcross population and examples of cis- and trans-regulation have been identified (Figure 4). However, if most of the transcript-level regulation occurred in cis-, a high level of homology among the genetic location of eQTL would have been expected. In yeast, only 25% of the genes for which eQTL were identified colocalized to the position of the eQTL (Brem et al. 2002).
The developmental, morphological, and behavioral complexity of higher, multicellular eukaryotes is believed to have arisen as a result of more elaborate regulation of gene expression through a larger set of transcription regulators and more sophisticated DNA and protein regulatory complexes (Levine and Tjian 2003). Therefore, a large proportion of genes in higher plants may display complex patterns of trans-regulation. Furthermore, we were able to detect only those regulatory loci that diverged between the two parental species (heterozygous in the F1 hybrid) or those that are heterozygous in the E. grandis backcross parent.
Interactions among multiple segregating genetic loci also appear to play an important role in the control of variation in transcript abundance in this experimental population, because epistasis was identified for 310 genes in the F1 hybrid paternal map and for 285 genes in the E. grandis maternal map. For many genes, a higher proportion of transcript-level variation was explained by epistatic interactions rather than by the estimated additive effects of the interacting loci. In eukaryotes, upstream transcription factors often bind as homo- or heterodimers. Interaction among multiple genetic loci, representing different transcription regulators, may be important for genetic control of gene expression as many of these regulators act in combination to modulate transcription. Numerous other opportunities occur for the modification of mRNA levels before and after initiation of transcription and many could involve multiple loci interacting epistatically rather than additively.
EQTL identified in this study were in many cases clustered in specific genomic regions, or eQTL hotspots, as observed previously in yeast and mice (Brem et al. 2002; Schadt et al. 2003; Yvert et al. 2003). eQTL hotspots may represent gene-rich regions or the location of a transcription regulator that controls a large set of genes. For certain metabolic or regulatory pathways, mapping of transcript-level variation may identify the genetic loci that control the flux through the pathway. Functional annotation of the genes represented on the cDNA microarray identified functional relationships among genes represented by eQTL in several eQTL hotspots, as observed in yeast (Brem et al. 2002). A large set of genes involved in lignin biosynthesis had colocalized eQTL on LG4 and LG9 (Kirst et al. 2004). The ability to identify QTL clusters or eQTL hotspots related to specific metabolic functions may depend on the stringency of the genetic regulation of genes in specific pathways. The phenylpropanoid pathway and lignin biosynthesis may require stringent control, as some pathway intermediates are toxic to the cell, requiring an immediate response by other pathways to either process or secrete these compounds. Other metabolic and regulatory pathways may not be under such mechanisms of control and the genetic regulation of a network of genes may not be distinguishable in experiments with microarrays. In many cases, major effects of regulation may not be at the level of transcription.
Finally, the high incidence of eQTL hotspots in some locations may be partially biased, because the genes included here were selected to include specific functional categories (related to wood formation). A similar analysis with a microarray comprising cDNAs representing a large sample of functional genes of Eucalyptus could identify hotspots in other genomic regions.
This study provided a genome-wide map of the genetic loci that regulate transcript-level variation for genes involved in biosynthesis of the secondary wall in the differentiating xylem of Eucalyptus. Analysis of the genetic architecture of gene regulation involved in wood formation is also a powerful tool for identification of candidate genes for quantitative traits of commercial and biological interest, when combined with phenotypic data (Schadt et al. 2003). Colocalization of eQTL and QTL for traditional quantitative traits such as wood density and growth suggests candidate genes for further analysis. The concept is similar to the approach of mapping genes of metabolic or regulatory pathways that are potential candidates for regulation of quantitative variation in complex traits. This strategy in some cases has been able to identify genes that colocalize with QTL in forestry, but QTL typically span large genomic regions (20–30 cM) and many genes are tested (Gion et al. 2000; Brown et al. 2003). Identification of colocalized gene expression (eQTL) and trait QTL may be more informative, because only genes transcribed in the tissue of interest and differentially expressed between parental strains are identified. The identification of colocalized gene expression and phenotypic QTL can be supported by the analysis of correlation between transcript level and phenotype for positional candidates (genes whose eQTL overlaps the phenotypic QTL). Transcript levels measured by microarrays could account for genetic sources of variation associated with dominance and epistasis, as well as for nongenetic sources of variation, such as developmental and environmental variation. Therefore, an association between transcript-level variation and phenotypic variation in quantitative traits may be higher than that between genetic variation (genetic markers) and phenotype in QTL analysis. This higher association may be useful in distinguishing a specific gene from other positional candidates that control a quantitative trait.
We thank J. Vogel and S. V. Tingey (DuPont) for kindly providing database access and biological materials (cDNAs) used for the microarrays; J. G. De León (Forestal Oriental S.A.) for providing the plant material; G. Gibson and R. Wolfinger for assistance in establishing the appropriate ANOVA model for the analysis of the microarray data and for comments about the manuscript; B. Sosinski and L. He [Genome Research Laboratory-North Carolina State University (NCSU)] for establishing the microarray facility; C. Stasolla, L. van Zyl, D. Craig, and G. Passador-Gurgel for technical assistance and Rebecca Doerge for useful comments about the manuscript. This work was supported in part by the National Science Foundation (grant no. DBI 9975806), by the NCSU Forest Biotechnology Industry Consortium, and by a fellowship from the NCSU genomics program to M.K.
Communicating editor: S. R. McCouch
- Received December 2, 2004.
- Accepted January 11, 2005.
- Genetics Society of America