Abstract
Forest trees are ideally suited to association mapping due to their high levels of diversity and low genomic linkage disequilibrium. Using an association mapping approach, single-nucleotide polymorphism (SNP) markers influencing quantitative variation in wood quality were identified in a natural population of Pinus radiata. Of 149 sites examined, 10 demonstrated significant associations (P < 0.05, q < 0.1) with one or more traits after accounting for population structure and experimentwise error. Without accounting for marker interactions, phenotypic variation attributed to individual SNPs ranged from 2 to 6.5%. Undesirable negative correlations between wood quality and growth were not observed, indicating potential to break negative correlations by selecting for individual SNPs in breeding programs. Markers that yielded significant associations were reexamined in an Australian land race. SNPs from three genes (PAL1, PCBER, and SUSY) yielded significant associations. Importantly, associations with two of these genes validated associations with density previously observed in the discovery population. In both cases, decreased wood density was associated with the minor allele, suggesting that these SNPs may be under weak negative purifying selection for density in the natural populations. These results demonstrate the utility of LD mapping to detect associations, even when the power to detect SNPs with small effect is anticipated to be low.
NUMEROUS traits of agronomic importance are demonstrated to be under genetic control (Keurentjes et al. 2008), and there is considerable interest in characterizing the causative polymorphisms underlying their quantitative variation. In forest trees, quantitative variation in traits such as mechanical and pulping properties of wood, growth, cold hardiness, and drought acclimation are likely to result from allelic variation within multiple genes (Neale and Savolainen 2004; Oraguzie and Wilcox 2007). Because of the commercial importance of radiata pine (Pinus radiata D. Don), we are exploring the molecular basis of variation in its wood properties using an association genetics approach.
Radiata pine wood properties are variable in domesticated populations and exhibit a quantitative mode of inheritance with high heritability, indicating a strong underlying genetic component. High heritabilities have been observed for solid wood traits including density, cellulose microfibril angle (MFA), and modulus of elasticy (MOE) (Baltunis et al. 2007) and for carbohydrate composition, pulp yield, fiber length, and perimeter (Evans et al. 1997; Kibblewhite 1999); and genetic control has been established for several important production traits (Kumar 2004; Dungey et al. 2006; Gapare et al. 2006; Wu et al. 2007).
It is anticipated that variation in wood quality will be dependent on variation in numerous genes involved in xylogenesis. Studies of gene expression in developing xylem have revealed genes involved in spatially and temporally regulated processes such as cambial division, cell differentiation, expansion, and secondary cell wall biosynthesis (Allona et al. 1998; Paux et al. 2005; Pavy et al. 2005; Cato et al. 2006; Li et al. 2009). Studies in model plants indicate several thousand genes are likely to be involved in these processes, which can be grouped into four categories: information storage and processing, cellular processes and signaling, metabolism, and genes with unknown functions (Yuan et al. 2007). Studies in model plants have revealed specific functions for numerous genes in these developmental pathways. To date there have been few functional studies in forest trees revealing genes directly affecting wood quality (MacKay et al. 1997; Tsai et al. 1998; Valerio et al. 2003; Lu et al. 2004; Spokevicius et al. 2007), largely due to inherent features of forest trees including their long generation times, large size, and lack of inbred or pure lines (Oh et al. 2003).
Attempts to identify genes underlying phenotypic variation in forest trees typically focused on quantitative trait loci (QTL) mapping. In radiata pine several QTL associated with wood density, growth, and disease response have been identified (Devey et al. 2004a,b; Matheson et al. 2006). However, poor success in validating QTL markers in populations with different genetic and environmental backgrounds due to unfavorable pedigree linkage disequilibrium (LD) (Neale and Savolainen 2004), has led to a shift in research to population-level association (or LD) mapping. This approach aims to identify gene alleles that directly contribute to phenotypic variation by statistical inference of the cosegregation of genotype and phenotype data in unrelated populations (Neale and Savolainen 2004). Forest trees are ideally suited to association mapping as LD and population structure are frequently low (Neale and Savolainen 2004). Encouraging progress toward identifying molecular polymorphisms linked to phenotypic variation in trees has recently been achieved following this approach. The cinnamoyl CoA reductase gene in Eucalyptus (Thumma et al. 2005) and four cell wall genes in P. taeda (Gonzalez-Martinez et al. 2007) were shown to influence variation in MFA, early wood specific gravity, and late wood proportion. Recently a single-nucleotide polymorphism (SNP) in a cobra-like gene in Eucalyptus nitens was identified that is associated with cellulose content and pulp yield (Thumma et al. 2009). Further functional analysis revealed that this SNP is a regulatory polymorphism and is likely to affect both traits through its influence on allelic expression. This finding demonstrates the utility of populations harboring low LD in the study of causative polymorphisms. Association genetics has also been applied in model plants to identify SNPs influencing major traits in Arabidopsis and maize (Thornsberry et al. 2001; Olsen et al. 2004; Aranzana et al. 2005) and has been used extensively to dissect complex traits in humans where population structure parallels that of many tree species (Pritchard and Przeworski 2001; Burton et al. 2007).
Despite the potential promised from association genetics, several factors have been identified that influence power to detect true associations (Long and Langley 1999; Neale and Savolainen 2004; Gordon and Finch 2005; Newton-Cheh and Hirschhorn 2005). These factors can often be accounted for at the experimental design phase, by considering the extent of genome-wide LD, the candidate genes selected, the effect size, the number of genes affecting the trait, allele frequency, sample size, and rates of genotyping and phenotyping error. The use of natural populations or pedigrees for association studies can introduce confounding genetic structure (Burton et al. 2007; Zhao et al. 2007), which creates false LD between markers and QTL (Neale and Savolainen 2004; Ball 2007). Radiata pine occurs naturally as a regional ensemble of disjoint populations over a small range including three from mainland California and two on Guadalupe and Cedros Islands. Genetic diversity of the mainland populations, used in this study, is at least moderate among conifers (Karhu et al. 2006), and the populations exhibit a low yet significant level of genetic structure (FST ∼ 5%) (Plessas and Strauss 1986; Karhu et al. 2006), which may confound association tests. The aim of this study was to apply an LD mapping approach to identify allelic variants regulating wood formation in the natural populations of radiata pine. Several methods have been developed for the quantification of genetic structure and we have employed two of these to correct for allelic structure arising in populations and pedigrees when testing individual SNP associations (Ritland 1996; Pritchard et al. 2000). Because of the large size, anonymity, and complexity of conifer genomes (12n = 26,000 Mb) (Ahuja and Neale 2005), and the rapid breakdown of LD, we focused on SNPs from 36 cell wall candidate genes and their flanking sequences rather than a genome-wide scan. Where SNP associations were detected in the natural populations, we investigated whether significant associations could be validated in an Australia land race.
MATERIALS AND METHODS
Genetic material:
P. radiata trees in the discovery population were maintained in a 25-year-old provenance-progeny trial at Batlow, New South Wales (35°31′17″S; 148°08′40″E). This trial was established following an incomplete block design, with four trees per family per replicate, from seed collected from the species natural distribution in mainland California. Needles were sampled from one tree per family for 447 families in replicate 1 and included all three mainland provenances: Monterey (210), Año Nuevo (155), and Cambria (82). Needles were also collected from a second-generation progeny trial in Flynn, Victoria (BR9611: 38°15′46″S; 146°41′33″E), which served as a validation population that was previously described by Baltunis et al. (2007) and is composed of families descended from the Año Nuevo and Monterey provenances. This population was part of a 7-year-old trial established following an incomplete block design with four trees per family per replicate. Needles were collected from 458 individuals. This population contained two half-sibs (open pollinated) from each of 229 different families. Diploid DNA was extracted from needles using a modified CTAB protocol (Doyle and Doyle 1990) and further purified on QIAquick PCR purification columns according to the manufacturer's instructions.
Candidate gene selection and SNP genotyping:
Thirty-eight candidate genes chosen on the basis of their known or likely involvement in development of secondary xylem were previously sequenced in a panel of radiata pine individuals (Table 1). This included genes in five functional categories: (1) lignin biosynthesis, (2) cellulose synthesis, (3) cell wall structure, (4) cell expansion, and (5) abiotic stress responses. Prior to genotyping, SNPs were parsed from 72 candidate gene amplicon alignments covering 54,188 bp of DNA sequence (supporting information, Table S1). One hundred forty-nine haplotype-tagging SNPs were genotyped in parallel across the 447 individuals from the discovery population, using universal bead arrays (Illumina) (Shen et al. 2005). Linkage disequilibrium between unphased SNP data was assessed using HelixTree software 6.3.6 (Golden Helix). The overall level of LD between SNPs was low with <1.2% of pairwise correlations between sites (r2) exceeding 0.2 (Table S2). Significant SNP associations identified in the discovery population were regenotyped in 458 individuals from the validation population, using the iPLEX Gold assay (Sequenom).
Distribution of SNP markers across 36 candidate genes sequenced in the Batlow discovery population
Trait data analysis:
Discovery population:
Increment cores were sampled at breast height from the north side of each tree in the discovery population. Cell wall dimensions and wood quality data were collected via X-ray diffraction and point measurement on 1.5 × 0.3-cm wooden strips prepared from 12-mm incremental cores using SilviScan. Nine traits were measured including density, MOE, MFA, fiber wall thickness, fiber wall tangential diameter, fiber wall radial diameter, fiber coarseness, fiber-specific surface area, and cell population.
The first cambial ring does not necessarily correspond to the same calendar year tree to tree, and therefore increment cores were aligned between samples on the basis of cambial ring widths. On average, 20.5 rings were simultaneously partitioned along the core for each trait on the basis of individual density profiles, using Analyze (Maddock 2001). This process yielded ring-by-ring early and late wood estimates for each trait, as well as late wood proportion. Transition age was estimated for each sample using established methods (Gapare et al. 2006). Trait data were subsequently separated into juvenile and mature phases, where juvenile rings accounted for all rings up to and including the predicted transition ring. Significant differences in wood quality traits as well as gene expression have been reported between the juvenile and mature developmental stages (Li et al. 2009), prompting their analysis as separate components in addition to whole-core estimates.
Three principal components describing the uncorrelated variation present among a set of 13 whole-core traits (density, MFA, MOE, fiber wall thickness, fiber wall tangential diameter, fiber wall radial diameter, fiber coarseness, fiber-specific surface area, cell population, minimum density, maximum density, ring width, and cumulating ring width) were defined using StatistiXL 1.7 software and an eigenvalue cutoff of 1. The final trait set used for association analysis in the discovery population consisted of 62 directly measured and synthetic (including principal components) traits, which are listed in Table S3.
Validation population:
Phenotypic data for core length, density, MOE, and MFA, measured using SilviScan on 12-mm increment cores collected at breast height, were taken from the data set of Baltunis et al. (2007). We selected ring-by-ring and whole-core averages for the outer four of seven rings for each trait, corresponding to the juvenile wood. In total 36 traits were used for association tests (Table S3).
Association analysis:
Discovery population:
Associations between 62 traits and 149 SNPs were tested via a least-squares fixed effects general linear model implemented in Tassel (Bradbury et al. 2007). The statistical model is described by y = Xβ + e, where y is a vector for the observed dependent variable (trait), β is a vector containing independent fixed effects, including genetic marker and population structure matrices, X is the known design matrix, and e is the unobserved vector for the random residual (error) (Henderson 1975). Tests for significance were performed on 447 individuals from the discovery population, Año Nuevo (155), Monterey (210), and Cambria (82), as well as separately within the three natural populations. Since significant divergence has been detected between the three provenances in this population (Karhu et al. 2006), a Q-matrix describing individual population assignments was generated in Structure (Pritchard et al. 2000) and incorporated in the model. We utilized the same 149 SNP markers and 447 samples from the discovery population, assuming independence of markers and applying a previously optimized burn-in of 10,000, and 100,000 chains to estimate likelihoods from MCMC analysis. All P-values were adjusted for multiple testing on the basis of 2000 permutations of each trait across 149 SNP markers. Pedigree structure, or kinship, in the provenance-progeny trial was found to be low following analysis of pairwise kinship coefficients among the same 447 trees and 149 SNPS in Spagedi (Hardy and Vekemans 2002). The Loiselle and Ritland estimators (Loiselle et al. 1995; Ritland 1996) yielded similar results with coefficients ≤0 for a majority of analysis. Associations were tested using the Loiselle matrix and mixed linear model implemented in Tassel (y = Xβ + Zu + e, where X and Z are the known design matrices and u is an unknown vector of random additive genetic effects from multiple-background QTL). Consistent with a lack of kinship, the results were >99% similar to those of the general linear model (Table S4) and were subsequently omitted.
Validation population:
A panel of 25 SNPs producing significant associations in the discovery population using the general linear model were genotyped in the validation population. This included all SNPs shortlisted in Table 3 and 15 SNP markers that had P-values <0.05 in the discovery population but had q-values >0.1 (Table S1). Associations with 36 traits were tested using the same methods as in the discovery population. A Q-matrix was included in the model since significant population genetic structure was identified using Structure (Figure S1). Genotype data for four SNPs (2, 133, 17, and 9) could not be analyzed in the validation population: SNP 2, a low-frequency SNP in the Año Nuevo and Monterey provenances, did not segregate in the validation population; and SNPs 9, 17, and 133 failed quality control. The Loiselle and Ritland estimators yielded pairwise kinship coefficients ≤ 0 >50% of the time, and results for the mixed linear model were >96% similar to those of the general linear model (Table S5). Consequently results for the mixed model were omitted.
An alternative statistic based on the false discovery rate, or q-value, which provides an additional measure of significance dependent on the P-value distribution, was applied in both populations (Storey and Tibshirani 2003). Individual association q-values were calculated using Q-value software upon P-values generated using the general linear model. Bin counts between P-values of 0.85 and 1 were set to the mean count observed over the remaining distribution to eliminate bias due to the permutation test, which adjusted many P-values to near 1. Bayes factors were calculated for individual SNP associations, using the Bayesian statistical package ldDesign (Ball 2005). The interpretive scale of Kass and Raftery (1995) was used to grade strength of evidence for each association on the basis of estimated Bayes factors, where Bayes factor categories 0–2, 2–6, 6–10, and >10 corresponded to weak, positive, strong, and very strong evidence, respectively. Bayes factor calculation incorporated priors for allele frequency, P-value, and population size.
RESULTS
SNP genotypes:
The quality of GoldenGate genotype scores for individual SNPs was assessed from their GenTrain cluster and GenCall genotype scores in BeadStudio (Illumina). The average GenTrain score across all SNP loci and individuals was 0.67, but ranged from 0.3 to 0.89 for individual SNP clusters. In the most extreme cases the heterozygote cluster tended to be very diffuse, possibly representing variable probe binding efficiency due to anonymous variations in the SNP flanking region. GenCall scores for individual genotypes fell in a similar range, with an average score of 0.65 over all SNPs and individuals and average minimum and maximum scores of 0.25 and 0.89, respectively. This indicates that individual genotype data for the 149 SNPs assayed were of moderate to high quality (i.e., >0.2) and could be included in further analysis. Assessment of allelic segregation among all populations revealed that ∼73% of SNPs adhered to Hardy–Weinberg expectations when P-values were relaxed (<0.10). In some cases departure from the theoretical expectation may have resulted from a Wahlund effect for structured loci. When markers were examined within provenances, 95% adhered to the Hardy–Weinberg expectations in all three populations.
Wood quality trait analysis:
Whole-core traits were shown to be distributed normally following the one-sample Kolmigrov–Smirnov test (P < 0.05). The coefficient of variation was consistently higher for traits measured on the juvenile wood (Figure S2). In particular, juvenile variances for MOE and MFA were high compared to the mature component. Estimation of Pearson's correlation coefficient confirmed strong negative and positive correlations for many of the whole-core traits (Table S6). In general, fiber dimensions were strongly correlated with density and MOE. While MFA and MOE were strongly positively correlated, MFA was in general only moderately correlated with the other traits. The three principal components identified explained 84% of the total trait variance in the data. Graphical distribution of the casewise principal components analysis (PCA) scores for individuals and component score coefficients for traits are presented in Figure S3. Component loadings indicated the major traits contributing to components identified were as follows: PCA1, density, early wood density, MOE, late wood density, wall thickness, cell population, and specific surface area; PCA2, cell population, specific surface area, radial diameter, wall thickness, and coarseness; and PCA3, MFA.
SNPs affecting wood quality:
Significant associations in the discovery population, with and without adjustment for population genetic structure, are summarized in Table 2 and Table S7. In total 47 associations were found to be significant after correction for population structure and multiple testing, or 0.51% of all tests. The cumulative P-value distribution between 0 and 1 plotted as a density histogram highlights the skew in P-values near zero compared to the distribution expected if all SNPs were null (Figure 1). This resulted in significant q-values (<0.1) for approximately half of these associations. This approach was used to assemble a shortlist of 18 SNP associations with 10 SNPs, from nine genes (Table 3). Bayes factors estimated for shortlisted associations indicated very strong evidence for the observed associations as Bayes factors fell between 10 and 88 in all but four instances. The markers identified controlled between 2 and 6.5% of phenotypic variation (Table 3). Not accounting for colinearity between markers, the cumulative percentage of improvement in phenotype ranged between 7 and 41% or for individual traits between 0.7 and 16.3%. Undesirable negative correlations between shortlisted SNP associations and growth were not observed.
(a) P-value density histograms for associations tested in the discovery population (a) and the validation population (b) under the general linear model, with adjustment for structure and multiple testing (solid line). P-values for the general linear model without adjustment for structure are provided for comparison (shaded line). Modal bins were observed in both populations at 0 < P < 0.05, demonstrating a skew in P-values near zero. The dashed line is the density histogram expected if all SNPs were null.
Summary of association test results for the general linear model in the discovery (Batlow) and validation (Flynn) populations
Shortlisted SNP assocaitions identified in the Batlow association population
Inclusion of the Q-matrix in the general linear model resulted in a 19% decrease in the number of significant associations at P < 0.05 (Table 2, Figure 1), presumably due to adjusted significance of associations where structure was detected. Genetic differentiation (FST) across the three provenances for individual SNPs ranged between 0.01 and 36.86%, with a number of outliers suggested from the FST histogram (Table 3, Figure S4). The average differentiation among SNPs that dropped out after adjusting for structure was higher (10%) than that in the remaining loci (4%) (ANOVA P = 0.0004). Six SNPs (2, 57, 61, 64, 113, and 133) in the former class retained one or more significant associations, two of which were shortlisted in Table 3. It is possible that the Q-matrix was unable to sufficiently adjust for strong allelic structure in these cases. To explore the validity of these associations we examined the outcome of the model when SNPs were tested in isolation from structure (within populations). Despite small population sizes (n = 82–210), associations for most SNPs in Table 3 remained significant (P ∼ 0.05), including all of the “structured” SNP associations, in one or more of the three populations. This suggests that divergent allele frequencies may have arisen due to population-dependent selection for the trait of interest.
Trends in the frequency of associations possibly due to dependence on SNP function were also observed. When grouped according to physical position (i.e., intron, exon, and 5′- and 3′-UTR), the proportion of associations with P < 0.05 was significantly higher for the 3′-UTR and the exon than for other regions (Figure 2). This fits with the general expectation that polymorphisms from functional regions would more frequently afford phenotypic changes detectable in association tests. In addition, silent polymorphisms yielded 40% fewer significant associations than nonsynonymous sites, indicating a strong bias toward SNP associations where an amino acid has been substituted.
Percentage of significant associations (P < 0.05) per gene region and SNP type. Differences in the proportion of significant associations for silent and nonsynonymous sites were significant (P < 0.01) (Z = 2.6) for the binomial test.
SNP validation:
The proportion of significant associations identified in the validation population with and without adjustment for population genetic structure is presented in Table 2. In total 12 associations, with three SNPs, were found to be significant after corrections for population structure and multiple testing were applied, or 1.47% of all tests. This is three times higher than for the discovery population. The cumulative P-value distribution was similarly skewed toward zero (Figure 1), resulting in several associations with significant or near significant q-values (Table 4). Trait means and standard errors categorized by genotype for two of these SNPs are presented in Figure 3, and box-and-whisker plots are presented in Figure S5. For one polymorphism (SNP 60) associations were validated directly with density, meaning that the SNP was significantly associated with this trait type in both populations. The allele effect was conserved between populations, with the TT and CC homozygotes conferring lower and higher trait values, respectively, across the board, with the heterozygote intermediate to these states (Figure 3). Although SNP 60 was not associated with average density (trait 8) in the discovery population, the trend across states was the same for this trait. Reduced power due to sample size and inability to adjust for within-site environmental variation may have affected the sensitivity of this test to detect associations with average density in the discovery population. The frequency of the favorable genotype (GG) for density increased 15-fold in the validation population, suggesting that breeding selection for growth and form may have influenced allele frequency in this population. Association with growth (ring width) was also detected for SNP 60 in the validation population, but the effect was negatively correlated with density.
Mean and standard error for trait values categorized by genotype for two SNPs in the discovery population (left) and the validation population (right). In each case associations with density were detected in both populations. Gene action observed for SNP 60 was additive in all cases. In the case of SNP 64 the gene model was dominant in the validation population; however, the effect of the GG homozygote “flipped” between populations.
Significant associations identified from 23 SNPs retested in the Flynn validation population
In the case of SNP 64, associations were detected in the validation population with density, which was strongly correlated (R = 0.79) with specific surface area (trait 39), the associated trait in the discovery population. Although not significant, the trend across genotypes was the same for average density (trait 8) in the discovery population (Figure 3). The allele effect was only partially conserved between populations: i.e., the effect of the AA homozygote and the AG heterozygote remained constant (conferring lower and higher trait values, respectively), while the GG homozygote conferred lower trait values in the discovery population and trait values that were closer to those of the heterozygote in the validation population. Although it remains to be demonstrated, inconsistency in allele effect seen here may reflect between-site environmental interactions. In the case of SNP 146, associations with MFA were detected in the validation population, but this trait was not associated in the discovery population and was only moderately correlated with density (trait 4).
DISCUSSION
Associations with wood quality:
In the present study, 10 SNP markers were identified in the natural populations of radiata pine that were significantly associated with solid wood phenotypes after accounting for population genetic structure and multiple testing. Of these SNPs, 3 were also associated in a second population, although only 2 of these validated associations detected in the discovery population. In these cases validation provided a mechanism to verify associations and improve confidence in the SNP effect detected. The effects attributed to SNP alleles were small, in line with association studies in other tree species (Thumma et al. 2005, 2009; Gonzalez-Martinez et al. 2007, 2008; Ingvarsson et al. 2008; Eckert et al. 2009), and consistent with their quantitative nature. The number of significant markers identified in the discovery population was high compared to that in other association studies of solid wood traits (Thumma et al. 2005, 2009; Gonzalez-Martinez et al. 2007), which may bring into question those associations where validation failed. With the exception of SNP 60, markers yielding solid wood associations did not significantly affect growth, despite strong negative phenotypic correlations between these traits (Baltunis et al. 2007). This illustrates the utility of analyzing individual genetic components underlying wood variation to break negative correlations with growth that currently limit breeding programs.
Multiple associations (SNP 60 and SNP 61) were detected with phenylalanine ammonia lyase (PAL1), a lignin biosynthetic gene, homologous to ATPAL1 (AT2G37040), and the association with SNP 60 was detected in a second population. PAL1 catalyzes the conversion of phenylalanine to cinnamic acid, and its activity influences lignin content (Nakashima et al. 1997). As a major constituent of the cell wall, lignin content is expected to influence wood density, and this is the first report of a PAL gene modulating this trait. The decrease in juvenile wood density associated with the minor allele of SNP 60 (T allele: 0.14) appears to be additive and suggests a mildly deleterious effect on gene function that may be under weak negative purifying selection in the natural populations. Comparison with nine other Pinaceae species reveals that the “T” allele is not unique to radiata pine. This allele appears in at least two subgenera of Pinaceae (Figure S6), although its absence in spruce suggests that this allele may have been derived among pines. The overall low frequency of the T allele may reflect an evolutionary response to the associated phenotype. Density may be an important indicator of the hydraulic properties of wood that are under selection and has been shown to affect conductivity and drought survival in conifers, and high density may be an important adaptive trait linked to the prevention of vessel cavitation (Martinez-Meier et al. 2008; Dalla-Salda et al. 2009). Both PAL1 polymorphisms occur in exonic sequences and are unlinked, and although SNP 60 lies within the PAL-HAL active domain, it is synonymous and does not interfere with the substrate binding sites (Schwede et al. 1999). Silent SNPs can afford change via other mechanisms; for instance, a SNP in a cis-acting regulatory element is shown to influence allelic expression of COBRA in eucalyptus (Thumma et al. 2009), and further examples are emerging from association studies of human diseases (Veyrieras et al. 2008). Application of this SNP for breeding selection may require a trade-off on growth, since the favorable allele for density was associated with a 14% drop in growth for ring 4; however, associations were not detected for lower growth in the other rings.
The association with intronic SNP 64 from phenylcoumaran benzylic ether reductase (PCBER), a homolog of isoflavone reductase (AT4G39230.1), was detected in both populations and affected wood density and specific surface area. Although not shortlisted in the discovery population, the q-value for this association was low and near significant (q ∼ 0.1). The role of PCBER in wood development is not well established, but has previously been suggested to affect lignification of woody tissues due to its association with phenylpropanoid biosynthesis in poplar (Vander Mijnsbrugge et al. 2000a,b). In P. taeda PCBER is localized to the secondary xylem parenchyma (Kwon et al. 2001), where it presumably catalyzes synthesis of protective secondary metabolites such as lignans. Downregulation of PCBER has been reported to reduce lignin content (Boerjan et al. 2006), providing a possible mechanism for modulation of wood density. Similar to PAL1, selection against lower density associated with the minor allele (A allele: 11.9) in the natural populations could reflect the importance of density to vascular fitness.
Other associations:
The strong association with Rac13 that cannot be tested in the validation population remains of interest. RAC13 is homologous to a small GTP-binding protein (RAC13) that has been associated with the onset of cellulose synthesis in secondary walls of cotton (Delmer et al. 1995; Potikha et al. 1999), Arabidopsis (ATRAC2/ROP7: AT5G45970) (Brembu et al. 2005), and Zinnia (ZeRAC2) (Nakanomyo et al. 2002) and accumulates preferentially in tracheary elements. An association between an intronic RAC13 site (SNP2) and MFA of secondary cell wall cellulose of mature wood fibers was first identified in the discovery population where the minor allele (C allele: 0.072) was linked additively with low MFA (Figure S7), controlling >6% of the trait variation. This SNP was highly structured (FST = 30.3%) and was an outlier compared to the other SNPs in this data set (Figure S4), indicating that diversifying selection may have occurred at this locus. SNP 2 occurs at low frequency in Año Nuevo (C allele: 0.006) and Monterey (C allele: 0.002), but was common in Cambria (C allele: 0.38), and it is not surprising that this SNP does not segregate in the validation population, whose founders were primarily from Año Nuevo and Monterey. Importantly, SNP 2 was in low LD with the other five SNPs typed across this gene (R2 = 0.0002–0.0127), suggesting that selection may have acted locally on the intron where SNP 2 resides. When examined within Cambria, in isolation from structure (FST = 4%), both the association and the effect of the C allele were reproduced (Table 3), controlling up to 9% of variation in MFA. Low MFA is considered advantageous from a commercial forestry perspective as it results in wood with greater strength. Adaptive selection in natural populations may have similarly favored trees with improved wood strength via modulation of MFA at the Cambria site. The mechanism by which RAC13 might alter MFA is not established; however, related RAC proteins have been found to regulate cytoskeletal structure (Zhang et al. 2005), and indeed ATRAC2 has been shown to affect polar cell expansion, providing a possible path for modulation of cellular MFA (Wightman and Turner 2008).
Several of the genes harboring significant associations in this study have recently been associated with wood traits in other conifer species or show evidence of adaptive selection where fiber reinforcement may be a driving factor. Two polymorphic sites within AGP4, a cell wall arabinogalactan protein, were associated with late wood density and cell population in the mature and juvenile wood, respectively, in the discovery population, and this gene has been shown to be under diversifying selection in environmentally contrasted P. pinaster populations (Eveno et al. 2008). A single SNP from LP5 that shows signatures of strong diversifying selection in the natural populations of radiata pine (data not shown) was associated with fiber wall thickness and specific surface area in both the whole-core and mature wood fractions. An LP5-like gene was previously associated with water use efficiency in P. taeda (Gonzalez-Martinez et al. 2008). Both COMT2 and CAD, genes in the lignin biosynthetic pathway, harbored associations in the discovery population with traits including fiber wall thickness, specific surface area, early wood density and MOE, late wood MOE, and density averaged over the whole core. These genes were previously shown to influence similar traits (early and late wood density) in P. taeda (Gonzalez-Martinez et al. 2007).
Factors affecting power:
Wood quality traits are quantitative in nature and are regulated by many genes of small individual effect (Neale and Savolainen 2004). In the present study a small proportion of the total phenotypic variation for each trait type was accounted for. In part this results from incomplete sampling of the candidate gene network in combination with low genomic LD. Power to detect SNPs of small effect across multiple populations is further limited by constraints on trial design. Population sizes of many thousands have been recommended from simulation experiments to be necessary to detect associations in trees (Purcell et al. 2003; Ball 2005; Gordon and Finch 2005), and human studies employing large populations commonly detect SNPs with small effects (<2%). Phenotypic variation arising from diverse gene-by-environment interactions within and between sites was not accounted for in this study, whereas the discovery (Snowy Mountain Range) and validation (coastal valley in Southern Victoria) trials are strongly contrasted with respect to altitude, evapotranspiration, temperature, and soil type (New LocClim V 1.10). Quantitative evidence suggests that significant gene-by-environment interactions are limited for solid wood traits in radiata pine (Baltunis et al. 2007; Wielinga et al. 2009). Although overall rankings may not change for quantitative studies that examine all loci simultaneously, such interactions could affect correlation of phenotype and allele segregation for individual SNPs and might explain the “flip” in effect of the GG homozygote for SNP64. Improved power to detect and validate associations in future experiments could be achieved by (1) ensuring populations are sufficiently large, (2) designing trials such that within-site and between-site environmental interactions can be accounted for, (3) avoiding genetically structured material, and (4) establishing validation populations with families, or clonal material, from the discovery population. Tailoring region-specific suites of markers for breeding selection may be necessary where SNP effects are sensitive to local environment. Markers such as SNP 60 (PAL1) and SNP 64 (PCBER1) are robust and will have greatest utility in breeding programs.
Acknowledgments
We sincerely thank Bala Thumma for reviewing the manuscript, Colin Matheson and Charlie Bell for their insightful discussions, and Philippe Matter for technical support. We also thank the Neale laboratory at University of California, Davis, for kindly providing primers for several candidate genes. This research was funded by the Commonwealth Scientific and Industrial Research Organisation, ArborGen LLC, Forest and Wood Products Australia, and the Southern Tree Breeding Association.
Footnotes
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.116582/DC1.
Communicating editor: M. Kirst
- Received March 11, 2010.
- Accepted May 12, 2010.
- Copyright © 2010 by the Genetics Society of America