Nuclear sequence variation and linkage disequilibrium (LD) were studied in 15 cold-hardiness- and 3 wood quality-related candidate genes in Douglas fir [Pseudotsuga menziesii (Mirb.) Franco]. This set of genes was selected on the basis of its function in other plants and collocation with cold-hardiness-related quantitative trait loci (QTL). The single-nucleotide polymorphism (SNP) discovery panel represented 24 different trees from six regions in Washington and Oregon plus parents of a segregating population used in the QTL study. The frequency of SNPs was one SNP per 46 bp across coding and noncoding regions on average. Haplotype and nucleotide diversities were also moderately high with Hd = 0.827 ± 0.043 and π = 0.00655 ± 0.00082 on average, respectively. The nonsynonymous (replacement) nucleotide substitutions were almost five times less frequent than synonymous ones and substitutions in noncoding regions. LD decayed relatively slowly but steadily within genes. Haploblock analysis was used to define haplotype tag SNPs (htSNPs). These data will help to select SNPs for association mapping, which is already in progress.
STUDIES of nuclear sequence variation and linkage disequilibrium (LD) across populations and genomic regions help to elucidate the evolutionary forces that shape patterns of variability (Nordborg and Innan 2002; Feder and Mitchell-Olds 2003; Luikart et al. 2003). Such studies are of general biological interest and are needed to design association mapping studies that will help us better understand the molecular basis of adaptation in plant populations (Goldstein and Weale 2001; Glazier et al. 2002; Schlötterer 2002; Borevitz and Nordborg 2003). However, with the exception of maize and Arabidopsis, little research has been conducted on LD in plants, although the mating system (selfing vs. outcrossing), population structure (continuous vs. isolated populations), life forms (annual vs. perennial), recombination rate, and other factors can strongly influence LD patterns (Flint-García et al. 2003). Douglas fir [Pseudotsuga menziesii (Mirb.) Franco] is found across a large and environmentally heterogeneous area in western North America and has evolved complex adaptive mechanisms (Campbell and Sugano 1975; Campbell and Sorensen 1978; Steiner 1979; Rehfeldt 1983, 1989; Li and Adams 1993; Aitken and Adams 1997; Anekonda et al. 2000). We are interested in the specific genes and alleles that underlie phenotypic variation in adaptive traits such as growth rate, bud set, bud flush, cold hardiness, and drought tolerance. Wood quality-related genes are also of great interest. Douglas fir is an excellent perennial plant species to use for studying these traits and genetic adaptation. It is evolutionarily old; phenotypically and genetically highly diverse; distributed in large, outcrossed, natural populations with high gene flow; and has relatively little within-population substructure (Merkle and Adams 1987; Moran and Adams 1989; Aagaard et al. 1998a,b; Viard et al. 2001). 
Douglas fir is also one of the most thoroughly studied trees in the United States and the most economically important tree in the Pacific Northwest.
Frost damage can negatively affect the annual growth of Douglas fir trees, particularly in the spring when new needle tissue is delicate and vulnerable. Fall frosts can damage actively elongating shoots in the autumn and adversely affect growth the following spring. Therefore, fall and spring cold hardiness are important adaptive traits in Douglas fir that show high genetic variation in common garden studies and vary among populations from environmentally diverse locations (reviewed in Wheeler et al. 2005).
Quantitative trait loci (QTL) mapping studies have confirmed these observations and have allowed us to begin dissecting these complex traits (Jermstad et al. 2001a,b, 2003; Wheeler et al. 2005). Several genomic regions responsible for genetic control of growth rhythm and cold-hardiness traits were found, but QTL mapping does not reveal which individual genes are responsible for these effects.
Association mapping is a powerful population genomic approach that unlike QTL mapping can identify individual genes and alleles that are responsible for phenotypic differences in adaptive traits (Neale and Savolainen 2004). However, limited genetic resources and the large genome of Douglas fir prevent a full genome scan. Instead, we plan to carry out a candidate gene-based association mapping using single-nucleotide polymorphisms (SNPs) (Rebbeck et al. 2004). SNPs are excellent markers for association mapping of genes controlling complex traits (e.g., Brookes 1999; Rafalski 2002; Carlson et al. 2004). However, to carry out association mapping it is necessary first to discover SNPs in candidate genes of interest and to study their variation. To achieve our goals we (1) developed a list of candidate genes for adaptive traits on the basis of data available from other plant species, (2) found their homologs or orthologs among Douglas fir genomic and EST sequences, (3) designed single-gene-specific primers to amplify single-gene PCR products, (4) sequenced them, (5) performed SNP discovery, (6) analyzed their diversity and LD, and (7) selected SNPs for association mapping. These steps are described in this article for a set of 18 genes that included late embryogenesis abundant protein genes, dehydrins, and other cold-induced and wood quality-related genes. The studied genes are mostly unlinked and represent a wide variety of protein-coding genes. Therefore, they are likely to reflect general genome variation in Douglas fir.
MATERIALS AND METHODS
Plant material and SNP discovery:
The SNP discovery panel consisted of 32 DNA samples isolated from 24 haploid (1N) seed megagametophytes collected from 24 unrelated trees from six regions in Washington and Oregon and 8 megagametophytes from the parents of a QTL mapping population—four megagametophytes from the maternal parent and four from the paternal parent (Figure 1) (Jermstad et al. 1998; see also supplemental Table 1S at http://www.genetics.org/supplemental/ for details). The eight samples from the mapping parents were used to test the PCR primers. If amplification and sequencing were successful, then the remaining 24 samples from the SNP discovery panel were used to amplify the DNA used for forward and reverse sequencing. The mapping parents segregated for a maximum of 2 alleles each; therefore, the maximum number of alleles studied and used for nucleotide diversity analysis was 28 (24 from the discovery panel and 4 from the mapping parents).
Candidate gene selection:
Using published data on differential expression and physiological mechanisms involved in cold tolerance in plant species, we selected ∼500 genes and proteins that included cold acclimation, cold induced, cold resistant, chaperones, cryoprotectins, calmodulins, some dehydrins, LEA, and other cold-hardiness-related candidate genes and proteins (e.g., Close 1997; Palva and Heino 1998; Thomashow 1998, 1999, 2001; Wanner and Junttila 1999; Seki et al. 2001, 2002; Fowler and Thomashow 2002; Nogueira et al. 2003; Provart et al. 2003; Rabbani et al. 2003; Browse and Lange 2004; Cook et al. 2004). Then, using BLASTX, BLASTN, and TBLASTX tools they were compared with all available Douglas fir sequences submitted to GenBank, including most of the ∼11,700 ESTs obtained recently from Douglas fir seedlings in our laboratory (http://staff.vbi.vt.edu/estap). The highly homologous Douglas fir sequences that matched sequences in our database of cold-resistance-related genes and proteins were used to design PCR primers for sequencing. For this study we preferably selected those genes that were also positional candidates that collocated with cold-hardiness QTL in a previous study (Wheeler et al. 2005). A number of candidate genes were previously mapped by RFLP analysis using cDNAs as hybridization probes (Jermstad et al. 1998). Most of the positional candidates were good expressional and functional candidates. For comparison, we also included three wood quality-related genes that were recently studied in loblolly pine, Pinus taeda (Brown et al. 2004a). The details on the list of 18 candidate genes used in this study are presented in Table 1.
DNA isolation, PCR primer design, amplification, and sequencing:
Genomic DNA was extracted from haploid megagametophytes after seed germination using the DNeasy plant mini kit (QIAGEN, Valencia, CA). The PCR amplification primers were based on the Douglas fir genomic, EST, or contig sequences (http://staff.vbi.vt.edu/estap) that were highly homologous to selected cold-resistance-related candidate genes described in other plants. The PCR primers were designed using the computer program GeneRunner v3.04 (Hastings Software, Hudson, NY; http://www.generunner.com) to amplify products 600–700 bp long. PCR amplifications were performed as described by Krutovsky et al. (2004) (see also supplemental Table 2S at http://www.genetics.org/supplemental/ for details on the primers). To obtain complete or almost complete gene sequences more than one primer pair was designed to amplify overlapping amplicons for the TBE, AT1, and APX genes (Table 1 and supplemental Table 2S). Nucleotide sequences were obtained directly by sequencing haploid PCR products using the ABI PRISM BigDye Primer Cycle sequencing kit v.3.1 (Applied Biosystems, Foster City, CA) and the ABI 3730 DNA analyzer at the College of Agricultural and Environmental Sciences Genomics Facility Center of the University of California (Davis, CA). All samples were sequenced in both directions. Raw sequences were base called by the PHRED program (Ewing and Green 1998; Ewing et al. 1998), assembled using the PHRAP program, and viewed through CONSED (Gordon et al. 1998, 2001). Multiple alleles of the same gene were aligned using the MACE program (B. Gilliland and C. Langley, University of California, Davis, CA). All chromatograms and SNPs were visually checked to exclude any sequencing errors.
Nucleotide diversity analysis:
Haplotypes were directly inferred from sequencing PCR products amplified in haploid megagametophytes from the SNP discovery panel. Multiple sequence alignments were analyzed using the DNA sequence polymorphism (DNASP) software version 4.0 (Rozas et al. 2003). Insertions and deletions (indels) were excluded from all estimates. Haplotype diversity (Hd) was computed using Equation 8.4 in Nei (1987), except that n was used instead of 2n. Nucleotide diversity was estimated by ΘW from the number of polymorphic segregating (S) sites (Watterson 1975, Equation 1.4a, but on a base pair basis; Nei 1987, Equation 10.3) and by π (Nei 1987, Equations 10.5 or 10.6, but on a per gene basis). Heterogeneity of ΘW among loci was assessed by using a likelihood-ratio test in which the probability of the observed number of segregating sites in a sample was calculated under the null hypothesis of a common, genomewide 4Neμ (Pg) and the alternative hypothesis of locus-specific values of 4Neμ (Pl), where an average for 18 genes ΘW was considered as a genomewide estimate. These probabilities were based on all (silent and nonsynonymous) segregating sites and were calculated for each gene by using the computer simulations that are implemented in DNASP. Simulations were based on the coalescent process for a neutral infinite-sites model and assumed a large constant population size (Hudson 1991). Then, the likelihood-ratio test statistic −2 ln(Pl/Pg) was calculated for each gene. Under certain assumptions this statistic is distributed as a χ2 with m − 1 d.f., where m is equal to the number of loci (see also Brown et al. 2004a).
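The summary statistics named above can be illustrated with a minimal Python sketch. It is not the DNASP implementation; it assumes an indel-free alignment of haploid sequences of equal length, and the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """Indices of alignment columns that carry more than one nucleotide."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def pi_per_site(seqs):
    """Nucleotide diversity: mean pairwise differences, divided by length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / len(seqs[0])


def theta_w_per_site(seqs):
    """Watterson's estimator S / a_n, expressed per site."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return len(segregating_sites(seqs)) / a_n / len(seqs[0])


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Hypothetical toy alignment: four haploid sequences, ten sites, no indels.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "ACGTACGTAC"]
print(segregating_sites(toy), pi_per_site(toy),
      theta_w_per_site(toy), haplotype_diversity(toy))
```

Because the samples are haploid megagametophytes, each sequence is one haplotype, which is why the sketch uses n rather than 2n in the Hd formula.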
Neutrality test statistics D (Tajima 1989, Equation 38), D*, and F* (Fu and Li 1993, pp. 700 and 702, respectively) were calculated and tested using 10,000 simulations to test the hypothesis that mutations in the gene are selectively neutral (Kimura 1983). If a population sample fits the infinite-sites model, π and ΘW have equal expectations. Tajima (1989) developed the D-test statistic, which is π − ΘW divided by the standard deviation of this difference. The difference between π and ΘW (Tajima's D) reflects the degree of nonequilibrium conditions in the genetic history of the population. The D*-test statistic is based on the differences between the number of singletons (mutations appearing only once among the sequences) and the total number of mutations. The F*-test statistic is based on the differences between the number of singletons and the average number of nucleotide differences between pairs of sequences. Significantly negative values for these statistics are consistent with negative (purifying) selection and can also indicate a recent selective sweep of a linked mutation, whereas significantly positive values are consistent with positive, balancing, or diversifying selection for two or more alleles (Kreitman 2000). To find regions under selection within genes the distributions of the D-, D*-, and F*-statistics were studied along the gene sequences using a sliding window with a window length and step size of 100 and 25 sites, respectively. Coalescence simulations without recombination were used to test deviations of the observed π- and ΘW-estimates from average values and the significance of the D-, D*-, and F*-statistics (Hudson 1991).
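The D statistic described above can be sketched as follows. This is a minimal illustration of the standard Tajima (1989) formula, not the DNASP code; it takes the sample size n, the number of segregating sites S, and the mean number of pairwise differences k (all on a per-gene basis) as inputs, and it does not perform the coalescent simulations used for significance testing. The function name is hypothetical.

```python
import math


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S, and mean
    pairwise differences k (per gene, not per site)."""
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pi minus Watterson's theta, both per gene.
    # Denominator: estimated standard deviation of that difference.
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of low-frequency variants lowers k relative to S / a1 and therefore gives a negative D, which is the pattern reported for most genes in this study.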
The nonsynonymous (dN; amino acid replacing) to synonymous (dS; no amino acid replacing) substitution ratio is a strong indicator of selection (Li 1997). The dN/dS ratio measures the magnitude and direction of selective pressure on a gene sequence, with ratios = 1, <1, and >1 indicating neutral evolution, negative selection, and positive selection, respectively. The average number of potentially nonsynonymous and synonymous substitution sites, estimates of the number of nonsynonymous (dN) and synonymous (dS) substitutions per site, variance and standard errors, and Z-test for neutrality (dN = dS) were computed using the molecular evolutionary genetics analysis (MEGA) software version 3.0 (http://www.megasoftware.net; Kumar et al. 2004) and the distance-based modified Nei-Gojobori method (Nei and Gojobori 1986) with the Jukes-Cantor model (Jukes and Cantor 1969) and bootstrap based on 1000 replicates (Nei and Kumar 2000).
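The Jukes-Cantor step of this calculation can be sketched in a few lines of Python. This is only an illustration, not the MEGA implementation: it assumes that the numbers of nonsynonymous and synonymous differences and sites have already been counted (the Nei-Gojobori counting itself is not shown), and the counts used in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for a proportion p
    of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from precomputed difference and site counts."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("dN/dS is undefined when dS is zero")
    return d_n, d_s, d_n / d_s


# Hypothetical counts for one pair of sequences: 2 nonsynonymous differences
# over 600 nonsynonymous sites and 5 synonymous differences over 200 sites.
d_n, d_s, ratio = dn_ds(2.0, 600.0, 5.0, 200.0)
print(d_n, d_s, ratio)
```

With these hypothetical counts the ratio is well below 1, the pattern interpreted above as negative selection.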
Analysis of LD and haploblock structure within genes:
LD descriptive statistics r2 (Hill and Robertson 1968) and D′ (Lewontin 1964) were calculated using TASSEL (http://www.maizegenetics.net/bioinformatics/tasselindex.htm) and DNASP software. When more than two alleles were present at a locus, a weighted average of D′ or r2 was calculated (Farnir et al. 2000). If there were only two alleles at both loci, then a one-sided Fisher's exact test was calculated to determine the significance of LD. If there were more than two alleles, then permutations were used to calculate the proportion of permuted gamete distributions that are less probable than the observed gamete distribution under the null hypothesis of independence (Weir 1996). Only parsimony informative sites were included in the analysis of the LD decay within genes over distance. LD between genes was analyzed using alleles with a frequency of ≥15% for all genes, except the ERD15-like gene, for which alleles were less frequent.
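For the simplest case of two biallelic sites scored on the same haploid samples, the two LD statistics can be sketched as below. This is an illustrative calculation rather than the TASSEL or DNASP code; it assumes both sites are polymorphic, and it omits the weighted averaging for multiallelic sites and the significance tests described above. The function name and example data are hypothetical.

```python
def ld_stats(site_a, site_b):
    """Return (D, D_prime, r2) for two biallelic sites scored on the same
    haplotypes; each argument is a sequence of allele states."""
    n = len(site_a)
    a_ref, b_ref = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(y == b_ref for y in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical data: eight haplotypes; only three of the four possible
# two-site haplotypes are observed.
d, d_prime, r2 = ld_stats("AAAATTTT", "CCCGGGGG")
print(d, d_prime, r2)
```

In this example D′ reaches its maximum while r2 does not, because the allele frequencies at the two sites differ. This is one reason the two statistics are usually reported together.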
Selection of htSNPs for association mapping:
To select haplotype tag SNPs (htSNPs) for association mapping, within-gene haploblock structure and haplotype coverage were studied using HaploblockFinder (Zhang and Jin 2003; http://cgi.uc.edu/∼kzhang), SNPtagger (Xiayi and Cardon 2003; http://www.well.ox.ac.uk/∼xiayi/haplotype/index.html), and SNPCherryPicker (Harris et al. 2003) software. These programs use different approaches and criteria to select htSNPs, and, therefore, we believe that they complement each other, and their combined use helps us select the consensus set of htSNPs.
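The general idea of tagging can be illustrated with a simple greedy sketch. It is not the algorithm used by HaploblockFinder, SNPtagger, or SNPCherryPicker; it is a hypothetical criterion that picks SNPs until every pair of distinct observed haplotypes differs at one or more chosen sites, and the function name and example haplotypes are invented for illustration.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Greedily choose site indices until every pair of distinct haplotypes
    differs at one or more of the chosen sites."""
    distinct = sorted(set(haplotypes))
    unresolved = set(combinations(range(len(distinct)), 2))
    chosen = []
    while unresolved:
        # For each site, collect the still-unresolved pairs it would separate.
        gains = {
            site: {(i, j) for i, j in unresolved
                   if distinct[i][site] != distinct[j][site]}
            for site in range(len(distinct[0]))
        }
        best = max(gains, key=lambda site: len(gains[site]))
        chosen.append(best)
        unresolved -= gains[best]
    return sorted(chosen)


# Hypothetical haplotypes over five SNPs: sites 0, 1, and 4 are in complete
# LD with one another, as are sites 2 and 3.
haps = ["AAAAA", "AATTA", "TTAAT", "TTTTT", "AAAAA"]
tags = greedy_tag_snps(haps)
print(tags)
```

Sites in complete LD with an already chosen site add no information, so one SNP per group of redundant sites is sufficient in this toy case.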
Using MEGA a consensus neighbor-joining (NJ) tree (Saitou and Nei 1987) was reconstructed for all 28 Douglas fir samples on the basis of the 1000 Jukes-Cantor pairwise distance matrices (Jukes and Cantor 1969) calculated from the bootstrap-generated multiple-nucleotide alignments for all 18 genes combined. The FST statistic (Weir 1996), which measures the genetic variance among populations divided by the total genetic variance of the entire population, was used to quantify the degree of genetic differentiation between population samples from the six regions included in the SNP discovery panel using the ARLEQUIN ver. 2.0 software (Excoffier et al. 2004; http://lgb.unige.ch/arlequin). The analysis of molecular variance (AMOVA) approach implemented in Arlequin (Excoffier et al. 1992) is essentially similar to other approaches based on analyses of variance of gene frequencies, but it takes into account the number of mutations between haplotypes. The sample differentiation was also tested, using an exact test based on haplotype frequencies (Raymond and Rousset 1995; Goudet et al. 1996). The nearest-neighbor statistic (Snn) that measures how often the “nearest neighbors” (in sequence space) of sequences are from the same locality in geographic space was used to test for population differentiation among six regions and two states, from which samples were collected, as described in Hudson (2000).
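The nearest-neighbor statistic can be sketched as follows. This is a minimal illustration of the definition given above, with ties among nearest neighbors shared equally and a simple label-permutation test. It is not the software used in this study; the function names, the use of the number of differing sites as the distance, and the example data are assumptions.

```python
import random


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs, localities):
    """Mean, over sequences, of the fraction of each sequence's nearest
    neighbors (ties included) sampled from the same locality."""
    total = 0.0
    for j, seq in enumerate(seqs):
        dists = [(hamming(seq, other), k)
                 for k, other in enumerate(seqs) if k != j]
        d_min = min(dist for dist, _ in dists)
        nearest = [k for dist, k in dists if dist == d_min]
        same = sum(localities[k] == localities[j] for k in nearest)
        total += same / len(nearest)
    return total / len(seqs)


def snn_permutation_p(seqs, localities, n_perm=1000, seed=0):
    """Observed Snn and the proportion of label permutations whose Snn is
    greater than or equal to it."""
    rng = random.Random(seed)
    observed = snn(seqs, localities)
    labels = list(localities)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if snn(seqs, labels) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical, fully differentiated example with two localities.
example_seqs = ["AAAA", "AAAT", "TTTT", "TTTA"]
example_locs = ["OR", "OR", "WA", "WA"]
print(snn_permutation_p(example_seqs, example_locs, n_perm=200))
```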
Thirty-two haploid DNA samples were sequenced for 18 candidate genes. The average size of a gene sequence was ∼843 bp (Table 1). In total, 15,183 bp of genomic DNA for 18 genes or 441,664 bp considering all samples were sequenced. Indels were found in 12 sequences, with the average number of 2.7 indels per sequence and the average length of 14.8 bp per indel. However, if the TBE gene with numerous large indels is excluded from analysis, the average numbers are 1.9 indels per sequence and 4.2 bp per indel. The average numbers of exons and introns were 1.7 and 0.9 per sequence, respectively, for all 18 partially and completely sequenced genes, or 2.2 and 1.2 per gene for 6 genes that were sequenced completely. The exon and intron sizes varied greatly, with the average lengths of 281 and 246 bp for all genes, respectively, or 292 and 402 bp for 6 genes that were sequenced completely. However, if the TBE gene, which had uncommonly large introns, is excluded from analysis, then the average lengths of exons and introns based on the five completely sequenced genes become 258 and 246 bp, respectively, which are very similar to the values based on all 18 genes.
Four hundred SNPs were found in 18 genes, or 1 SNP for every 46 bp (Table 2). With the exception of 6 trinucleotide SNPs, all segregating sites had only two alternative nucleotides. Almost one-third of SNPs were singletons, and most SNPs (349) were either synonymous or in noncoding regions. Haplotype diversity was very high with Hd = 0.827 ± 0.043 and an average number of 11 different haplotypes per gene (Table 3). The total nucleotide diversities were also relatively high with π = 0.00655 ± 0.00082 and ΘW = 0.00702 ± 0.00269 on average. The estimates of nucleotide diversity, π and ΘW, varied significantly across loci (P < 0.00006 in the heterogeneity test) with values as low as π = 0.00237 in the 4CL2 gene and ΘW = 0.00229 in the formin-like gene and almost six to seven times higher in the LEA-EMB11-like gene, where π = 0.01378 and ΘW = 0.01594. Coalescence simulations also showed that the EF1A, TBE, AT1, and LEA-EMB11-like genes had statistically higher than average values for haplotype number (h) and diversity (Hd), while the formin-like gene had the lowest values for h, Hd, and ΘW. In general, π and ΘW were similar with a tendency for ΘW to be slightly higher than π, apparently as a result of an excess of low-frequency SNPs (supplemental Figure 1S at http://www.genetics.org/supplemental/). Due to this, the neutrality test statistics tended to be negative (Table 3).
A detailed description of the nucleotide diversity in different nucleotide sequence sites and regions is presented in Table 4. The nonsynonymous substitutions were almost five times less frequent than silent substitutions (synonymous substitutions and substitutions in noncoding regions), where π = 0.00210 vs. π = 0.01055 and ΘW = 0.00261 vs. ΘW = 0.01132 for nonsynonymous vs. silent substitutions, respectively.
Thirteen of the 18 genes had negative values of the neutrality test statistics D, D*, and F*, but they were significant only for the F3H1 gene (Table 3). The 4CL1 gene was also possibly under negative selection, but only D* was statistically significant (although the F*-value was almost significant with P = 0.065). Unlike F3H1 and 4CL1, the MT-like gene was possibly under positive selection. The sliding-window analysis revealed statistically significant values of D, D*, and F* in a few regions within the TBE, formin, AT1, and APX genes that were possibly under selection (see, for instance, APX in supplemental Figure 2S at http://www.genetics.org/supplemental/).
The neutrality of sequence polymorphism was also assessed using the ratio of nonsynonymous (dN) to synonymous (dS) nucleotide substitutions. Six genes had a dN/dS ratio significantly <1 (supplemental Table 3S at http://www.genetics.org/supplemental/). Only the 4CL1 gene had dN/dS > 1, but it was not statistically significant.
LD, haploblock structure within genes, and selection of htSNPs for association mapping:
A considerable amount of LD was found within sequences. A total of 4349 pairwise LD estimates were computed among parsimony informative sites within the 18 genes. Almost one-third of them (1316) showed statistically significant LD by Fisher's exact test, and 326 pairs remained significant even after Bonferroni correction. Figure 2 shows LD estimates (r2) plotted against the pairwise distances between parsimony informative SNPs within all 18 genes. The LD declined linearly as distances between sites increased, but a fair amount of LD remained, even for pairs separated by >500 bp. There were a few significant LDs for tightly linked or even for unlinked genes (supplemental Figure 3S at http://www.genetics.org/supplemental/), but none of them remained significant after Bonferroni correction. Depending on the threshold values used to define a block, from 1 up to 58 haploblocks per gene were revealed (supplemental Table 4S at http://www.genetics.org/supplemental/). These thresholds included a minimal LD value, minimal frequency of the SNP allele to be included, minimal chromosome and haplotype coverage, and htSNP coverage. However, using reasonable thresholds, there were ∼2–3 haploblocks per gene (except very long genes such as TBE) that could be genotyped with approximately four to five SNPs per gene on average.
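The effect of the multiple-testing correction on the within-gene comparisons can be made concrete with a short calculation. The counts are those reported above; the nominal significance level of 0.05 is an assumption, since the level used is not stated here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_pairs = 4349        # pairwise comparisons within genes
n_nominal = 1316      # significant by Fisher's exact test
n_corrected = 326     # still significant after Bonferroni correction
alpha = 0.05          # assumed nominal level, not stated in the text

threshold = bonferroni_threshold(alpha, n_pairs)
frac_nominal = n_nominal / n_pairs
frac_corrected = n_corrected / n_pairs
print(threshold, frac_nominal, frac_corrected)
```

Under this assumption each pair must reach a P-value of roughly 0.00001 to remain significant, which about 7.5% of the pairs did.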
NJ trees revealed no significant clustering or obvious geographic structure among samples (supplemental Figure 4S at http://www.genetics.org/supplemental/). The nucleotide site data gave a low estimate of FST = 0.028 among six regions that was not statistically different from zero on the basis of 1000 permutations (Weir 1996). The exact differentiation test also did not reveal any differentiation (P = 1) between regions. The nearest-neighbor statistic revealed no significant differentiation between populations either in the pairwise tests or globally (Snn = 0.071, P = 0.848 on the basis of 1000 permutations).
This is the first extensive study of nucleotide diversity and LD for candidate genes in Douglas fir. It provides important data on nucleotide diversity and haplotype structure in Douglas fir natural populations, and, together with similar studies in pines, it lays a foundation for association mapping in conifers. An efficient strategy for selecting the most informative SNPs for association mapping was also developed.
Average nucleotide diversity in Douglas fir was higher than that in human and soybean, but lower than that in maize, and similar to that in Drosophila (Table 5). The similarity to Drosophila is not completely surprising given that both Douglas fir and Drosophila have large population sizes and high outcrossing rates. Compared with other conifers, Douglas fir has higher levels of diversity than do loblolly pine (Brown et al. 2004a; Neale and Savolainen 2004; S. C. González-Martínez, E. Ersoz, G. R. Brown, N. C. Wheeler and D. B. Neale, unpublished results), Scots pine (P. sylvestris) (Dvornyk et al. 2002; García-Gil et al. 2003), and sugi (Kado et al. 2003). Potentially, a difference in mutation rates and/or in historic effective population sizes (Ne) between Douglas fir and pines could explain the difference in the level of nucleotide diversity observed in Douglas fir and loblolly pine. Unfortunately, we are unaware of any direct estimates of mutation rate at the nucleotide level in pines vs. Douglas fir. Indirect estimates that are inferred from observed nucleotide differentiation between closely related pine species are based on assumptions of the neutral model as well as on the rough assumptions of divergence time and Ne. These estimates can be highly biased and produce a circular argument. Unfortunately, paleobotanical data are also very incomplete and highly inconclusive. Therefore, there are no unambiguous data that would suggest that Douglas firs have maintained large Ne during the Holocene or Pleistocene, while pines have not. However, more importantly, our study, as well as other conifer studies cited above, revealed manyfold difference in estimates of π and ΘW between different genes, which highlights the problems of comparing variation among species when estimates are based on one or a few loci (e.g., Dvornyk et al. 2002; García-Gil et al. 2003; Ingvarsson 2005). 
Comparisons among species should be either based on many loci or, ideally, restricted to orthologous loci.
The average π- and ΘW-values were similar in this study. Both π- and ΘW-values estimate the equilibrium neutral parameter Θ = 4Neμ for autosomal loci, a central parameter in population genetic models for the balance between mutation and random genetic drift, where Ne is the effective population size and μ is the neutral mutation rate per nucleotide site. This parameter summarizes the rate at which processes of mutation and random genetic drift generate and maintain variation within a gene, assuming that natural selection has not been operating. Although the number of segregating sites does not represent all the information in the sample, under the neutral infinite-sites model the frequency spectrum of sites is determined by Θ, which in turn is estimated by S. Violations of the assumptions of the infinite-sites model will lead to biases in the estimate of Θ. The similarity of π- and ΘW-values shows that those violations were not significant. Nevertheless, negative values of the neutrality test statistics (Tajima's D, D*, and F*) and dN/dS < 1 in most studied genes suggested that they are mainly under negative selection or reflect a recent population expansion. However, it is difficult, if not impossible, to distinguish between population growth and selection, if only intraspecific polymorphism is studied. The frequency spectrum can be different for different genes, depending on the combined effect of many factors, such as mutation, population size, recombination rate, gene conversion, and selection intensity. Comparison of intraspecific and interspecific polymorphism in orthologous genes between closely related species can help to detect or confirm genes under selection (Hudson et al. 1987; McDonald and Kreitman 1991; Kreitman 2000).
The rate of decay of LD with distance is a critical factor that affects the success of association mapping on the basis of SNPs in candidate genes. If LD affects large regions or genomic blocks, then association with phenotypic traits would be easier to detect, but it would be more difficult to assign it to the particular candidate gene or quantitative trait nucleotide (QTN). If LD decays quickly, then the associations found between a particular SNP and phenotypic trait would be more likely to be causative rather than due to linkage with other unknown genes. LD is a result of the interplay of many factors, such as mutation and recombination rates, mating system, selection, population size, structure, and history. The intragenic recombination that affects LD within genes was estimated in this study, but not presented and discussed here because we believe that the limited sample size was insufficient for its reliable estimation. The estimation of recombination requires that considerably larger segments of contiguous DNA be sequenced, and more data should be collected to fully address this problem. LD varies greatly in different species ranging from 200–1500 bp in maize up to 50–100 kb in Arabidopsis (see Rafalski and Morgante 2004, Table 2, for review). Our data indicate that LD decayed >50% over relatively short segments (from r2 = ∼0.25 to ∼0.10 within 2000 bp, Figure 2). These data confirmed recent studies in loblolly pine and suggest that conifers may have LD at the lower end of the spectrum (Brown et al. 2004a; S. C. González-Martínez, E. Ersoz, G. R. Brown, N. C. Wheeler and D. B. Neale, unpublished results), making these species potentially very amenable for candidate gene vs. genomewide-based studies (Neale and Savolainen 2004). Unlike candidate gene-based association studies the genomewide scans depend more on strong LD over long regions in the genome. 
However, it should be noted that this study was not specifically designed to address LD in the genome, but rather within genes, and many distal pairwise comparisons are underrepresented because studied sequences were relatively short.
A few significant LDs were found between tightly linked or even between unlinked genes (supplemental Figure 3S at http://www.genetics.org/supplemental/), although none of them remained significant after Bonferroni correction. Nevertheless, these associations could be a sign of either population substructure or strong epistatic interactions between genes. The latter is especially likely for the EF1 and 60SRPL31a genes, because both are involved in ribosomal biosynthesis, and for the F3H1 and F3H2 genes that are apparently involved in the same metabolic pathway.
Association mapping requires careful selection of SNPs for genotyping. Our data will help us select the most informative and potentially useful htSNPs in 18 candidate genes for association mapping. We developed a complex approach that takes into account all available data to increase the likelihood of detecting associations. The polymorphic SNPs that were discovered in this study in coding regions, which cause nonsynonymous substitutions, mark haploblocks, and are under positive selection, are the best candidates for the association mapping study that is now in progress.
Connecting phenotype with genotype is the fundamental aim of genetics (Botstein and Risch 2003). Candidate gene-based association studies are considered one of the best approaches to connect complex phenotypes with genotypes (Pflieger et al. 2001; Botstein and Risch 2003; Neale and Savolainen 2004; Rebbeck et al. 2004). This study showed that Douglas fir meets the most important conditions for candidate gene-based association studies, such as high phenotypic variation, high SNP polymorphism in candidate genes, and moderate LD. The lack of population subdivision observed in the SNP discovery panel will also facilitate association mapping. However, it is too early to draw conclusions about population structure. Further study of a much larger sample of ∼1300 trees from an association study will provide more data on population structure.
We thank the members of the Douglas fir Genome Project for their participation and contributions (http://dendrome.ucdavis.edu/dfgp/collaborators.html), Nicholas Wheeler (Molecular Tree Breeding Services, Centralia, WA) for providing samples for the SNP discovery panel, Glenn T. Howe (Department of Forest Science, Oregon State University, Corvallis, OR) and Santiago C. González-Martínez (Unit of Forest Genetics, Center of Forest Research, Madrid, Spain) for reviewing the manuscript and helpful recommendations, and Jennifer Manares (University of California, Davis, CA) for technical assistance with PCR and gel electrophoresis. Funding for this project was provided by the Pacific Southwest Research Station, U.S. Department of Agriculture (USDA) Forest Service within the American Forest & Paper Association Agenda 2020 program. Trade names and commercial products or enterprises are mentioned solely for information and no endorsement by the USDA is implied.
1 Present address: Department of Forest Science, Texas A&M University, College Station, TX 77843.
Communicating editor: M. Nordborg
- Received April 13, 2005.
- Accepted August 29, 2005.
- Copyright © 2005 by the Genetics Society of America