Genetics, Vol. 172, 1915-1926, March 2006, Copyright © 2006
doi:10.1534/genetics.105.047126

,1
,2
* Department of Plant Sciences, University of California, Davis, California 95616,
Department of Forest Systems and Resources, Forest Research Institute, CIFOR-INIA, 28040 Madrid, Spain,
Weyerhaeuser Company, Weyerhaeuser Technical Center, Tacoma, Washington 98477 and
Institute of Forest Genetics, USDA Forest Service, Davis, California 95616
2 Corresponding author: Institute of Forest Genetics, USDA Forest Service, Department of Plant Sciences, University of California, 1 Shields Ave., Davis, CA 95616.
E-mail: dbneale{at}ucdavis.edu
| ABSTRACT |
|---|
π_sil = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r² of 0.30, decaying rapidly from ~0.50 to ~0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs, a reduction of genotyping effort of ~30–40%, while sampling most common allelic variants, can be gained in our ongoing association studies for drought tolerance in pine.
Genetic association between allelic variants and trait differences on a population scale is a powerful, and relatively recent, approach to identifying genes or alleles that contribute to variation in adaptive traits (LONG and LANGLEY 1999; see NEALE and SAVOLAINEN 2004 for conifers). Population stratification is the most common source of systematic bias in association studies (BUCKLER and THORNSBERRY 2002; HIRSCHHORN and DALY 2005). Putatively neutral molecular markers, such as nuclear microsatellites, are generally used to detect population structure and other population and demographic processes that might produce false positives in association studies (ROSENBERG et al. 2002). Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium for each particular species and candidate gene set. In addition, standard neutrality tests applied to DNA sequences of a single or a few gene(s) can be used in selecting candidate genes or amino acid sites that are putatively under selection for association mapping.
Forest trees play a crucial role in terrestrial ecosystems, offering major ecological benefits in terms of climate control, carbon fixation, and wildlife maintenance. Drought stress is the most common cause of tree mortality and is responsible for severe annual yield losses in commercial species (up to ~65% in Pinus taeda L.; BURNS and HONKALA 1990). Understanding the physiological mechanisms and the genetic basis of drought-stress tolerance has been a long-standing interest for plant biologists (e.g., INGRAM and BARTELS 1996; SEKI et al. 2003; see NEWTON et al. 1991 for forest trees). However, progress on identification of drought-related genes and development of expressional studies in forest trees are relatively recent (CHANG et al. 1996; DUBOS and PLOMION 2003; WATKINSON et al. 2003). The molecular basis of dehydration tolerance in trees is extremely complex and a wide variety of expressional candidate genes has been suggested. Increased expression of dehydrins has been found in different conifer trees during both seed development (JARVIS et al. 1996) and drought stress (RICHARD et al. 2000; WATKINSON et al. 2003). CHANG et al. (1996), using a subtractive hybridization approach, identified four cDNA clones with drought-induced expression in P. taeda: lp2, with a high homology to S-adenosylmethionine synthetase (sams), an intermediate in the synthesis of ethylene; lp3, expressed predominantly in roots and later found to belong to a small family of ABA-inducible genes (PADMANABHAN et al. 1997); lp4, similar to a type I copper-containing glycoprotein; and lp5, expressed almost exclusively in roots and coding for a glycine-rich protein similar to cell wall proteins. Other major expressional candidate genes for drought-stress response identified in trees encode protein kinases (DUBOS and PLOMION 2003; DUBOS et al. 2003), cysteine proteases (TRANBARGER and MISRA 1996), iron storage proteins (LI et al. 1998), antioxidants (LI et al. 1998; KARPINSKA et al. 2001), and pathogenesis-related proteins (DUBOS and PLOMION 2001; DUBOS et al. 2003).
Conifers are long-lived, widely distributed organisms that, in general, exhibit high levels of heterozygosity and large effective population sizes. Therefore, it has been suggested that conifers may show high levels of nucleotide variation (DVORNYK et al. 2002). However, the first results on DNA sequence variation for conifers showed, at best, moderate estimates of nucleotide diversity (e.g., KADO et al. 2003; BROWN et al. 2004; POT et al. 2005). Average population differentiation was also moderate in conifers (KADO et al. 2003; but see POT et al. 2005 for korrigan and pp1 genes), even when extreme phenotypes were sampled (GARCÍA-GIL et al. 2003). For example, GARCÍA-GIL et al. (2003) did not find any functional differentiation at the photosensory domains of two phytochrome loci among populations sampled along a latitudinal cline that was associated with marked differences in growth phenology (as shown by common garden experiments). Patterns of nucleotide diversity and/or population differentiation that deviate from the neutral expectation, potentially indicating the action of natural selection, have been described only for a few genes and tree species [acl5 in Cryptomeria japonica (L. f.) D. Don (KADO et al. 2003); f3h1, 4cl1, and mt-like in Pseudotsuga menziesii (Mirb.) Franco (KRUTOVSKY and NEALE 2005); and pp1, korrigan, and CesA3 in pines (POT et al. 2005)]. Large effective population sizes in conifers would result in low linkage disequilibrium (LD) due to high recombination rates at the population level. This prediction agrees with empirical data in conifers, where lack of LD among genes and relatively rapid decay of LD within genes (200–1500 bp) have been observed (BROWN et al. 2004; RAFALSKI and MORGANTE 2004). However, it is possible but currently unknown if more extensive LD exists in particular tree species or populations that experienced historical bottlenecks in Pleistocene glacial refugia, both in Europe and in America.
The standing variation in natural populations is patterned as a consequence of the interplay among genetic drift, demography, population structure, and natural selection. In this article, we used a data set of 21 nuclear microsatellites for detecting population structure and demographic processes that might cause spurious associations in association studies and bias neutrality tests, and sequenced all or portions of 18 candidate genes for drought-stress response in P. taeda, an important tree crop. Our sample covered the southeastern native range of P. taeda, including Florida, a putative Pleistocene glacial refugium of this species (SCHMIDTLING et al. 1999; AL-RABAB'AH and WILLIAMS 2002), which was not extensively sampled in our previous studies (see BROWN et al. 2004). We have used DNA sequences to estimate levels of nucleotide diversity and linkage disequilibrium, to identify candidate genes under selection (by means of neutrality tests), and to select haplotype-tagging single-nucleotide polymorphisms (htSNPs) for our current genetic association studies.
| MATERIALS AND METHODS |
|---|
Candidate gene selection:
Candidate genes for drought-stress response were selected on the basis of (1) homology of contig assemblies of P. taeda expressed sequence tags (ESTs) in public databases (DDBJ/EMBL/GenBank) with drought-stress response genes in model species; (2) homology of sequences from the unigene set (~20,500 nonredundant genes) assembled at North Carolina State University on the basis of six xylem EST libraries (accessed through http://pinetree.ccgb.umn.edu/) with drought-stress response genes in model species; and (3) the overabundance of ESTs in root libraries from P. taeda trees under drought stress compared to control trees as indicated by "electronic" Northerns using the MAGIC Gene Discovery tool (University of Georgia, http://fungen.org/Projects/Pine/Pine.htm). Two other genes, ppap12 and lp3-3, were also selected because they showed differential expression under drought treatments as shown by reverse Northerns in P. pinaster (DUBOS et al. 2003) and P. taeda (PADMANABHAN et al. 1997), respectively.
DNA isolation, amplification, and sequencing:
Haploid genomic DNA was extracted from megagametophytes, using the Plant DNeasy kit (QIAGEN, Valencia, CA) after seed germination. PCR primers were designed to amplify a 400- to 1000-bp fragment in nine nuclear loci and previously published primers were used for an additional nine genes (see supplemental Table S2 at http://www.genetics.org/supplemental/). Primers were designed to amplify full-length genes for lp3-3, dhn-1, and dhn-2. Sequence data were obtained directly from PCR products on an ABI 377 automated sequencer, using the BigDye Terminator v. 3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA). All samples were sequenced from both ends. Base calling and assembly of forward and reverse reads were done using phred and phrap programs (EWING et al. 1998; GORDON et al. 1998; http://bozeman.mbt.washington.edu/phredphrapconsed.html) under a Unix environment. Multiple alleles from a locus were aligned in the multiple-alignment consed extensions (MACE) program (B. Gilliland and C. Langley, University of California, Davis, CA). All chromatograms were checked visually and a putative sequence variant was accepted only when the phred scores for all sequences exceeded 25 at that site. Resequencing was performed as needed to maintain this quality criterion. Since the DNA samples were haploid, the identification of haplotypes (i.e., alleles) was unambiguous.
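The quality criterion above can be made concrete with the standard phred definition, Q = -10 log10(P), where P is the estimated base-calling error probability. The following is a minimal sketch, not the authors' pipeline; the function names and the example scores are hypothetical.

```python
def phred_error_probability(q):
    """Error probability implied by a phred quality score Q (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def passes_quality(scores_at_site, threshold=25):
    """True when every sequence exceeds the threshold at a putative variant site.

    scores_at_site: phred scores, one per sequence, at one alignment column
    (hypothetical input layout for illustration).
    """
    return all(score > threshold for score in scores_at_site)
```

Under this definition, a phred score of 25 corresponds to an error probability of roughly 0.003 per base call, so a variant accepted only when all sequences exceed 25 is unlikely to be a sequencing artifact.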
Mapping of candidate genes:
Six of the 18 candidate genes were mapped previously (BROWN et al. 2003). Mapping of the remaining 12 loci was attempted using two reference mapping populations of P. taeda, the qtl and base pedigrees (details in BROWN et al. 2001). Five candidate genes (lp3-1, dhn-1, rd21A-like, cpk3, and ppap12) were mapped using denaturing gradient gels (DGGE) according to TEMESGEN et al. (2001) and 1 (lp3-3) was mapped using a template-directed dye-termination incorporation assay (TDI) with fluorescence polarization (FP) detection (TDI 5'–3' primer: TTGCCAGTAGCATACACATCTG). FP-TDI was done using the AcycloPrime-FP SNP detection kit and a Wallac VICTOR² fluorescence plate reader (Perkin-Elmer Life and Analytical Sciences, Torrance, CA). The other 6 candidate genes either were unlinked (sod-chl) or lacked suitable polymorphisms (i.e., parents of the pedigrees did not segregate for any SNP or primers for FP-TDI could not be designed due to the existence of repetitive regions near SNPs: ferritin, erd-3, dhn-2, lp5-like, and ug-2_498). A consensus map was obtained together with other markers following BROWN et al. (2001).
Population structure and demographic processes:
Population stratification is the most common systematic bias producing false-positive associations in association studies (MARCHINI et al. 2004; HIRSCHHORN and DALY 2005). Moreover, the existence of population genetic structure or demographic processes, such as range expansions or retreats, might produce signatures on the allele frequency spectrum similar to those produced by the action of natural selection and mislead the interpretation of standard neutrality tests, such as Tajima's D. We used 21 highly polymorphic (average of 15 alleles per locus) nuclear microsatellites (nuSSRs), covering most P. taeda linkage groups, to test for population structure or demographic processes. The nuSSR data were kindly provided by C. Dana Nelson (Southern Institute of Forest Genetics, U.S. Department of Agriculture) and included 94 trees sampled from roughly the same range as the sequence data presented here (see supplemental Table S3 at http://www.genetics.org/supplemental/).
To test for population structure, we first used a model-based clustering algorithm (Structure software; PRITCHARD et al. 2000; ROSENBERG et al. 2002), which constructs groups of populations without any prior geographical information. Models were run with a putative number of clusters (K parameter) from one to four, noncorrelated allele frequencies, and burn-in (to minimize the effect of the starting configuration) and run-length periods of 10⁶ each. Second, we computed genetic differentiation estimates (F-statistics, based on a nested ANOVA following WEIR and COCKERHAM 1984) among the three geographical regions included in the sample (Gulf Coast, Northeast, and Southeast). Both a permutation test (10,000 permutations) and a jackknifed estimator over loci were used to test for significance of population genetic structure among regions.
To test for genomewide departures from neutrality, such as those produced by demographic processes, the Ewens–Watterson test of neutrality (WATTERSON 1978, 1986), with probabilities calculated on the basis of both homozygosity and Fisher's exact tests (Ewens–Watterson–Slatkin's exact test; SLATKIN 1994, 1996), was performed using the program Arlequin v. 2000 (SCHNEIDER et al. 2000). The Ewens–Watterson test enables the detection of deviations from the neutral model as either a deficit or an excess of homozygosity relative to the neutral equilibrium expectation, given the number of alleles found at a locus. It should be noted that homozygosity excess is a typical genomewide signature of population expansion (PAYSEUR et al. 2002; LUIKART et al. 2003). Once the test was computed for each of the 21 nuSSR loci, a Mann–Whitney U-test was used to detect whether expected and observed homozygosity values were drawn from the same distribution. The Bonferroni correction for multiple testing was applied when necessary.
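As an illustration of the quantity the Ewens–Watterson test compares against its neutral expectation, the sketch below computes the observed homozygosity at one locus as the sum of squared allele frequencies, together with a Bonferroni-adjusted significance threshold. It does not reproduce the Arlequin implementation or the neutral expectation itself; the function names and the toy allele labels are hypothetical.

```python
from collections import Counter


def observed_homozygosity(alleles):
    """Sum of squared allele frequencies at one locus (Watterson's F)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

For example, a locus with two alleles at equal frequency has an observed homozygosity of 0.5, and testing 21 loci at an overall level of 0.05 gives a per-locus threshold of about 0.0024.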
Nucleotide variation and neutrality tests:
Analyses of sequence data were performed using DnaSP v. 4.0 (ROZAS et al. 2003). Nucleotide diversity was estimated by Watterson's θ_w (WATTERSON 1975) and π, the average number of pairwise nucleotide differences among sequences in a sample (NEI and LI 1979). Heterogeneity of sequence variation across loci was assessed using coalescence simulations without recombination. A number of statistical analyses were conducted to identify genes or amino acid sites departing from the standard neutral model of evolution. TAJIMA's (1989) D-statistic was computed for each locus for both the full sequence and a sliding window (window length and step size of 100 and 25 sites, respectively). Tajima's D-statistic reflects the difference between π and θ_w. At mutation–drift equilibrium, the expected value of D is close to zero. The Fs-test statistic for neutrality (FU 1997), based on the haplotype (gene) frequency distribution conditional on the value of θ (estimated by π), was also calculated. Both Tajima's D- and Fu's Fs-test statistics can also reflect demographic changes (FU 1997; SANO and TACHIDA 2005). To compute tests that required data from an outgroup, putative orthologs of 14 genes were obtained from P. pinaster, a European species that might have diverged from P. taeda ~120 million years ago (KRUPKIN et al. 1996). For 8 genes, we used sequences from GenBank (accession nos.: AL751338, lp3-1; BX255067, dhn-1; BX677401, lp5-like; BX252032, sod-chl; BX681838, sams-2; AY641535, pal-1; CR393126, ccoaomt-1; and AJ309112, ppap12) and, for the other six, we used sequences obtained directly from P. pinaster megagametophyte DNA using the same primer pairs for sequencing as in P. taeda (genes dhn-2, rd21A-like, pp2c, Aqua-MIP, erd-3, and ug-2_498; A. SOTO and M. T. CERVERA, unpublished data). Then, we computed: (1) Fay and Wu's H-test (FAY and WU 2000), on the basis of the relative excess of high-frequency-derived alleles expected immediately after a selective sweep; (2) the Hudson–Kreitman–Aguadé (HKA) test (HUDSON et al. 1987), which tests for decoupling between polymorphism and divergence in a particular region; and (3) the McDonald–Kreitman (MK) test (MCDONALD and KREITMAN 1991), on the basis of the comparison of synonymous and nonsynonymous substitutions within and between species. HKA tests were done comparing each gene against every other one. Finally, to detect positive selection at single amino acid sites, we estimated the rates of nonsynonymous and synonymous changes at each site in a sequence alignment using likelihood-based methods as implemented in the on-line DataMonkey package (KOSAKOVSKY-POND and FROST 2005a,b). For these analyses, we used both a conservative single-likelihood ancestor-counting (SLAC) method, related to that of Suzuki–Gojobori (SUZUKI and GOJOBORI 1999), and a fixed-effects likelihood (FEL) method, which directly estimates nonsynonymous and synonymous substitution rates at each site and is more adequate for data sets with a moderate number of sequences (n = 20–40; KOSAKOVSKY-POND and FROST 2005a).
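The relationship among the number of segregating sites, Watterson's estimator, the average number of pairwise differences, and Tajima's D can be sketched as follows. This is a minimal illustration using the standard formulas of TAJIMA (1989), not the DnaSP implementation; it assumes aligned haploid sequences of equal length with no gaps or missing data, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Watterson's estimator for the whole fragment: S / a1."""
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (S = 0 or n < 4)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_differences(seqs) - s / a1) / sqrt(variance)
```

Dividing the whole-fragment values by the number of aligned sites gives per-site estimates comparable to the π and θ_w values reported per gene. Negative D indicates an excess of low-frequency variants and positive D an excess of intermediate-frequency variants.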
LD, haplotype diversity, and selection of htSNPs for association mapping:
The LD descriptive statistic r² (HILL and ROBERTSON 1968) was calculated, only on the basis of informative sites (frequency ≥ 0.063), using Tassel software (http://www.maizegenetics.net/index.php?page=bioinformatics/tassel/index.html). The r² statistic summarizes both recombination and mutation history and it is less sensitive to sample size than other common LD statistics such as D' (FLINT-GARCÍA et al. 2003). Statistical significance of r² was computed with a one-tailed Fisher's exact test and applying Bonferroni corrections for multiple testing. The decay of linkage disequilibrium with physical distance was estimated using nonlinear regression of LD between polymorphic sites, as estimated by r², and the distance, in base pairs, between sites (REMINGTON et al. 2001; INGVARSSON 2005). To adjust the nonlinear function, we used the r² expectation provided by HILL and WEIR (1988) for drift–recombination equilibrium with a low level of mutation and an adjustment for sample size n,
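$$E(r^2) = \left[\frac{10 + \rho}{(2 + \rho)(11 + \rho)}\right]\left[1 + \frac{(3 + \rho)(12 + 12\rho + \rho^2)}{n(2 + \rho)(11 + \rho)}\right] \qquad (1)$$
where ρ is the population recombination parameter between the two sites.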
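The two quantities used in the LD decay analysis can be sketched as follows: the pairwise r² between two biallelic sites scored on haploid samples, and the HILL and WEIR (1988) expectation of r² in the form commonly used for nonlinear fitting (e.g., REMINGTON et al. 2001). This is a minimal sketch rather than the Tassel implementation; the function names are hypothetical, `rho` denotes the population recombination parameter between the two sites, and the per-base-pair rate passed to `expected_r2_at_distance` is an assumed input that in practice would be estimated by the regression.

```python
def r_squared(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites.

    site_a, site_b: sequences of alleles, one entry per haploid sample,
    in the same sample order. Both sites must be polymorphic.
    """
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(1 for x in site_a if x == ref_a) / n
    p_b = sum(1 for y in site_b if y == ref_b) / n
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == ref_a and y == ref_b) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def expected_r2(rho, n):
    """Hill and Weir (1988) expectation of r^2 for sample size n."""
    first = (10 + rho) / ((2 + rho) * (11 + rho))
    second = 1 + (3 + rho) * (12 + 12 * rho + rho ** 2) / (n * (2 + rho) * (11 + rho))
    return first * second


def expected_r2_at_distance(distance_bp, rho_per_bp, n):
    """Expected r^2 at a physical distance, assuming rho scales linearly with bp."""
    return expected_r2(rho_per_bp * distance_bp, n)
```

With a sample of 32 sequences, the expectation at zero distance is about 0.48 and declines monotonically as the recombination parameter (and hence physical distance) increases.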
| RESULTS |
|---|
324 kb (32 × 10,116 bp) of DNA sequence data (Table 1). Approximately 60% of the sequence data were obtained from coding regions. We found insertion/deletions (indels) in 13 genes, ranging from 1 to 67 bp (average of ~8 bp). Five genes (dhn-1, dhn-2, lp5-like, rd21A-like, and pp2c) had indels within the coding region, including a 30-bp indel in dhn-1. The lengths of indels within coding regions were multiples of 3 bp, so they did not result in a shift of reading frame. Finally, highly variable TA microsatellite regions were observed in lp3-1 and ug-2_498 DNA sequences. Indels and microsatellite regions were excluded in further analyses.
Nucleotide variation and neutrality tests:
In total, we found 196 segregating sites, corresponding to 1 SNP per 50 bp (Table 2 and supplemental Table S4 at http://www.genetics.org/supplemental/). Two genes (rd21A-like and ccoaomt-1) had triallelic variants and the least frequent allele was recoded as missing data for further analyses. Of the 196 segregating sites, 37 (~20%) were nonsynonymous substitutions. Average nucleotide diversity at silent sites, π_sil, was 0.00853, fivefold the diversity found at nonsynonymous sites (π_a = 0.00166). Nucleotide variation was slightly higher at synonymous sites than in noncoding regions (π_syn = 0.00909 and π_noncoding = 0.00631; see supplemental Table S4), but these differences were not statistically significant. Average frequency of the less common nucleotide variant was similar at silent and nonsynonymous sites (17.16 and 13.58%, respectively) and frequency distributions for silent and nonsynonymous sites were not significantly different (P = 0.7145, Kolmogorov–Smirnov test). Coalescence simulations (implemented in DnaSP v. 4.0) showed lower values of π_tot than the average for lp3-3, ferritin, pp2c, and erd3 (Table 2). Nucleotide variation, all sites considered, was higher than the average for only one gene, ccoaomt-1 (0.01179).
LD, haplotype diversity, and selection of htSNPs for association mapping:
Linkage disequilibrium within the sequenced gene regions varied, depending on the candidate gene locus, from very low (e.g., lp3-3, aqua-MIP, ferritin) to high (e.g., ppap-12, ccoaomt-1). We did not find any evidence of tight LD among sites from different genes, not even for those that putatively reside on the same chromosome (see, for instance, Figure 2, linkage group 8; similar results in other linkage groups are not shown). Decay of LD within genes was rapid (Figure 3). A nonlinear fitting of the squared correlation of allele frequencies (r²) as a function of distance between sites showed expected values of ~0.20 at 800 bp. In a sample of 32 sequences, we found from 2 (pp2c; He = 0.06) to 14 (lp3-1; He = 0.91) haplotypes per candidate gene locus, with an average of 7.5 (He = 0.68). Selection of htSNPs based on construction of LD blocks was relatively successful, considering the low level of LD found within most genes (10 of 16 genes had average pairwise r² ≤ 0.20; Table 4). We found from 1 (ccoaomt-1) to 14 (rd21A-like) and 0 (lp3-3, aqua-MIP, ferritin, and pal-1) to 8 (rd21A-like) LD blocks for MAFs of 0.05 and 0.15, respectively. For common SNPs (MAF > 5%), we identified 94 htSNPs (of 139 available), resulting in a reduction in genotyping effort of 32.37%. The reduction of genotyping effort was increased to 39.74% (47 htSNPs of 78 available) when only frequent (MAF > 15%) SNPs were considered.
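Two of the summary quantities in this paragraph are simple enough to sketch directly: haplotype diversity (He) from a list of haplotype labels, and the reduction in genotyping effort obtained by typing only htSNPs. The sketch assumes Nei's unbiased estimator for He, which may differ slightly from the estimator used by the authors; the function names and haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def genotyping_reduction(n_htsnps, n_snps_available):
    """Fraction of genotyping effort saved by typing only the htSNPs."""
    return 1 - n_htsnps / n_snps_available
```

For instance, a locus with two haplotypes present in 31 and 1 of 32 sequences has a diversity of about 0.06, and typing 47 htSNPs instead of 78 frequent SNPs saves about 39.74% of the genotyping effort.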
| DISCUSSION |
|---|
30–40%, 50–100 SNPs being enough to represent common allelic variants in the sequenced candidate gene loci.
The average level of variation (π_sil = 0.00853) found in candidate genes for drought-stress response in P. taeda was similar to the one in wood- and disease-related candidates in this species (see review in NEALE and SAVOLAINEN 2004). Levels of silent variation in pal-1 for P. taeda (this study) and P. sylvestris L. (DVORNYK et al. 2002) were also similar (π_sil ≈ 0.00490) and at the lower range of those of the genes studied here. Most standing variation in forest trees is normally found within populations (see, for instance, HAMRICK et al. 1992). The extensive sampling of Florida, which is considered a putative Pleistocene glacial refugium of the species (SCHMIDTLING et al. 1999; AL-RABAB'AH and WILLIAMS 2002), resulted in only slightly higher nucleotide variation estimates than those in previous studies of the species [average of 0.00604 vs. 0.00580, based on five gene fragments from our study, ccoaomt-1, pal-1, sams-2, ug_2-498, and lp3-1, that we also sequenced in BROWN et al.'s (2004) set of samples], the difference not being significant (P = 0.281) as shown by a pairwise signed rank test (n = 5). Bottlenecks, such as those that might have occurred in forest trees during Pleistocene range shifts, can generate substantial LD due to a reduction in population size with accompanying genetic drift (FLINT-GARCÍA et al. 2003; RAFALSKI and MORGANTE 2004). Levels of LD in this study were lower than those found in BROWN et al. (2004) (see Figure 3), which might reflect more stable population dynamics in the putative glacial refugium of Florida. Compared also with BROWN et al. (2004), we found a larger range in nucleotide diversity in our study, where maximum per gene silent diversity (0.02052; lp5-like) was 100-fold the minimum estimate (0.00022; pp2c). The nucleotide diversity found in pine, compared with that in other plants, was moderate (see supplemental Table S5 at http://www.genetics.org/supplemental/), which, as first noted by DVORNYK et al. (2002), does not meet predictions based on their life history or other studies based on molecular markers, such as allozymes or RAPDs (HAMRICK et al. 1992; NYBOM and BARTISH 2000). Indeed, pines are highly outcrossing organisms showing generally large effective population sizes and higher heterozygosity than other plants (expected heterozygosity of 0.163–0.193 in P. taeda based on 18 allozymes; SCHMIDTLING et al. 1999). It is striking, then, that average nucleotide variation in P. taeda (and other pines; see, for instance, POT et al. 2005) was consistently lower than that in Arabidopsis thaliana, the model selfing species. Estimates based on divergence time from related species showed mutation rates in pines (~0.5–1.5 × 10⁻¹⁰/year; DVORNYK et al. 2002; BROWN et al. 2004) two orders of magnitude lower than those in angiosperms, including Arabidopsis (DVORNYK et al. 2002 and references therein). A lower overall rate of sequence evolution might explain the increasing evidence of low to moderate nucleotide diversity in pines.
A number of neutrality tests were conducted to identify genes or sites departing from standard neutral patterns. A selective sweep might have occurred at the early-response-to-drought-3 (erd3) gene, which had reduced nucleotide variation, as shown by pairwise HKA tests, and an excess of less frequent variants. This polymorphism pattern can result from genetic hitchhiking (BRAVERMAN et al. 1995; see OLSEN et al. 2002 for an example in plants). However, Fay and Wu's H-test did not find any excess of derived variants at high frequency for this gene (Fay and Wu's H = 0.363, P = 0.4140), which is a unique pattern produced by genetic hitchhiking (FAY and WU 2000). The observed site frequency spectrum might also have resulted from population expansion. Despite the lack of evidence of population expansion shown by our nuSSR survey, a relatively recent population expansion for the southern pines (note that pollen morphology among species of southern pines, including P. taeda, is indistinguishable) within the study range is supported by palynological data showing a steady increase of pine presence beginning 7000 years before present (WATTS and HANSEN 1994). Because the survival of P. taeda seedlings is strongly limited by the average annual minimum temperature (SCHMIDTLING 2001), range expansions and retreats in response to changing climatic conditions are expected in this species. Further evidence of population expansion in P. taeda from the southeastern United States comes from the skewed Tajima's D distribution (~70% of genes giving negative estimates of D) of the ~50 genes currently sequenced in our laboratory (our unpublished data). A skew of the Tajima's D distribution toward negative values is a typical genomewide signature of population growth (SANO and TACHIDA 2005 and references therein).
One other gene, Caffeoyl-CoA-O-methyltransferase (ccoaomt-1), a methylating enzyme involved in lignification, had an excess of intermediate variants (significant positive Tajima's D), fewer haplotypes than expected (significant positive Fu's Fs), and high within-gene LD (average pairwise r² of 0.90), resulting in a polymorphism pattern characterized by the existence of two distinctly major haplotype lineages at similar frequencies (named dimorphism; see Figure 1a). This gene also showed higher variation than the average in silent sites but lower variation in nonsynonymous sites (π_sil of 0.01911 and null π_a vs. averages of 0.00853 and 0.00166, respectively). The two haplotype lineages did not show any geographical pattern, both lineages being present in all the major biogeographical zones of the P. taeda range (see Figure 1b). All 13 polymorphic sites found in the sequenced fragment (see Table 1) were silent mutations and, consequently, we were not able to compute the MK test for this gene or identify a replacement polymorphism causing the singular haplotype structure found in ccoaomt-1. Pairwise HKA tests, which consider variable mutation rates across the genome, did not show an excess of polymorphism relative to the other loci, used here as reference. In a scenario of no population structure and population expansion, demography and population factors do not provide any satisfactory explanation for dimorphism in this gene. Dimorphism has often been considered as the outcome of the long-term action of balancing selection in different genes and species [PgiC in Leavenworthia species (FILATOV and CHARLESWORTH 1999); RPS5 and Rpm1 resistance genes in Arabidopsis (STAHL et al. 1999; TIAN et al. 2002)]. However, this pattern is also compatible with a constant-size neutral model with no recombination (see AGUADÉ 2001 for FAH1 and F3H in Arabidopsis) and evidence of natural selection acting in ccoaomt-1 remains inconclusive. Full-length sequencing of this gene, including the promoter region, is advisable. OLSEN et al. (2002) found two promoter haplogroups, weakly associated with flower developmental traits, in the TFL1 gene of Arabidopsis that appear to be maintained by selection. Further evidence of natural selection for this gene might also come from our ongoing association studies where ~900 P. taeda clones will be used to test ccoaomt-1 haplotype differences in performance for adaptive traits related to growth, drought-stress response, and resistance to fungal disease.
In conifers, a candidate-gene-based strategy for association mapping is favored. Genomewide scans are implausible for conifers because of the number of SNPs needed to cover the large genome and because of the general lack of intergenic LD (NEALE and SAVOLAINEN 2004). Our results are relevant to define SNP genotyping strategies for our ongoing association mapping of drought-stress tolerance candidate genes in pines. Genes or portions of genes showing departure from the standard neutral model will be given priority, in particular ccoaomt-1, where a balanced polymorphism might have caused dimorphism at nearby linked regions. In total, we identified 196 polymorphisms, including 139 common SNPs (i.e., SNPs with minor allele frequency >5%) suitable for association mapping, in 18 candidate gene loci for drought-stress response in P. taeda. Pine genes might be structured in short blocks within which common variants are in strong LD but among which recombination has left little LD. Then, genotyping strategies based on htSNPs would produce only moderate reductions in genotyping effort. Depending on the minor allele frequency chosen, we found that genotyping of 50–100 SNPs would suffice to represent common allelic variants, resulting in reductions of genotyping effort of ~30–40% in P. taeda association studies.
| ACKNOWLEDGEMENTS |
|---|
| FOOTNOTES |
|---|
1 Present address: Molecular Tree Breeding Services, Centralia, WA 98531.
| LITERATURE CITED |
|---|
AGUADÉ, M., 2001 Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis. Mol. Biol. Evol. 18: 1–9.
AL-RABAB'AH, M., and C. G. WILLIAMS, 2002 Population dynamics of Pinus taeda L. based on nuclear microsatellites. For. Ecol. Manage. 163: 263–271.
BRAVERMAN, J. M., R. R. HUDSON, N. L. KAPLAN, C. H. LANGLEY and W. STEPHAN, 1995 The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140: 783–796.
BROWN, G. R., E. E. KADEL, III, D. L. BASSONI, K. L. KIEHNE, B. TEMESGEN et al., 2001 Anchored reference loci in loblolly pine (Pinus taeda L.) for integrating pine genomics. Genetics 159: 799–809.
BROWN, G. R., D. L. BASSONI, G. P. GILL, J. R. FONTANA, N. C. WHEELER et al., 2003 Identification of quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL verification and candidate gene mapping. Genetics 164: 1537–1546.
BROWN, G. R., G. P. GILL, R. J. KUNTZ, C. H. LANGLEY and D. B. NEALE, 2004 Nucleotide variation and linkage disequilibrium in loblolly pine. Proc. Natl. Acad. Sci. USA 101: 15255–15260.
BUCKLER, IV, E. S., and J. M. THORNSBERRY, 2002 Plant molecular diversity and applications to genomics. Curr. Opin. Plant Biol. 5: 107–111.
BURNS, R. M., and B. H. HONKALA, 1990 Silvics of North America: 1. Conifers. 2. Hardwoods. Agriculture Handbook 654. U.S. Department of Agriculture, Forest Service, Washington, DC (http://www.na.fs.fed.us/spfo/pubs/silvics_manual/table_of_contents.htm).
CHANG, S., J. D. PURYEAR, M. A. D. L. DIAS, E. A. FUNKHOUSER, R. J. NEWTON et al., 1996 Gene expression under water deficit in loblolly pine (Pinus taeda): isolation and characterization of cDNA clones. Physiol. Plant. 97: 139–148.
DUBOS, C., and C. PLOMION, 2001 Drought differentially affects expression of a PR-10 protein in needles of maritime pine (Pinus pinaster Ait.) seedlings. J. Exp. Bot. 358: 1143–1144.
DUBOS, C., and C. PLOMION, 2003 Identification of water-deficit responsive genes in maritime pine (Pinus pinaster Ait.) roots. Plant Mol. Biol. 51: 249–262.
DUBOS, C., G. LE-PROVOST, D. POT, F. SALIN, C. LALANE et al., 2003 Identification and characterization of water-stress-responsive genes in hydroponically grown maritime pine (Pinus pinaster) seedlings. Tree Physiol. 23: 169–179.
DVORNYK, V., A. SIRVIÖ, M. MIKKONEN and O. SAVOLAINEN, 2002 Low nucleotide diversity at the pal1 locus in the widely distributed Pinus sylvestris. Mol. Biol. Evol. 19: 179–188.
EWING, B., L. HILLIER, M. WENDL and P. GREEN, 1998 Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8: 175–185.
FAY, J. C., and C.-I WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.
FILATOV, D. A., and D. CHARLESWORTH, 1999 DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153: 1423–1434.
FLINT-GARCÍA, S. A., J. M. THORNSBERRY and E. S. BUCKLER, IV, 2003 Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54: 357–374.
FORD, M. J., 2002 Applications of selective neutrality tests to molecular ecology. Mol. Ecol. 11: 1245–1262.
FU, Y. X., 1997 Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147: 915–925.
GARCÍA-GIL, M. R., M. MIKKONEN and O. SAVOLAINEN, 2003 Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Mol. Ecol. 12: 1195–1206.
GORDON, D., C. ABAJIAN and P. GREEN, 1998 Consed: a graphical tool for sequence finishing. Genome Res. 8: 195–202.
HAMRICK, J. L., M. J. GODT and S. L. SHERMAN-BROYLES, 1992 Factors influencing levels of genetic diversity in woody plant species. New For. 6: 95–124.
HILL, W. G., and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38: 226–231.
HILL, W. G., and B. S. WEIR, 1988 Variances and covariances of squared linkage disequilibria in finite populations. Theor. Popul. Biol. 33: 54–78.
HIRSCHHORN, J. N., and M. J. DALY, 2005 Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6: 95–108.
HUDSON, R. R., M. KREITMAN and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153–159.
INGRAM, J., and D. BARTELS, 1996 The molecular basis of dehydration tolerance in plants. Annu. Rev. Plant Physiol. Plant Mol. Biol. 47: 377–403.
INGVARSSON, P. K., 2005 Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: 945–953.
JARVIS, S. B., M. A. TAYLOR, M. R. MACLEOD and H. V. DAVIES, 1996 Cloning and characterisation of the cDNA clones of three genes that are differentially expressed during dormancy-breakage in the seeds of Douglas fir (Pseudotsuga menziesii). J. Plant Physiol. 147: 559–566.
KADO, T., H. YOSHIMARU, Y. TSUMURA and H. TACHIDA, 2003 DNA variation in a conifer, Cryptomeria japonica (Cupressaceae sensu lato). Genetics 164: 1547–1559.
KARPINSKA, B., M. KARLSSON, H. SCHINKEL, S. STRELLER, K. H. SÜSS et al., 2001 A novel superoxide-dismutase with a high isoelectric point in higher plants. Expression, regulation, and protein localization. Plant Physiol. 126: 1668–1677.
KOSAKOVSKY-POND, S. L., and S. D. W. FROST, 2005a Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 22: 1208–1222.
KOSAKOVSKY-POND, S. L., and S. D. W. FROST, 2005b Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21: 2531–2533.
KREITMAN, M., 2000 Methods to detect selection in populations with applications to the human. Annu. Rev. Genomics Hum. Genet. 1: 539–559.
KRUPKIN, A. B., A. LISTON and S. H. STRAUSS, 1996 Phylogenetic analysis of the hard pines (Pinus subgenus Pinus, Pinaceae) from chloroplast restriction site analysis. Am. J. Bot. 83: 489–498.
KRUTOVSKY, K. V., and D. B. NEALE, 2005 Nucleotide diversity and linkage disequilibrium in cold hardiness and wood quality related candidate genes in Douglas fir. Genetics 171: 2029–2041.
LI, L., X. H. ZHANG, C. P. JOSHI and V. L. CHIANG, 1998 Compression stress responsive expression of ferritin (accession no AF028072) and peroxidase genes (accession no AF028073) in developing xylem of loblolly pine (Pinus taeda). Plant Physiol. 116: 1604.
LONG, A. D., and C. H. LANGLEY, 1999 The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9: 720–731.
LUIKART, G., P. R. ENGLAND, D. TALLMON, S. JORDAN and P. TABERLET, 2003 The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet. 4: 981–994.
MARCHINI, J., L. R. CARDON, M. S. PHILLIPS and P. DONNELLY, 2004 The effects of human population structure on large genetic association studies. Nat. Genet. 36: 512–517.
MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.
NEALE, D. B., and O. SAVOLAINEN, 2004 Association genetics of complex traits in conifers. Trends Plant Sci. 9: 325–330.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NEI, M., and W. H. LI, 1979 Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76: 5269–5273.
NEWTON, R. J., E. A. FUNKHOUSER, F. FONG and C. G. TAUER, 1991 Molecular and physiological genetics of drought tolerance in forest species. For. Ecol. Manage. 43: 225–250.
NYBOM, H., and I. V. BARTISH, 2000 Effects of life history traits and sampling strategies on genetic diversity estimates obtained with RAPD markers in plants. Perspect. Plant Ecol. Evol. Syst. 3: 93–114.
OLSEN, K. M., A. WOMACK, A. R. GARRETT, J. I. SUDDITH and M. D. PURUGGANAN, 2002 Contrasting evolutionary forces in the Arabidopsis thaliana floral developmental pathway. Genetics 160: 1641–1650.
PADMANABHAN, V., M. A. D. L. DIAS and R. J. NEWTON, 1997 Expression analysis of a gene family in loblolly pine (Pinus taeda L.) induced by water-deficit stress. Plant Mol. Biol. 35: 801–807.
PAYSEUR, B. A., A. D. CUTTER and M. W. NACHMAN, 2002 Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol. Biol. Evol. 7: 1143–1153.
POT, D., L. MCMILLAN, C. ECHT, G. LE-PROVOST, P. GARNIER-GÉRÉ et al., 2005 Nucleotide variation in genes involved in wood formation in two pine species. New Phytol. 167: 101–112.
PRITCHARD, J. K., and W. WEN, 2004 Documentation for Structure Software Version 2. Department of Human Genetics, University of Chicago, Chicago (http://pritch.bsd.uchicago.edu).
PRITCHARD, J. K., M. STEPHENS and P. DONNELLY, 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
RAFALSKI, A., and M. MORGANTE, 2004 Corn and humans: recombination and linkage disequilibrium in two genomes of similar size. Trends Genet. 20: 103–111.
REMINGTON, D. L., J. M. THORNSBERRY, Y. MATSOUKA, L. M. WILSON, S. R. WHITT et al., 2001 Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci. USA 98: 11479–11484.
RICHARD, S., M. J. MORENCY, C. DREVET, L. JOUANIN and A. SÉGUIN, 2000 Isolation and characterization of a dehydrin gene from white spruce induced upon wounding, drought and cold stresses. Plant Mol. Biol. 43: 1–10.
ROSENBERG, N. A., and M. NORDBORG, 2002 Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3: 380–390.
ROSENBERG, N. A., J. K. PRITCHARD, J. L. WEBER, H. M. CANN, K. K. KIDD et al., 2002 Genetic structure of human populations. Science 298: 2381–2385.
ROZAS, J., J. C. SÁNCHEZ-DEL-BARRIO, X. MESSEGUER and R. ROZAS, 2003 DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497.
SANO, A., and H. TACHIDA, 2005 Gene genealogy of test statistics of neutrality under population growth. Genetics 169: 1687–1697.
SCHMIDTLING, R. C., 2001 Southern Pine Seed Sources. USDA, GTR SRS-44, Asheville, NC.
SCHMIDTLING, R. C., E. CARROLL and T. LAFARGE, 1999 Allozyme diversity of selected and natural loblolly pine populations. Silvae Genet. 48: 35–45.
SCHNEIDER, S., D. ROESSLI and L. EXCOFFIER, 2000 Arlequin Ver. 2000: A Software for Population Genetics Data Analysis. Genetics and Biometry Laboratory, University of Geneva, Geneva.
SEKI, M., A. KAMEI, K. YAMAGUCHI-SHINOZAKI and K. SHINOZAKI, 2003 Molecular responses to drought, salinity and frost: common and different paths for plant protection. Curr. Opin. Biotechnol. 14: 194–199.
SLATKIN, M., 1994 An exact test for neutrality based on the Ewens sampling distribution. Genet. Res. 64: 71–74.
SLATKIN, M., 1996 A correction to the exact test based on the Ewens sampling distribution. Genet. Res. 68: 259–260.
STAHL, M. G., G. DWYER, R. MAURICIO, M. KREITMAN and J. BERGELSON, 1999 Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400: 667–671.
SUZUKI, Y., and T. GOJOBORI, 1999 A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16: 1315–1328.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
TAKEUCHI, F., K. YANAI, T. MORII, Y. ISHINAGA, K. TANIGUCHI-YANAI et al., 2005 Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs. Genetics 170: 291–304.
TEMESGEN, B., G. R. BROWN, D. E. HARRY, C. S. KINLAW, M. M. SEWELL et al., 2001 Genetic mapping of expressed sequence tag polymorphism (ESTP) markers in loblolly pine (Pinus taeda L.). Theor. Appl. Genet. 102: 664–675.
TIAN, D., H. ARAKI, E. STAHL, J. BERGELSON and M. KREITMAN, 2002 Signature of balancing selection in Arabidopsis. Proc. Natl. Acad. Sci. USA 99: 11525–11530.
TRANBARGER, T. J., and S. MISRA, 1996 Structure and expression of a developmentally regulated cDNA encoding a cysteine protease (pseudotzain) from Douglas-fir. Gene 172: 221–226.