Abstract

We measured linkage disequilibrium in mostly noncoding regions of Cryptomeria japonica, a conifer belonging to Cupressaceae. Linkage disequilibrium was extensive and did not decay even at a distance of 100 kb. The average estimate of the population recombination rate per base pair was 1.55 × 10−5 and was <1/70 of that in the coding regions. We discuss the impact of low recombination rates in a large part of the genome on association studies.

LINKAGE disequilibrium (LD) in the coding genes and their surrounding regions of conifers has been reported to extend to only several hundred to a few thousand base pairs [Brown et al. (2004); Heuertz et al. (2006); Pyhäjärvi et al. (2007), but see Pyhäjärvi et al. (2011) for LD extending to several kilobases]. This observation led to the conclusion that conifers are not suitable for finding associations with traits using a genomic scan (Neale and Savolainen 2004). However, if the genome-wide extent of LD is not the same as that in the regions studied thus far, the strategy may change. Indeed, the extents of LD differ between genomic regions in maize possibly due to differences in the recombination rate (Gore et al. 2009). Furthermore, conifers generally have lower recombination rates than angiosperms (Jaramillo-Correa et al. 2010). Thus, it is important to know the extent of LD in regions other than the coding genes and their surrounding regions. However, the extent of LD in such regions has not been investigated in conifers to our knowledge possibly because of the lack of sequence information on those regions [but see Hamberger et al. (2009); Keeling et al. (2010); Kovach et al. (2010)].

We recently constructed a bacterial artificial chromosome (BAC) library for Cryptomeria japonica, a conifer belonging to Cupressaceae sensu lato (Gadek et al. 2000; Kusumi et al. 2000) and determined the sequences of eight random clones from the library. We used these sequences to design primers and resequenced fragments in those regions using the same tree samples used by Fujimoto et al. (2008) for the study of the coding regions. From these data, we investigated the extent of LD at a scale of up to 100 kb in the mostly noncoding regions covered by those BACs (see Supporting Information, File S1, for more details on materials and methods).

First, we give a summary of the sequence analysis of the BAC clones relevant to this study. Details of the analysis will be published elsewhere (M. Tamura, A. Watanabe, K. Uchiyama, N. Futamura, K. Shinohara, Y. Tsumura, H. Tachida, unpublished results). Eight random clones (BAC1–BAC8) from the BAC library developed by the Forestry and Forest Products Research Institute in Japan were sequenced. The total length of the sequences was ∼800 kb. These sequences were mostly noncoding, and at least 51% of them consisted of known (the majority being LTR retrotransposons) or unknown repeats. Only part (four exons) of one putative protein-coding gene, homologous to genes coding for calcium-dependent protein kinases, was found in BAC6. The ratio of nonsynonymous-to-synonymous divergence between the exons and the corresponding regions in Taxodium distichum, a close relative of C. japonica, was 0.086, suggesting that the gene was active when the two species diverged. The size of the largest intron of this gene was 70 kb (see Figure 1).

Figure 1 

Positions of the regions in BAC3, BAC6, and BAC7 used for the study. The regions sequenced for at least 41 samples are shown by “| |,” and the positions of the exons of a putative coding gene are indicated by solid boxes (for primer information, see Table S1). We expected to amplify only one copy of each fragment if it existed in a single copy in the genome because we used haploid genomic DNA from megagametophytes. However, because the genome of C. japonica contained many copies of very similar fragments, only sequences of several fragments each in BAC3, BAC6, and BAC7 indicated in the figure could be determined for at least 40 samples by direct sequencing. We used long PCR and gel electrophoresis to confirm the distances between the sites in the sequenced fragments. The lengths of the PCR products approximately agreed with those calculated from the assembled BAC sequences (data not shown).

In the resequencing experiment, in total, 19 fragments, including the coding regions in BAC6, could be sequenced for at least 40 samples, and their total length was 15,517 bp (Figure 1). We will call them the BAC regions hereafter. The fragment size ranged from 509 to 2016 bp, and the maximum lengths between pairs of sites in the sequenced regions in BAC3, BAC6, and BAC7 were 65,409, 110,613 and 103,427 bp, respectively.

We aligned those 40 or more sequences along with the corresponding sequence of the BACs. Thus, the total number of samples was at least 41. We computed various statistics of genetic diversity excluding sites involving indel variation (Table S2) using DNAsp 5.0 (Rozas et al. 2003). The average nucleotide diversity across all sites was 0.35%, which was similar to those at the silent sites in the mostly coding regions of the same species studied by Kado et al. (2003) and Fujimoto et al. (2008).

Figure 2 shows the squared allelic correlation coefficient, r2, against the number of base pairs separating the two sites. It decayed very little in all three BACs. Thus, LD extended to sites separated by up to 110 kb. We also estimated the population recombination rate, 4Ner, where Ne and r are the effective size of the population and the recombination rate, respectively, in the BAC regions using the composite-likelihood method (Hudson 2001; McVean et al. 2002) implemented in LDhat 2.1. The results for the BAC regions are shown in Table 1, along with those for the five coding genes previously studied by Fujimoto et al. (2008). The composite-likelihood curves for the data set are shown in Figure S1 (the BACs) and Figure S2 (the coding genes). The average r/u and 4Ner per base was 5.04 × 10−3 and 1.55 × 10−5, respectively, for the BAC regions and 4.46 × 10−1 and 1.18 × 10−3, respectively, for the five coding genes. Thus, the population recombination rate seems to be <1/70 in the BAC regions compared to that in the coding genes studied by Fujimoto et al. (2008).

Figure 2 

The levels of linkage disequilibrium in BAC3, BAC6, and BAC7. The squared allelic correlation coefficient, r2, for two sites is plotted against the number of bases separating them for BAC3 (A), BAC6 (B), and BAC7 (C).

Estimates of population recombination rate

Table 1 
Estimates of population recombination rate
RegionLength (bp)RM4Nr4Nr/bp4Nu/bpr/u
BACs
 BAC365,4110000.006590
 BAC6110,278132.72 × 10−50.003170.00858
 BAC7103,427721.93 × 10−50.002960.00653
Mean1.55 × 10−50.003460.00504
Coding genes
 NCED2,125230.001410.002130.66280
 AMT2,682220.000750.000930.80184
 Cal3,541230.000850.004540.18661
 AQU1,787140.002230.004690.47727
 Cryj24,257630.000700.006810.10348
Mean0.001180.003820.44640
RegionLength (bp)RM4Nr4Nr/bp4Nu/bpr/u
BACs
 BAC365,4110000.006590
 BAC6110,278132.72 × 10−50.003170.00858
 BAC7103,427721.93 × 10−50.002960.00653
Mean1.55 × 10−50.003460.00504
Coding genes
 NCED2,125230.001410.002130.66280
 AMT2,682220.000750.000930.80184
 Cal3,541230.000850.004540.18661
 AQU1,787140.002230.004690.47727
 Cryj24,257630.000700.006810.10348
Mean0.001180.003820.44640
Table 1 
Estimates of population recombination rate
RegionLength (bp)RM4Nr4Nr/bp4Nu/bpr/u
BACs
 BAC365,4110000.006590
 BAC6110,278132.72 × 10−50.003170.00858
 BAC7103,427721.93 × 10−50.002960.00653
Mean1.55 × 10−50.003460.00504
Coding genes
 NCED2,125230.001410.002130.66280
 AMT2,682220.000750.000930.80184
 Cal3,541230.000850.004540.18661
 AQU1,787140.002230.004690.47727
 Cryj24,257630.000700.006810.10348
Mean0.001180.003820.44640
RegionLength (bp)RM4Nr4Nr/bp4Nu/bpr/u
BACs
 BAC365,4110000.006590
 BAC6110,278132.72 × 10−50.003170.00858
 BAC7103,427721.93 × 10−50.002960.00653
Mean1.55 × 10−50.003460.00504
Coding genes
 NCED2,125230.001410.002130.66280
 AMT2,682220.000750.000930.80184
 Cal3,541230.000850.004540.18661
 AQU1,787140.002230.004690.47727
 Cryj24,257630.000700.006810.10348
Mean0.001180.003820.44640

One explanation for the difference of LD between the BACs and the coding regions is the difference in the recombination rate. In angiosperms, gene density and the recombination rate are positively correlated, possibly due to low gene density and low recombination rates in heterochromatic regions (Gaut et al. 2007). Therefore, if the genome of C. japonica is mostly heterochromatic, we expect to observe low recombination rates and low gene density in random BAC clones. Alternatively, the effective size, Ne, in the BAC regions might be smaller than that in the coding genes because of genetic draft (Gillespie 2000), background selection (Charlesworth et al. 1993), and/or the weak selection Hill–Robertson effect (McVean and Charlesworth 2000). However, because the levels of the silent nucleotide diversity in the two regions were comparable, this explanation seemed unlikely.

The slow decay of LD in the BAC regions is quite different from the rapid decay of LD within 500–2000 bp observed in Pinaceae and poplars (Brown et al. 2004; Ingvarsson 2005, 2008; Krutovsky and Neale 2005; González-Martínez et al. 2006; Heuertz et al. 2006; Pyhäjärvi et al. 2007; Ingvarsson et al. 2008; Li et al. 2010) although more extensive LD has been found in a few cases (Eckert et al. 2010; Pyhäjärvi et al. 2011). C. japonica is distantly related to these species (Chaw et al. 2000), and the difference may be ascribed to differences in the species studied; however, a more plausible explanation would be the differences in the regions studied. The other studies examined relatively narrow regions that included coding genes [except for the 80-kb gene-rich region surrounding the phytochrome B2 locus in aspen studied by Ingvarsson et al. (2008)], while our BAC regions were in mostly noncoding regions. Indeed, the extensive linkage disequilibrium found here might be consistent with the interlocus linkage disequilibrium found in coastal Douglas fir (Eckert et al. 2009) and in Pinus taeda (Eckert et al. 2010).

On the basis of the rapid decay of linkage disequilibrium in conifers, Neale and Savolainen (2004) concluded that genome-wide association studies would not be possible in conifers because of the enormous SNP marker density required (see also Neale and Kremer 2011). However, the number of necessary SNP markers may not be as large if most of the genome is segregating as blocks, as found in the BAC regions studied here. Thus, genome-wide association studies may be feasible in conifers whose genome sizes are generally large (Ohri and Khoshoo 1986), although it would be difficult to locate the causal polymorphism if LD is extensive (Atwell et al. 2010). Nevertheless, identifying causal polymorphisms is not necessary for genomic selection (Meuwissen et al. 2001). Of course, we need to choose the positions of SNP markers carefully so that they are spaced appropriately in terms of recombination rates by identifying recombination-poor and -rich regions (see Flint-Garcia et al. 2003). If heterochromatic regions are recombination-poor in conifers as they are in angiosperms (Gaut et al. 2007), then identifying heterochromatic regions is important for designing efficient markers.

In this article, we have shown that linkage disequilibrium in the BAC regions was much more extensive than that in the coding regions. However, our data are restricted to those from only three random BAC clones. We need to test the generality of these features in future studies.

Acknowledgements

We thank two anonymous referees for their helpful comments. This study was partially supported by the Program for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry and by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (no. 22370083).

Literature Cited

Atwell
S
,
Huang
Y S
,
Vilhjlmsson
B J
,
Willems
G
,
Horton
M
et al. .,
2010
Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines
.
Nature
465
:
627
631
.

Brown
G R
,
Gill
G P
,
Kuntz
R J
,
Langley
C H
,
Neale
D B
,
2004
Nucleotide diversity and linkage disequilibrium in loblolly pine
.
Proc. Natl. Acad. Sci. USA
101
:
15255
15260
.

Charlesworth
B
,
Morgan
M T
,
Charlesworth
D
,
1993
The effect of deleterious mutations on neutral molecular variation
.
Genetics
134
:
1289
1303
.

Chaw
S M
,
Parkinson
C L
,
Cheng
Y C
,
Vincent
T M
,
Palmer
J D
,
2000
Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers
.
Proc. Natl. Acad. Sci. USA
97
:
4086
4091
.

Eckert
A J
,
Wegrzyn
J L
,
Pande
B
,
Jermstad
K D
,
Lee
J M
et al. .,
2009
Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas fir (Pseudotsuga menziesii var. menziesii)
.
Genetics
183
:
289
298
.

Eckert
A J
,
Bower
A D
,
González-Martínez
S C
,
Wegrzyn
J L
,
Coop
G
et al. .,
2010
Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae)
.
Mol. Ecol.
19
:
3789
3805
.

Flint-Garcia
S A
,
Thornsberry
J M
,
Buckler
E S
,
2003
Structure of linkage disequilibrium in plants
.
Annu. Rev. Plant Biol.
54
:
357
374
.

Fujimoto
A
,
Kado
T
,
Yoshimaru
H
,
Tsumura
Y
,
Tachida
H
,
2008
Adaptive and slightly deleterious evolution in a conifer, Cryptomeria japonica
.
J. Mol. Evol.
67
:
201
210
.

Gadek
P A
,
Alpers
D L
,
Heslewood
M M
,
Quinn
C J
,
2000
Relationships within Cupressaceae sensu lato: a combined morphological and molecular approach
.
Am. J. Bot.
87
:
1044
1057
.

Gaut
B S
,
Wright
S I
,
Rizzon
C
,
Dvorak
J
,
Anderson
L K
,
2007
Recombination: an underappreciated factor in the evolution of plant genomes
.
Nat. Rev. Genet.
8
:
77
84
.

Gillespie
J H
,
2000
Genetic drift in an infinite population: the pseudohitchhiking model
.
Genetics
155
:
909
919
.

Gonzalez-Martinez
S C
,
Ersoz
E
,
Brown
G R
,
Wheeler
N C
,
Neale
D B
,
2006
DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L
.
Genetics
172
:
1915
1926
.

Gore
M A
,
Chia
J M
,
Elshire
R J
,
Sun
Q
,
Grills
G S
et al. .,
2009
The first generation haplotype map of maize
.
Science
326
:
1115
1117
.

Hamberger
B
,
Hall
D
,
Yuen
M
,
Oddy
C
,
Hamberger
B
et al. .,
2009
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome
.
BMC Plant Biol.
9
:
106
.

Heuertz
M
,
De Paoli
E
,
Källman
T
,
Larsson
H
,
Jurman
I
et al. .,
2006
Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce (Picea abies [L.] Karst)
.
Genetics
174
:
2095
2105
.

Hudson
R R
,
2001
Two-locus sampling distributions and their application
.
Genetics
159
:
1805
1817
.

Ingvarsson
P K
,
2005
Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European Aspen (Populus tremula, L., Salicaceae)
.
Genetics
169
:
945
953
.

Ingvarsson
P K
,
2008
Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula
.
Genetics
180
:
329
340
.

Ingvarsson
P K
,
Victoria Garcia
M
,
Luquez
V
,
Hall
D
,
Jansson
S
,
2008
Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European Aspen (Populus tremula, Salicaceae)
.
Genetics
178
:
2217
2226
.

Jaramillo-Correa
J P
,
Verdú
M
,
González-Martínez
S C
,
2010
The contribution of recombination to heterozygosity differs among plant evolutionary lineages and life-forms
.
BMC Evol. Biol.
10
:
22
.

Kado
T
,
Yoshimaru
H
,
Tsumura
Y
,
Tachida
H
,
2003
DNA variation in a conifer, Cryptomeria japonica (Cupressaceae sensu lato)
.
Genetics
164
:
1547
1559
.

Keeling
C I
,
Dullat
H K
,
Yuen
M
,
Ralph
S G
,
Jancsik
S
et al. .,
2010
Identification and functional characterization of monofunctional ent-copalyl diphosphate and ent-kaurene synthases in white spruce reveal different patterns for diterpene synthase evolution for primary and secondary metabolism in gymnosperms
.
Plant Physiol.
152
(
3
):
1197
1208
.

Kovach
A
,
Wegrzyn
J L
,
Parra
G
,
Holt
C
,
Bruening
G E
et al. .,
2010
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences
.
BMC Genomics
11
:
420
.

Krutovsky
K V
,
Neale
D B
,
2005
Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas fir
.
Genetics
171
:
2029
2041
.

Kusumi
J
,
Tsumura
Y
,
Yoshimaru
H
,
Tachida
H
,
2000
Phylogenetic relationships in Taxodiaceae and Cupressaceae sensu stricto based on matK gene, chlL gene, trnL-trnF IGS region, and trnL intron sequences
.
Am. J. Bot.
87
:
1480
1488
.

Li
Y
,
Stocks
M
,
Hemmilä
S
,
Källman
T
,
Zhu
H T
et al. .,
2010
Demographic histories of four spruce (Picea) species of the Qinghai-Tibetan Plateau and neighboring areas inferred from multiple nuclear loci
.
Mol. Biol. Evol.
27
(
5
):
1001
1014
.

McVean
G A T
,
Charlesworth
B
,
2000
The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation
.
Genetics
155
:
929
944
.

McVean
G
,
Awadalla
P
,
Fearnhead
P
,
2002
A coalescent-based method for detecting and estimating recombination rates from gene sequences
.
Genetics
160
:
1231
1241
.

Meuwissen
T H E
,
Hayes
B J
,
Goddard
M E
,
2001
Prediction of total genetic value using genome-wide dense marker maps
.
Genetics
157
:
1819
1829
.

Neale
D B
,
Kremer
A
,
2011
Forest tree genomics: growing resources and applications
.
Nat. Rev. Genet.
12
:
111
122
.

Neale
D B
,
Savolainen
O
,
2004
Association genetics of complex traits in conifers
.
Trends Plant Sci.
9
:
325
330
.

Ohri
D
,
Khoshoo
T N
,
1986
Genome size in gymnosperms
.
Plant Syst. Evol.
153
:
119
132
.

Pyhäjärvi
T
,
García-Gil
M R
,
Knürr
T
,
Mikkonen
M
,
Wachowiak
W
et al. .,
2007
Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations
.
Genetics
177
:
1713
1724
.

Pyhäjärvi
T
,
Kujala
S T
,
Savolainen
O
,
2011
Revisiting protein heterozygosity in plants: nucleotide diversity in allozyme coding genes of conifer Pinus sylvestris
.
Tree Genet. Genomes
7
:
385
397
.

Rozas
J
,
Sanchez-DelBarrio
J C
,
Messeguer
X
,
Rozas
R
,
2003
DnaSP, DNA polymorphism analyses by the coalescent and other methods
.
Bioinformatics
19
:
2496
2497
.

Footnotes

Communicating editor: O. Savolainen

Author notes

1

These authors contributed equally to this study.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data