Abstract
Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa spp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html).
RICE is a staple food crop worldwide and a model organism for the study of monocots (Li et al. 2007). The genus Oryza contains 21 wild and 2 cultivated species (Oryza sativa and O. glaberrima) with 10 genome types (Ammiraju et al. 2006; Jacquemin et al. 2013). O. sativa is composed of two subspecies: japonica and indica (Han and Xue 2003). The indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) contain a number of important agronomic traits and are the parents of Shanyou 63, which is the most widely cultivated hybrid rice in China (Xie et al. 2010). Recombinant inbred line (RIL) populations, derived from a cross between MH63 and ZS97, have made great contributions for identifying and analyzing yield-related quantitative trait loci (QTL) (Li et al. 2000; Xing et al. 2002; Fan et al. 2006; Xue et al. 2008).
Comparative analysis is an important tool for structural, functional, and evolutionary genomics studies. Whole-genome sequences of japonica variety Nipponbare and indica variety 93-11 have been released and updated (Yu et al. 2002, 2005; International Rice Genome Sequencing Project 2005; Gao et al. 2013; Kawahara et al. 2013), and massive comparisons between these two genomes have been carried out to reveal the genetic variations (Han and Xue 2003; Feltus et al. 2004; Ma and Bennetzen 2004; Yu et al. 2005; Ding et al. 2007; Huang et al. 2008). The Oryza Map Alignment Project (OMAP) has constructed bacterial artificial chromosome (BAC) libraries and BAC-based physical maps for 17 of 23 Oryza species representing all the 10 genome types (Ammiraju et al. 2006, 2010b; Kim et al. 2008; Jacquemin et al. 2013). These BAC-based resources make whole or targeted genomic comparisons available and shed light on the genomic variation and evolution (Kim et al. 2007; Zhang et al. 2007; Ammiraju et al. 2008; Feng et al. 2009; Lu et al. 2009; Ammiraju et al. 2010a; Hurwitz et al. 2010). With the development of next generation sequencing technology, it is now cost effective and possible to extensively detect small nucleotide variations, such as SNPs and indels (Huang et al. 2010, 2012).
Although many studies have been made for comparisons within O. sativa, those comparative genomics studies were mainly focused on the small nucleotide variations. Genome-wide comparisons at the structural level (such as expansion/contraction, inversion, etc.) were limited due to the lack of suitable resources. Previously we made a physical mapping comparison of two japonica varieties, Nipponbare and Zhonghua 11 (ZH11), that revealed substantial variations (SNPs, indels, and expansions/contractions) (Lin et al. 2012). Subsequently we constructed a BAC-based physical map of 93-11 that provided a resource for completion of the 93-11 reference sequence and robust intersubspecies comparisons (Pan et al. 2013). Although the varieties in both subspecies display high genetic diversity, especially in indica groups (Garris et al. 2005; Huang et al. 2012), to date, no suitable genomic resources of other indica varieties are available to carry out intrasubspecies comparisons with 93-11 and differentiate the real intersubspecies variations from the variety-specific variations between 93-11 and Nipponbare.
MH63 and ZS97 are typical indica varieties and important materials in functional genomics studies. Here we report the generation and application of the BAC libraries, BAC end sequences (BESs), and BAC-based physical maps of MH63 and ZS97. With these new resources, we detected intersubspecies BAC-sized and contig-sized structural variations by comparing these two physical maps with the Nipponbare reference sequence, and intersubspecies and intra-indica subspecies polymorphic simple sequence repeats (SSRs), SNPs, multiple nucleotide polymorphisms (MNPs), and indels by comparing these two BES data sets with the Nipponbare and 93-11 reference sequences. Using the physical maps of O. glaberrima (accession no. International Rice Germplasm Center (IRGC) 96717), O. rufipogon (OR105491) (accession no. IRGC 105491), O. rufipogon (OR106424) (accession no. IRGC 106424), and O. nivara (accession no. IRGC 100897, W0106) from OMAP (http://www.omap.org) as a control, we identified many conserved and divergent regions related to the evolution and domestication processes. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html).
Materials and Methods
BAC library construction, end sequencing, fingerprinting, and contig assembly
The BAC libraries of MH63 and ZS97 were constructed as previously described (Luo et al. 2001; Luo and Wing 2003; C. Wang et al. 2013). The megabase genomic DNA was isolated from young seedlings. The vector, pAGIBAC1 with HindIII, was prepared as previously described (Luo et al. 2006). The Escherichia coli DH10B T1-resistant competent cells were used as a host, and the clones were arranged into 384-well plates. Copies of the two BAC libraries were stored at both the Arizona Genomics Institute (www.genome.arizona.edu) and the Genome Resource Laboratory of Huazhong Agricultural University (http://gresource.hzau.edu.cn). For insert sizing, 220 clones of each library were randomly selected, and the plasmids were restricted with NotI and analyzed on 1% agarose CHEF gels (Bio-Rad, Hercules, CA) with a 5- to 15-sec linear ramp time at 6 V/cm and 14° in 0.5× TBE buffer for 16 hr. BAC end sequencing and fingerprinting were performed following previous protocols (Luo et al. 2003; Kim et al. 2007; Lin et al. 2012; X. Wang et al. 2013). BESs with a quality score less than Phred 16 or total length <100 bp were removed. All of the BESs are deposited in GenBank (KG702200–KG737748 for MH63 and KG737749–KG771717 for ZS97). The clones that produced <25 or >180 fingerprint bands were excluded in contig assembly. The clones were assembled into contigs with the program FPC (Soderlund et al. 2000).
Contig alignment and manual editing
The Os-Nipponbare-Reference-IRGSP-1.0 and updated 93-11 whole-genome sequences were used as reference sequences (Gao et al. 2013; Kawahara et al. 2013). The physical maps of MH63 and ZS97 were aligned to reference sequences and displayed by SyMAP (Soderlund et al. 2006). The contigs and alignments were manually edited and improved against the Nipponbare reference sequence as described by Kim et al. (2007) and Lin et al. (2012).
Comparative analysis of physical maps and integration of resources
The BAC-sized structural variations were detected as described by Hurwitz et al. (2010) with some modifications. We used a more conservative cutoff value (insert size ranges 75–190 kb for MH63 and ZS97 and 70–190 kb for 93-11) to eliminate false positives. Also, we added the following criteria to acquire more precise results:
The expansion/contraction clones, with differences of <20 kb between the length on the contig according to average consensus band (CB) size and its hit length on the reference sequence, were excluded.
The hit location of a discordant clone on the reference sequence was restricted to within the hit location of the corresponding contig on the reference sequence (defined by SyMAP on comparative physical maps). The two independent alignments (BESs alignment and SyMAP alignment) ensured that the BES could be mapped to the correct location on the reference sequence.
The clones whose hit regions spanned the gaps on the reference sequence were removed from the results, considering the imprecise sizes of the gaps.
The latest edition of Os-Nipponbare-Reference-IRGSP-1.0 (Kawahara et al. 2013) was used for the Nipponbare reference sequence.
The clones selected for expansion/contraction verifications were digested by NotI and separated on CHEF gels. The reference sequences of Maize, Sorghum, and Brachypodium distachyon were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov), and the contigs of MH63 and ZS97 were aligned to these reference sequences by SyMAP (Soderlund et al. 2006). GBrowse 2.0 (Stein et al. 2002) was used to integrate the genomic data.
BES analysis
SSRs in the BESs were identified by SciRoko3.4 (Kofler et al. 2007), with the criteria of a minimum repeat number of 3 and a minimum total length of 15 bp. The primers for SSR were designed by Primer3 (Rozen and Skaletsky 2000). Bowtie2 (Langmead and Salzberg 2012) was used to map the SSR primers to reference sequences, and the primers that could be mapped to their original locations in BESs were considered effective ones. At first, a pair of primers, designed for a SSR locus, was mapped to the reference sequence and BES dataset, respectively. If the mapped region in reference sequence and the mapped region in BES contained the same SSR motif but had a different length, this SSR locus were defined as a polymorphism between the reference sequence and the BES. To verify the SSR loci, a total of 21 pairs of primers were randomly selected from the above Primer3-designed primers and used for PCR with the MH63 and ZS97 genomic DNA as templates. Before performing the comparisons of SSR contents and distributions, the BESs of each species were clustered by CAP3 (Huang and Madan 1999) to reduce the redundancy.
The known repetitive sequences (homologous with the identified rice repetitive sequences) within BESs were identified by RepeatMasker (http://repeatmasker.org), using the Repbase of rice (Jurka et al. 2005) combined with the rice repetitive sequences downloaded from the Michigan State University Rice Genome Annotation Project (MSU: ftp://ftp.plantbiology.msu.edu/pub/data/TIGR_Plant_Repeats/). As described by Lin et al. (2012) and Aggarwal et al. (2009), we manually detected the variety-specific repeat sequences in MH63 and ZS97 genomes and annotated them with Blast2GO (Gotz et al. 2008). The BESs and physical maps of wild rice were downloaded from OMAP (http://www.omap.org). The small variations (SNP, MNP, insertion, and deletion) between the repeat/organelle sequences-masked BESs and reference sequences were detected with the method and standard of Feltus et al. (2004).
Results
Generation of MH63 and ZS97 BAC libraries, BAC end sequences, and fingerprints
To obtain basic genomic resources of MH63 and ZS97, we constructed high-quality and deep-coverage BAC libraries for both varieties. Each library contained 36,864 clones arrayed in ninety-six 384-well plates, and the average insert sizes for the MH63 and ZS97 libraries were 125 and 117 kb, covering ∼10.7-fold and ∼10.0-fold genome, respectively, based on the 430 Mb genome size.
The 18,432 clones from the first forty-eight 384-well plates of each library were end sequenced bidirectionally and fingerprinted. After quality trimming, 35,549 MH63 BESs with an average length of 659 bp and a cumulative length of 23,421,124 bp and 33,969 ZS97 BESs with an average length of 666 bp and a cumulative length of 22,623,600 bp were successfully generated. In terms of fingerprinting, we successfully generated 17,310 BAC fingerprints for MH63 with an average CB unit size of 1078 bp and 16,580 BAC fingerprints for ZS97 with an average CB unit size of 1114 bp (Supporting Information, Table S1).
Construction of the MH63 and ZS97 physical maps
With the tolerance of 4 and the cutoff of 10−25, 16,580 and 12,965 clones were assembled into 1230 and 1929 contigs, and 730 and 3615 clones were left as singletons for MH63 and ZS97, respectively. Based on the average CB unit size, the total and average contig sizes are 353,085 kb and 287 kb and 393,765 kb and 204 kb for MH63 and ZS97, respectively (Table S1). The primary FPC assembly for each map was referred to as the phase I physical map.
The two phase I physical maps were manually edited, referring to the Nipponbare reference sequence. The resulting phase II physical maps of MH63 and ZS97 consisted of 574 and 1228 contigs, containing 16,735 and 13,972 clones, and leave 575 and 2608 clones as singletons, respectively. The total and average contig sizes are 304,227 kb and 530 kb for MH63 and 353,177 kb and 288 kb for ZS97. In contigs, a total of 15,823 clones for MH63 and 12,118 clones for ZS97 have paired-end sequences that are invaluable resources for comparative genomics (Table S1). The phase II physical maps of MH63 and ZS97 can be downloaded from our Web site.
Comparison of physical maps with the rice reference sequences
Genome-wide BAC-sized structural variations:
The genome structural variations could be detected based on the information of paired-end BES hits and the physical maps (Hurwitz et al. 2010). We aligned 9563 clones of MH63, 7243 clones of ZS97, and 15,142 clones of 93-11 to the Nipponbare reference sequence through BESs. In Nipponbare, 33 contractions (83 clones) and 140 expansions (464 clones), 69 contractions (201 clones) and 31 expansions (84 clones), and 46 contractions (124 clones) and 180 expansions (726 clones) were identified when MH63, ZS97, and 93-11 BAC clones were aligned to the Nipponbare reference sequence, respectively (Table 1). The structural variations were widely dispersed across the Nipponbare genome, except for the middle region of chromosome 5 in which no expansion was found (Figure 1). The detailed information of structural variations is shown in File S1.
The expansions and contractions in japonica variety Nipponbare detected by the genome-wide comparisons with indica varieties MH63, ZS97, and 93-11. For each chromosome, lines 1–3 from top to bottom show expansions, and lines 4–6 show contractions.
To validate the putative expansions and contractions, we compared the hit distance of paired-end BESs on the Nipponbare reference sequence with the actual size of each BAC clone measured on the CHEF gel. A total of 186 MH63 clones (from 14 contractions and 50 expansions) and 77 ZS97 clones (from 12 contractions and 18 expansions) were randomly selected and experimentally analyzed. The differences between the actual sizes of all clones and their hit lengths on Nipponbare were confirmed and were >20 kb, except for the 4 clones of one MH63 contraction (File S2).
The regions, which were identified as expansions in the Nipponbare genome relative to one or more of the three indica varieties, have higher than average repeat contents (48% in expansions vs. 42.73% in the whole genome). The repeats in the expansion regions were mainly LTR Gypsy/DIRS1 (55.98% of expansion repeats vs. 48.98% of whole-genome repeats). However, the ratio of DNA transposons in expansion (29.40% of expansion repeats) was lower than the average of the whole genome (34.64% of whole-genome repeats). It is interesting that the gene density of the expansion regions is approximately equal to the value of the whole genome. By analyzing the gene ontology (GO) terms in these expansions, we found a statistically significant (P < 0.05) overrepresentation of genes related to response to stimulus in biological processes, especially the genes related to response to stress and abiotic stimulus. We also found an overrepresentation of GO terms of macromolecule biosynthetic process, gene expression, and photosynthesis of metabolic process, under biological processes. The nucleic acid binding and transcription regulator activity categories under molecular function and the nitrogen compound metabolic process under biological processes were found to be underrepresented (File S3). A total of 14 expansion regions and 1 contraction region were shared when the three indica varieties were compared with the Nipponbare reference sequence. Of the above shared expansion regions, most were also shared in comparisons of OR105491, OR106424, and O. nivara to the Nipponbare reference sequence (File S4). In these shared expansion regions, 54.93% of the sequences were classified as repeat elements, in which the ratio of Gypsy/DIRS1 was 58.40% and that of DNA transposons was 25.69%, although the gene density was still approximately equal to the value of the whole genome.
Genome-wide contig-sized structural variations:
We constructed the intersubspecies comparative physical maps by aligning the MH63 and ZS97 physical map contigs onto the Nipponbare reference sequence, using the BESs embedded in the contigs as bridges (Figure 2). The anchored contigs accounted for 85% and 75% of the total contigs, and the percentage of CB units of anchored contigs to total contigs was 95% and 86% in the comparisons of MH63 and ZS97 to the Nipponbare reference sequence, respectively. A total of 16,356 MH63 clones and 12,845 ZS97 clones were anchored to the reference sequence, of which 13,456 and 10,654 clones were directly anchored by BES hits (6258 and 4707 clones with paired-end BES hits), respectively (Table 2). The unanchored contigs either were unique regions in MH63 and ZS97 genomes, relative to reference sequences, or corresponded to the gap regions of the reference sequence. We also aligned the MH63 and ZS97 physical map contigs onto the 93-11 reference sequence. However, considering that the 93-11 reference sequence contains many gaps and assembly errors (Pan et al. 2013) that would affect the outputs of the comparisons, we displayed only the comparative results on our Web site and do not describe them here.
The SyMAP alignments of the MH63 phase II physical contigs to the Nipponbare reference sequence Chr1–Chr6. For each panel, the left boxes represent the contigs of MH63 and the right bar represents the Nipponbare chromosome. The green and red lines on the Nipponbare chromosomes indicate the centromere and gap locations, and the vertical black lines represent the gene locations. The centromere locations of chromosomes 4 and 5 were not available. The purple lines represent the alignments between BESs and reference sequences. The alignments for the remaining six chromosomes are available through our Web site (http://gresource.hzau.edu.cn/resource/resource.html).
In anchored contigs, the single-end BES hits were resources that were as invaluable as paired-end BES hits for detecting the structural variations. Based on the CB unit size and distance of BES hits, the expansions and contractions could be visualized by nonparallel alignment lines on comparative physical maps (Lin et al. 2012). Figure 3 presents two expansions and two contractions on the Nipponbare reference sequence relative to MH63 and ZS97 contigs. The estimated sizes of the two expansion regions were 213 kb and 203 kb, and those of the two contraction regions were 38 kb and 90 kb, respectively. A user-friendly visualization of the expansions/contractions, with zoom features, may be browsed from our Web site. We also found many inversely matched BESs in the two alignment projects. Using the alignment of MH63 contigs to the Nipponbare reference sequence as an example, we detected 1225 inversely matched BESs, and 705 were clustered into 260 groups of ≥2 BES hits (Table 2). The clusters of inversely matched BESs on comparative maps (viewed as cross alignment lines) may reflect genetic or spurious inversions (Lin et al. 2012; Deng et al. 2013; Pan et al. 2013). Figure 4 illustrates a 6844-bp region on the Nipponbare reference sequence [chromosome (Chr)5: 3,550,117–3,556,961] inversely matched by 5 BESs of MH63 contig37. Because the same region was normally matched by BESs of Nipponbare physical contig41 (downloaded from OMAP), and MH63 contig37 has a high quality (with deep coverage and high stringency), this region could most probably contain a genetic inversion between MH63 and Nipponbare. We did PCR experiments to verify this speculation and the results showed that the OSIABa0011B13.r was on the left of the OSIABa0041A04.r in the MH63 contig37. Because the hit location of the OSIABa0011B13.r on the Nipponbare reference sequence was on the right of the hit location of the OSIABa0041A04.r, the genetic inversion between MH63 and Nipponbare was confirmed (Figure 4). All detailed information of the inversely matched BESs is listed in File S5.
Alignments between MH63 contig1540 (A), MH63 contig514 (B), ZS97 contig244 (C), and ZS97 contig1079 (D) and the Nipponbare reference sequence. For each panel, the left vertical blue lines represent the BAC clones, and the purple lines in the center represent the alignments between the BESs and the reference sequence. A and C exhibit expansions in reference sequence, and B and D exhibit contractions in reference sequence. More structural variations can be browsed through our Web site (http://gresource.hzau.edu.cn/resource/resource.html).
Discovery and confirmation of the inversions by BES alignments. (A) The alignments between the BESs of partial MH63 contig37 and the Nipponbare reference sequence. (B) The alignments between the BESs of the Nipponbare contig41 and the same region of the Nipponbare reference sequence. The top red bars represent the Nipponbare reference sequence, and the blue lines represent the alignments of BESs to the reference sequence. The green bars represent the normally matched clones and the pink bars represent the inversely matched clones. (C) Experimental validation of the genetic inversion. Left, the pair of primers designed from the BES OSIABa0011B13.r. of the clone a0011B13 in the MH63 contig37. Right, the pair of primers designed from the BES OSIABa0041A04.r. of the clone a0041A04 in the MH63 contig37.
Contigs spanning the remaining physical gaps of the Nipponbare reference sequence:
Although the Nipponbare reference sequence is well known for its high quality, it still contains 44 physical gaps in the latest version (Kawahara et al. 2013). Of these, a total of 41 physical gaps can be spanned by contigs from one to all of our four physical maps (ZH11, 93-11, MH63, and ZS97 physical maps). One gap only could be spanned by MH63 contig and two gaps only could be spanned by ZS97 contigs. The BAC sequences of the gap-corresponding contigs could provide an initial insight into the contents of the gaps and help to close these gaps. Three gaps cannot be spanned by any contigs of the four physical maps (File S6). These three gap-corresponding regions may have complex structures and seem to be conserved in O. sativa. Of these three gaps, the seventh physical gap on chromosome 4 also cannot be spanned by any physical map contigs of three ancestors of O. sativa (O. nivara, OR105491, and OR106424). However, this gap can be spanned by a physical map contig of O. glaberrima that has a farther evolutionary distance with O. sativa and the above three mentioned O. sativa ancestors (Kovach et al. 2007; Huang et al. 2012). This result indicates that this gap-corresponding region was conserved in evolution and domestication of O. sativa. The other two gaps (the sixth physical gap on chromosome 3 and the eighth physical gap on chromosome 4) can be spanned by physical map contigs of all three wild rice except for the sixth physical gap on chromosome 3 that cannot be spanned by the physical map contig of OR106424.
Analyses of MH63 and ZS97 BESs
Simple sequence repeats in BESs:
SSRs are effective and robust molecular markers in genetic and genomic analyses. We identified 3781 SSR loci and designed 3157 pairs of primers from 3165 MH63 BESs, and 3750 SSR loci and designed 3163 pairs of primers from 3129 ZS97 BESs. After mapping these primers to the Nipponbare and 93-11 reference sequences, we identified 415 (18%) and 383 (15.9%) polymorphic SSR loci from MH63 and 409 (17.3%) and 325 (12.9%) polymorphic SSR loci from ZS97, respectively (File S7). Ten pairs of MH63 and 11 pairs of ZS97 primers were randomly selected to experimentally verify the SSR loci and all of these SSR loci were confirmed (File S8). The SSRs from BESs of japonica variety ZH11, O. glaberrima, OR105491, OR106424, and O. nivara were also identified, and the ratios of polymorphic SSRs to the Nipponbare and 93-11 reference sequences were 6.7% and 20.6%, 20.9% and 26%, 19.8% and 22.3%, 19.5% and 22.6%, and 20.7% and 22.8%, respectively (File S7). The detailed information of SSRs and primers is available from our Web site.
We compared the SSR contents and distributions among the above analyzed species. The results showed that all of the species have similar SSR densities and motif types. For example, the CCG, AG, and AT motifs were predominant in all of the species and accounted for between 27% in O. glaberrima and 32% in MH63 (Figure S1). The SSRs composed of AT or AAT motifs showed the most variable repeat length, with an average of 49.5 ± 46.1 bp and 31.6 ± 27.2 bp in MH63 and 56.9 ± 63 bp and 34 ± 24.5 bp in ZS97. Similar to the results for wild rice (Kim et al. 2008), A/T-rich dinucleotide and trinucleotide repeat motifs have longer average and variable repeat lengths in cultivated rice than other motifs.
Repeat sequences in BESs:
A total of 8,781,060 bp (37.49%) of MH63 BESs and 8,936,689 bp (39.5%) of ZS97 BESs were homologous with the repeat sequences deposited in databases. In terms of repeat category, the retroelement Gypsy/DIRS1 was most common in these two species, comprising 19.93% and 20.95% of total BESs of MH63 and ZS97, respectively (Table S2).
To search variety-specific repeat sequences and reveal their functions, the known repeat/organelle sequences-masked BESs were self-blasted and the high-score pairings (HSPs) sequences were further analyzed (Aggarwal et al. 2009; Lin et al. 2012). A total of 119 sequences (49,197 bp) in MH63 BESs and 84 sequences (34,813 bp) in ZS97 had at least six HSPs in BESs (25.62 presentation times in 100-Mb MH63 sequence and 26.52 presentation times in 100-Mb ZS97 sequence) and were identified as new repeat sequences by this method. Of these sequences, 38 from MH63 and 29 from ZS97 were considered as specific repeats in indica varieties that have less than or equal to five HSPs in the whole Nipponbare reference sequence. With this standard, a total of 12 sequences from MH63 and 15 sequences from ZS97 were regarded as MH63- and ZS97-specific repeats, respectively. Of the specific repeats in indica varieties, 22 sequences from MH63 and 18 sequences from ZS97 contained coding regions, most of which were annotated as proteins containing the NBS-LRR domain, a typical domain related to disease resistance in plants. This result indicates that repeat sequences may play an important role in the evolution process of disease-resistance-related genes. The BESs of other Oryza species from OMAP were also used to estimate the frequencies of these specific sequences in wild rice (the presentation times per every 100 Mb of BESs were used because the wild rice have different numbers of BESs and genome size), and the result indicates that most of the specific sequences have high frequencies in OR105491, OR106424, O. nivara, and O. glaberrima, which have closer distances to O. sativa (Ma and Bennetzen 2004; Kovach et al. 2007) (File S9).
Substitutions and indels between BESs and the reference sequences:
To identify small genetic variations, we compared the repeat-masked BESs of MH63 and ZS97 with their orthologous regions in Nipponbare and 93-11 genomes. A total of 46,349 SNPs, 1007 MNPs, 3806 insertions, and 5134 deletions were identified by the comparisons of MH63 and ZS97 BESs to the Nipponbare reference sequence, and 49.16% of SNPs, 44.59% of MNPs, 39.99% of insertions, and 45.62% deletions were located in Nipponbare gene regions. A total of 25,034 SNPs, 760 MNPs, 2534 insertions, and 4003 deletions were identified by the comparisons of MH63 and ZS97 BESs to the 93-11 reference sequence, and 34.35% of SNPs, 27.5% of MNPs, 29.08% of insertions, and 35.12% deletions were located in 93-11 gene regions (File S10). Compared to other variation types, the SNP, followed by the deletion, has the highest frequency of occurring in exon regions. The BES regions encompassed by the 21 pairs of primers, which were used to experimentally verify the SSR loci above, contained 14 SNPs, 5 insertions, and 8 deletions. All these variations were confirmed by sequencing of the PCR products (File S8). The masked BESs of wild species were also used to identify the genetic variations (File S10). The detailed variations are available from our Web site.
In agreement with the findings of Nasu et al. (2002) and Feltus et al. (2004), the SNPs that we found were not evenly distributed across the Nipponbare chromosomes, and many low- and high-SNP frequency regions could be observed that represent the low- and high-divergence regions (Figure 5). The average SNP frequencies of each Nipponbare chromosome were obviously different and chromosome 4 had the highest SNP frequency (Table S3). A total of 3943 SNPs and 75 MNPs were shared when the three indica varieties were compared with the Nipponbare reference sequence, and these variations were putative intersubspecies variations. In these variations, 1111 (28.18%) SNPs and 10 MNPs were located in exon regions, 949 SNPs and 22 MNPs in intron regions, and 490 SNPs and 23 MNPs in 1-kb upstream regions of genes. Of the 3943 SNPs and 75 MNPs shared by the three indica varieties, the 1668 SNP and 32 MNP loci were also shared in comparison between O. nivara and Nipponbare and the 1230 SNP and 21 MNP loci in comparison between OR106424 and Nipponbare. The results indicate that these variations may have occurred in Nipponbare or represent the genetic diversity between the ancestors of japonica and indica.
SNP distributions across the Nipponbare chromosomes. The SNPs were identified from the comparisons of repeat-masked BESs of three indica varieties: MH63, ZS97, and 93-11 with the Nipponbare reference sequence. The frequency was defined as the number of SNPs divided by the length of the repeat-masked reference sequence within a 100-kb window. The x-axis shows the chromosome position in which each unit is a megabase. The left y-axis shows SNP frequency. The right y-axis shows the percentage of repeat sequences in a 100-kb window (dotted black lines).
Resource integration and application
To facilitate functional and comparative genomics studies, we integrated physical maps (MH63, ZS97, 93-11, ZH11, O. nivara, OR105491, OR106424, and O. glaberrima) and analysis results with the information of QTL and molecular markers from GRAMENE into GBrowse. The contigs, clones, SNPs, MNPs, indels, and polymorphic SSRs were linked with molecular markers, QTL, and gene models and were aligned to the Nipponbare reference sequence. The discordantly matched clones indicated the high-divergence regions in the Nipponbare genome. To allow searching the collinear regions, we aligned the above eight physical maps to the genome sequences of Maize, Sorghum, and B. distachyon to allow searching the collinear regions and carrying out comparative genomic studies. The GBrowse and comparative physical maps are available to the public at our Web site.
Discussion
The O. sativa ssp. indica cv. MH63 line, which was bred from a cross between IR30 and Gui630, is an elite restoring line and has the following characters: strong ability and broad spectrum in restoration, resistance to rice blast, and high production yields. The indica cv. ZS97 is one of the most widely used sterile lines. Together, MH63 and ZS97 are the parents of Shanyou 63, which is the most widely cultivated hybrid rice in China (Xie et al. 2010). Based on the RIL populations of MH63 and ZS97, Xing et al. (2002) constructed a genetic linkage map and Xie et al. (2010) constructed an ultrahigh-density linkage map. With these RIL populations as materials, many important agronomic trait-related genes and QTL have been identified and analyzed, such as GS3 (Fan et al. 2006; Mao et al. 2010), Ghd7 (Xue et al. 2008), and Xa26 (Sun et al. 2004). The integrated genomic and genetic resources reported here provided a platform for identifying and analyzing other important agronomic traits possessed by the two varieties.
Genome-wide comparative analysis at the structural level would facilitate the understanding of the evolution and domestication processes of cultivated rice. In other model systems, analyses of genome-wide structural variations have proved to be an effective strategy for uncovering the evolution process and relating the genetic diversity to the given phenotypes (Zhao et al. 2004; Tuzun et al. 2005; Kidd et al. 2008; Nicholas et al. 2009; Springer et al. 2009; Zhang et al. 2011). The BAC libraries, BESs, physical maps of MH63 and ZS97, and the integrated information provided here are invaluable resources for inter- and intra-subspecies genome-wide comparative analysis in O. sativa.
The studies from OMAP shed light on the genome-wide structural variations between Nipponbare and wild rice species (Kim et al. 2007, 2008; Hurwitz et al. 2010). Previously we constructed the physical maps of ZH11 and 93-11 and compared them with the Nipponbare and 93-11 reference sequences (Lin et al. 2012; Pan et al. 2013). Here we identified BAC-sized and contig-sized structural variations in O. sativa by comparisons of the physical maps of MH63 and ZS97 to the Nipponbare reference sequence. The shared structural variations between the Nipponbare genome and the three indica varieties were putative intersubspecies genetic diversities that may have occurred before divergence of these indica varieties. Huang et al. (2012) provided the model that japonica rice was first domesticated and then indica rice was generated from the cross between japonica and wild rice. Among the above shared regions, some could be conserved between the indica group and its wild ancestors (the regions that were also shared in one or more of three relative wild species) and not interrupted by the gene introgression of early japonica, whereas others could be diverged immediately after the cross between japonica and wild rice (the regions that were not shared with wild species).
Transposable elements (TEs) are one of the major contributing genomic features that relate to an organism’s genome size and its course of evolution (Ma and Bennetzen 2004). In our analysis, the expansion regions contained more repeat content, especially the LTR element Gypsy/DIRS1, but the gene density in these regions was approximately the same as the average value of the whole genome. By skimming the gene annotations, we observed a number of LTR retrotransposon-related genes in the expansion regions and a statistically significant difference of GO term representations, such as the genes related to response to stress and abiotic stimulus. Most of the unique repeat sequences in the indica group contained the motif of NBS-LRR, which has been reported to be a typical domain related to disease resistance. These results further demonstrate that TEs, especially LTR retrotransposons, may contribute largely to gene formation, genome evolution, genetic diversity, and asymmetry in O. sativa.
Our results exhibited that the Nipponbare reference sequence contained more expansions than contractions of BAC and contig sizes relative to MH63 and 93-11 physical maps. Han and Xue (2003) and Ma and Bennetzen (2004) also found more insertions in Nipponbare than in indica variety GLA4 and proposed that the Nipponbare genome was expanded relative to indica. In the japonica group, Lin et al. (2012) reported that the Nipponbare genome is ∼7% larger than ZH11. The larger genome size of Nipponbare could be mainly due to the amplification of LTR retrotransposons, especially the Gypsy/DIRS1 elements. However, more contractions of BAC size were detected on the Nipponbare reference sequence relative to the ZS97 physical map. This could be caused by the smaller average contig size of ZS97 than MH63 and 93-11 physical maps because the structural variations were restricted to contig hit regions.
Small nucleotide variations played an important role in the rice domestication process (Fan et al. 2006; Konishi et al. 2006; Li et al. 2006; Sweeney et al. 2006; Jin et al. 2008; Shomura et al. 2008). Feltus et al. (2004) identified many SNPs and indels between 93-11 and Nipponbare. Xie et al. (2010) identified many SNPs by resequencing 238 RILs derived from a cross between MH63 and ZS97, and Huang et al. (2010) identified many SNPs by sequencing 517 rice landraces at low coverage. Our SNPs, MNPs, and indels increased the nucleotide variation densities and provided the materials for identifying and analyzing important agronomic trait-related genes in MH63 and ZS97. Similar to the distribution of SNPs between Nipponbare and 93-11 (Feltus et al. 2004), the SNPs between Nipponbare and MH63 or ZS97 were also not evenly distributed. According to the domestication model of Huang et al. (2012) as mentioned above, the low-frequency regions could be from the exchanged regions between japonica and the wild rice. The shared variations in indica varieties relative to Nipponbare could be indica-specific gene candidates, possibly for domestication and adaptation. The repeat-masked BESs of O. nivara, OR106424, and OR105491 were also used to identify variations relative to Nipponbare and 93-11, and the shared variations relative to O. sativa could be putative domestication loci.
In conclusion, we generated the BAC libraries, BESs, and physical maps for MH63 and ZS97. Compared with the reference sequences of Nipponbare and 93-11, we detected a number of polymorphic SSRs, SNPs, indels, and structural variations. The physical maps, comparative analysis results, molecular markers, and QTL from GRAMENE and gene models were integrated and aligned to the Nipponbare genome in Gbrowse. These invaluable resources provide a platform for comparative genomics, positional cloning, whole-genome sequencing, functional genomics research, and crop improvement in O. sativa. Both the comparative physical maps and the GBrowse are open to the public through our Web site. The BAC-based whole-genome sequencing of MH63 and ZS97 is now underway. Once whole-genome reference sequences become available, more genetic diversity and important agronomic traits-related genes will be revealed and understood clearly.
Acknowledgments
We thank So-Jeong Lee, Nicholas B. Sisneros, Fusheng Wei, HyeRan Kim, Yeisoo Yu, Jetty S. S. Ammiraju, and other members of the Arizona Genomics Institute for production of the BAC library, fingerprints, and BESs. This work was supported by the National Natural Science Foundation of China (grant 30971748) and the Chinese 111 Project (grant B07041).
Footnotes
Communicating editor: J. A. Birchler
- Received November 20, 2013.
- Accepted January 7, 2014.
- Copyright © 2014 by the Genetics Society of America