Originally published as Genetics Published Articles Ahead of Print on April 3, 2007.
Genetics, Vol. 176, 527-541, May 2007, Copyright © 2007
doi:10.1534/genetics.107.070375
A Microsatellite-Based, Gene-Rich Linkage Map Reveals Genome Structure, Function and Evolution in Gossypium
Wangzhen Guo1, Caiping Cai1, Changbiao Wang, Zhiguo Han, Xianliang Song, Kai Wang, Xiaowei Niu, Cheng Wang, Keyu Lu, Ben Shi and Tianzhen Zhang2
National Key Laboratory of Crop Genetics & Germplasm Enhancement, Cotton Research Institute, Nanjing Agricultural University, Nanjing 210095, China
2 Corresponding author: National Key Laboratory of Crop Genetics & Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, Jiangsu Province, People's Republic of China.
E-mail: cotton{at}njau.edu.cn
The mapping of functional genes plays an important role in studies of genome structure, function, and evolution, as well as allowing gene cloning and marker-assisted selection to improve agriculturally important traits. Simple sequence repeats (SSRs) developed from expressed sequence tags (ESTs), EST-SSRs (eSSRs), can be employed as putative functional marker loci to easily tag corresponding functional genes. In this paper, 2218 eSSRs, 1554 from G. raimondii-derived and 664 from G. hirsutum-derived ESTs, were developed and used to screen polymorphisms to enhance our backbone genetic map in allotetraploid cotton. Of the 1554 G. raimondii-derived eSSRs, 744 amplified polymorphisms between our two mapping parents, TM-1 and Hai7124, a polymorphic rate of 47.9%. However, only a 23.9% (159/664) polymorphic rate was produced from G. hirsutum-derived eSSRs. No relationship was observed between the level of polymorphism and either motif type or tissue origin, but polymorphism appeared to be correlated with repeat type. After integrating these new eSSRs, our enhanced genetic map consists of 1790 loci in 26 linkage groups and covers 3425.8 cM with an average intermarker distance of 1.91 cM. This microsatellite-based, gene-rich linkage map contains 71.96% functional marker loci, of which 87.11% are eSSR loci. There were 132 duplicated loci bridging 13 homeologous At/Dt chromosome pairs. Two post-polyploidization reciprocal translocations, between chromosomes A2 and A3 and between chromosomes A4 and A5, were further confirmed. A functional analysis of 975 ESTs producing 1122 eSSR loci tagged in the map revealed that 60% had clear BLASTX hits (<1e-10) to the UniProt database and that 475 were associated mainly with genes belonging to the three major gene ontology categories of biological process, cellular component, and molecular function; many of the ESTs were associated with two or more category functions. The results presented here will provide new insights for future investigations of functional and evolutionary genomics, especially those associated with cotton fiber improvement.
COTTON (Gossypium spp.) is a major cash crop, providing the world's leading natural fiber for textile manufacture as well as edible oil. Cotton consists of at least 45 diploid and 5 allotetraploid species (FRYXELL 1992). The allotetraploid cotton species, which include two commercially important cultivated species, Gossypium hirsutum L. and G. barbadense L., combine A- and D-genomes (FRYXELL 1992). The best living models of the ancestral A- and D-genome parents are G. herbaceum and G. raimondii, respectively (ENDRIZZI et al. 1985). A-genome diploid cottons produce spinnable fibers and have been cultivated, while D-genome species produce very short and appressed fibers. Nevertheless, many quantitative trait loci (QTL) for fiber-related traits have been identified in the D-subgenome of tetraploid cotton (JIANG et al. 1998; KOHEL et al. 2001; PARK et al. 2005; PATERSON et al. 2003; SHEN et al. 2005; ULLOA et al. 2005), suggesting that the D-genome contains important genes or regulators of fiber morphogenesis and fiber properties.
In recent years, the goal of cotton breeding has changed from enhancing yield to improving fiber quality with the acceleration of spinning speeds. Therefore, systematically elucidating the molecular mechanisms of cotton fiber development and regulation and identifying the key genes or QTL affecting fiber quality will be of great significance to improving cotton fiber quality. A high-density molecular map, especially one that includes functional markers associated with fiber genes or the fiber transcriptome, will be very important in allowing direct tagging of target genes associated with fiber quality. A genetic map will supply molecular markers linked closely with fiber quality QTL and allow the study of interactions among functional genes. To date, several genetic maps of cotton genomes have been constructed using diverse DNA molecular markers and mapping populations (ULLOA and MEREDITH 2000; ULLOA et al. 2002; ZHANG et al. 2002; LACAPE et al. 2003; MEI et al. 2004; RONG et al. 2004; ZHANG et al. 2005; FRELICHOWSKI et al. 2006). The most saturated tetraploid cotton map is from RONG et al. (2004), which is composed of 2584 loci at 1.72-cM intervals in 26 linkage groups. However, these tagged loci were mostly from restriction fragment length polymorphism (RFLP) probes, which are not practical for molecular marker-assisted selection breeding.
Microsatellites or simple sequence repeats (SSRs) are tandem repeats of short (1 to 6 bp) DNA sequences. SSRs exist throughout the whole genome of an organism in both noncoding and coding regions. The distinguishing features of SSR loci include their high information content, codominant inheritance pattern, even distribution along chromosomes, reproducibility, and locus specificity (KASHI et al. 1997; RÖDER et al. 1998a,b). In the past, genomic SSRs (gSSRs) were developed on the basis of isolating and sequencing clones containing putative SSR regions in cotton, together with designing and testing flanking primers. Their development is typically costly, time consuming, and labor intensive. However, as a plethora of DNA sequences have been deposited in online databases, they can now be easily downloaded from GenBank and surveyed for identification of SSRs. Expressed sequence tag (EST)-derived SSRs (eSSRs) have some intrinsic advantages over gSSRs because they are obtained easily and inexpensively by electronic sorting and are present in expressed regions of the genome. The usefulness of eSSRs also lies in their expected transferability because the primers are designed on the basis of the more highly conserved coding regions of the genome (VARSHNEY et al. 2005). In recent years, great efforts have been made to develop genomic SSRs (REDDY et al. 2001; NGUYEN et al. 2004; FRELICHOWSKI et al. 2006) (http://www.resgen.com) and EST-SSRs (SAHA et al. 2003; QURESHI et al. 2004; HAN et al. 2004, 2006; TALIERCIO et al. 2006) for cotton, and a web page (http://www.genome.clemson.edu/projects/cotton) for the cotton microsatellite database (CMD) covering all of the available cotton SSR information has been constructed (BLENDA et al. 2006). These SSR markers have been widely used in cotton genetic mapping (REDDY et al. 2001; ZHANG et al. 2002; HAN et al. 2004; NGUYEN et al. 2004; ABDURAKHMONOV et al. 2005; SONG et al. 2005; PARK et al. 2005; FRELICHOWSKI et al. 2006; HAN et al. 2006).
Recently, more cotton ESTs, mostly from different fiber developmental stages, were publicly released in GenBank (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html), and a global assembly of cotton ESTs from 30 cDNA libraries with their UniProt BLASTX hits, gene ontology annotation, and Pfam analysis results has been made freely accessible (UDALL et al. 2006). These data provide valuable new resources for developing functional markers and performing functional analysis on the basis of mapping information.
In our laboratory, a polymerase chain reaction (PCR)-based linkage map was constructed and enhanced using a [(TM-1 x Hai7124) x TM-1] interspecific BC1 mapping population in allotetraploid cotton (HAN et al. 2004, 2006; SONG et al. 2005). In this study, 2218 new eSSR markers from EST sequences in G. raimondii (C. B. WANG et al. 2006) and G. hirsutum were developed and used to screen polymorphisms between the mapping parents TM-1 and Hai7124. The results enabled us to integrate 816 polymorphic eSSR marker loci into our backbone genetic map. This mostly microsatellite-based, gene-rich, saturated cotton linkage map is helpful for improving our understanding of structural and evolutionary genomics and, ultimately, for mining new genes associated with fiber development to aid in the molecular breeding of fiber-related genes.
Development of eSSR markers:
New eSSR primer pairs (2218) (designated as "NAU" for Nanjing Agricultural University) were developed using 58,906 nonredundant EST sequences from G. raimondii, 12,463 from G. hirsutum acc. TM-1, and 11,692 from G. hirsutum cv. Xuzhou142, which are publicly available in GenBank (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). Among the 1554 G. raimondii-derived eSSRs, 747 were derived from 719 EST sequences from the first true leaf library and 807 were derived from 778 EST sequences from a -3- to 3-day post-anthesis (dpa) ovule cDNA library (C. B. WANG et al. 2006). Of the 664 G. hirsutum-derived eSSRs, 454 were derived from 454 EST sequences from a -3- to 3-dpa TM-1 ovule cDNA library and 210 were derived from 295 EST sequences from a 5- to 10-dpa Xuzhou142 fiber cDNA library. The search standards for different repeat motifs are as described in C. B. WANG et al. (2006). The program Primer 3.0 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) was used in eSSR primer design. The primers were synthesized by Invitrogen (Shanghai, China). The newly developed eSSR primer sequences, GenBank accession numbers, repeat motif and number, expected product size, and polymorphism information between TM-1 and Hai7124 are presented in supplemental Table S1 at http://www.genetics.org/supplemental/. Other SSR primer information used in the article can be downloaded at http://www.mainlab.clemson.edu/cmd/projects.
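As a rough illustration of the kind of electronic search described above, the following sketch scans a sequence for perfect di- to hexanucleotide repeats. The minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the example sequence are all hypothetical; the thresholds actually used in this study are those of C. B. WANG et al. (2006) and are not reproduced here.

```python
import re

# Hypothetical minimum repeat counts per motif length (an assumption for
# illustration; the study's real search standards are in Wang et al. 2006).
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) for perfect 2- to 6-bp SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "ATAT" is really (AT)n and "AA" is a mononucleotide run.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Made-up sequence containing (AT)8 and (AAG)5.
example = "GGCTA" + "AT" * 8 + "CCGT" + "AAG" * 5 + "TTGCA"
print(find_ssrs(example))
```

Mononucleotide runs are deliberately ignored here, in line with the exclusion of the A/T motif from primer design reported in the results.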
Plant material, DNA extraction, PCR amplification, and electrophoresis:
The mapping population was composed of 138 BC1 individuals that were generated from the cross [(TM-1 x Hai7124) x TM-1] (SONG et al. 2005). TM-1 is a genetic standard line of Upland cotton and Hai7124 is a commercial Sea Island Verticillium-resistant cultivar. Cotton genomic DNA was isolated from the two parents and each BC1 individual as described by PATERSON et al. (1993). SSR-PCR amplifications were performed using a Peltier Thermal Cycler-225 (MJ Research) and electrophoresis of the products was performed as described by ZHANG et al. (2000, 2002).
Construction of genetic linkage map:
All 2218 eSSR primer pairs were first used to screen polymorphisms between TM-1 and Hai7124. Markers found to be polymorphic were then used to survey 138 individuals of the BC1 mapping population. The maternal (TM-1) genotype and the heterozygous (F1) genotype were scored as 1 and 3 in the BC1 population, respectively. Missing data were noted as "". The χ² test for goodness of fit was used to assess the Mendelian 1:1 inheritance in the BC1 segregating population.
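The goodness-of-fit test for 1:1 segregation can be sketched as follows. The genotype scores are hypothetical, and the 5% significance level is an assumption, since the significance threshold used in the study is not stated in the text.

```python
def chi_square_1to1(scores):
    """Chi-square statistic (1 d.f.) for a 1:1 split of genotype codes 1 and 3."""
    homo = scores.count(1)   # maternal (TM-1) genotype
    het = scores.count(3)    # heterozygous (F1) genotype
    n = homo + het           # any other value is treated as missing data
    expected = n / 2
    chi2 = ((homo - expected) ** 2 + (het - expected) ** 2) / expected
    return homo, het, chi2

# Chi-square critical value for alpha = 0.05 with 1 degree of freedom
# (the alpha level is an assumption, not taken from the paper).
CRITICAL_5PCT_1DF = 3.841

# Hypothetical scores for one marker across 138 BC1 plants, 2 of them missing.
scores = [1] * 80 + [3] * 56 + [None] * 2
homo, het, chi2 = chi_square_1to1(scores)
print(homo, het, round(chi2, 3), "distorted" if chi2 > CRITICAL_5PCT_1DF else "fits 1:1")
```

With these made-up counts (80 vs. 56) the statistic is about 4.24, so the marker would be flagged as deviating from 1:1 at the assumed 5% level.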
JoinMap 3.0 (VAN OOIJEN and VOORRIPS 2001) was employed to construct the genetic linkage map. The Kosambi mapping function (KOSAMBI 1944) was used to convert recombination frequency to genetic map distance (centimorgan, cM). All linkage groups were determined at a log-of-odds (LOD) score threshold of 6. Linkage groups were assigned to chromosomes on the basis of our backbone linkage maps (HAN et al. 2004, 2006) and BAC-FISH [fluorescence in situ hybridization (FISH) using bacterial artificial chromosome (BAC) clones as probes] results (K. WANG et al. 2006). We therefore used our published chromosome naming system (K. WANG et al. 2006), in which the A-subgenome chromosomes are identified as A1 through A13 and the homeologous D-subgenome chromosomes are designated D1 through D13, since chromosome homeology has been established following Cotton Genetic Nomenclature in the United States (KOHEL 1973). Aneuploid tests, using a series of cytologically identified monotelodisomic (25II + Ii) and monosomic (25II + I) chromosome substitution aneuploid lines (F1) available for the newly anchored markers on the distal regions, were used to confirm linkage groups. These aneuploid hybrids were produced by crossing aneuploids with a TM-1 background with G. barbadense acc. 3-79.
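For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 - 2r)] cM. A minimal sketch of the conversion and its inverse is given below; the function names are illustrative and are not part of JoinMap.

```python
import math

def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_r(d_cm):
    """Inverse: recombination fraction for a Kosambi distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)

for r in (0.01, 0.10, 0.25):
    print(r, round(kosambi_cm(r), 2))
```

For small r the distance is close to 100r cM (r = 0.01 gives about 1.00 cM), while larger fractions map to proportionally longer distances (r = 0.10 gives about 10.14 cM), reflecting the correction for undetected double crossovers.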
Putative gene ontology and metabolic pathway analysis:
The mapped markers were categorized on the basis of their homologous gene function. A putative gene ontology and a high-level functional category of mapped markers were obtained on the basis of the UniProt Gene Ontology database (CAMON et al. 2004). A Perl script was written to compare these ESTs with the UniProt protein database (http://www.pir.uniprot.org). A list of gene associations between the UniProt database entries and their gene annotations is maintained by the Gene Ontology Consortium (http://www.geneontology.org/GO.current.annotations.shtml). The gene ontology numbers for the best homologous hits were used to find molecular function, cellular component, and biological process ontology for these sequences. Furthermore, BLAST2GO (http://www.blast2go.de) offers metabolic pathway analysis using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http://www.genome.jp/kegg/) (KANEHISA and GOTO 2000).
High polymorphism amplified by G. raimondii-derived eSSRs:
A total of 58,906 nonredundant EST sequences in G. raimondii from the NCBI were selected and characterized for eSSRs. A total of 2620 microsatellite sequences containing 2818 eSSRs with motifs ranging from 1 to 6 bp were identified, with trinucleotide repeats being most abundant (38.31%), followed by dinucleotide repeats (24.09%) (C. B. WANG et al. 2006). From these SSR-containing ESTs in G. raimondii, 1554 eSSR primer pairs were developed and used to screen the interspecific polymorphisms between the two mapping parents, G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124. Among them, 744 of the primer pairs amplified polymorphisms and yielded a 47.9% polymorphic rate, which is more than twice as high as the 18.2% rate from G. hirsutum and the 23.3% rate from G. arboreum-derived eSSRs reported previously (HAN et al. 2004, 2006). The highly polymorphic G. raimondii-derived primers can supply the more portable PCR markers needed for saturated genetic map construction and marker-assisted improvement of the world's leading fiber crop.
To explore why these G. raimondii-derived eSSRs produce such a high polymorphism rate (47.9%) in tetraploid cotton, the relationships between polymorphism, repeat type, motif type, and tissue origin were further investigated. Of our 1554 newly developed eSSR primer pairs, 763, 361, 138, 89, and 66 were for trinucleotide, dinucleotide, tetranucleotide, hexanucleotide, and pentanucleotide repeats, respectively. Their polymorphic rates were as high as 58.43% (52/89) for hexanucleotide repeats, followed by 52.17% (72/138) for tetranucleotide, 50.69% (183/361) for dinucleotide, 44.30% (338/763) for trinucleotide, and 37.88% (25/66) for pentanucleotide repeat motif eSSRs. Furthermore, a polymorphic rate as high as 54.01% (74/137) was also observed for the compound motif eSSRs. Thus the pooled polymorphic rate from tetranucleotide and dinucleotide repeat types (51.10%, 255/499) was slightly higher than that from trinucleotide and hexanucleotide repeat types (45.77%, 390/852).
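The per-type and pooled polymorphic rates can be recomputed directly from the counts given in this paragraph. The short sketch below does so; the dictionary and function names are illustrative. Note that the 45.77% figure is reproduced by pooling the trinucleotide and hexanucleotide types (390/852), whereas trinucleotide repeats alone give 44.30%.

```python
# (polymorphic, total) primer pairs per repeat type, as reported in the text.
counts = {
    "tri": (338, 763),
    "di": (183, 361),
    "tetra": (72, 138),
    "hexa": (52, 89),
    "penta": (25, 66),
    "compound": (74, 137),
}

def pooled_rate(types):
    """Polymorphic rate (%) pooled over the given repeat types."""
    poly = sum(counts[t][0] for t in types)
    total = sum(counts[t][1] for t in types)
    return 100.0 * poly / total

for name in counts:
    print(f"{name:9s} {pooled_rate([name]):.2f}%")

print(f"di + tetra  {pooled_rate(['di', 'tetra']):.2f}%")
print(f"tri + hexa  {pooled_rate(['tri', 'hexa']):.2f}%")
print(f"all types   {pooled_rate(list(counts)):.2f}%")
```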
Among all identified motif types, A/T occurred at the highest frequency (18.37%), followed by AT/TA (14.83%), AAG/TTC (9.62%), and AG/TC (6.46%) (C. B. WANG et al. 2006). With the exception of the A/T motif, ESTs containing di- to hexanucleotide SSR repeat motifs were used to design EST-SSR primer pairs. Our comparison of different motif types between the polymorphic and monomorphic eSSRs revealed no relationship between polymorphism and motif type. Among the polymorphic eSSRs, AT/TA was the most abundant motif with a frequency of 17.91%, followed by the motifs AAG/TTC (10.59%) and AG/TC (7.46%) (Figure 1). Likewise, the most frequent motif types among the monomorphic eSSRs were AT/TA followed by AAG/TTC. Therefore, AT/TA and AAG/TTC appeared to be the most abundant repeat motif types in G. raimondii ESTs.
No relationship was observed between polymorphism and tissue origin. Of 744 polymorphic eSSRs, 387 corresponded to ESTs from the -3- to 3-dpa ovule cDNA library and 357 were from the first true leaf cDNA library, giving similar polymorphic rates between TM-1 and Hai7124 of 47.9 and 47.8%, respectively, for these two origins.
Construction of a microsatellite-based, gene-rich linkage map in tetraploid cotton:
Both the 1554 G. raimondii-derived and 664 G. hirsutum-derived eSSRs (supplemental Table S1 at http://www.genetics.org/supplemental/) were employed to screen interspecific polymorphisms between G. hirsutum L. acc. TM-1 and G. barbadense L. cv. Hai7124. Among them, 744 and 159 amplified polymorphisms and yielded polymorphic rates of 47.9 and 23.9%, respectively. Of these polymorphic markers, 604 were codominant, 158 were dominant in Hai7124, and 141 were dominant in TM-1. As TM-1 was used as the recurrent parent in the backcrossing population, the 141 dominant loci from TM-1 could not be used to construct genetic maps. Therefore, 762 polymorphic eSSRs were used to enhance our genetic map. From them, a total of 885 discrete loci were generated, with 650 SSRs amplifying a single locus, 101 SSRs amplifying two loci, and 11 SSRs amplifying three loci.
The newly constructed genetic map, built using JoinMap software, is composed of 1790 loci, including 1122 eSSR loci, 495 gSSR loci, 121 SRAP marker loci, 7 loci from end-sequencing data of BAC clones, and 45 genes (data not shown), in 26 linkage groups and covers 3425.8 cM with an average intermarker distance of 1.91 cM (Figure 2). Of these, 883 loci were integrated into our previously published map, which contained 907 loci and spanned 5060 cM with an average intermarker distance of 5.57 cM using Mapmaker software (HAN et al. 2006). The enhanced linkage groups account for 820 loci (1675.6 cM) with a 2.04-cM interval distance in the A-subgenome and 970 loci (1750.2 cM) with a 1.80-cM interval distance in the D-subgenome. The largest gap between two adjacent loci is 28.0 cM (on chromosome D1). The number of intervals >10 cM remaining in the tetraploid map was reduced to 33. Among these, 20 were in the At genome and 13 were in the Dt genome (Table 1).
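The summary statistics in this paragraph can be cross-checked with a few lines of arithmetic. In the sketch below (variable names are illustrative), the average intermarker distance is taken as map length divided by the number of loci, which is the convention that reproduces the reported 1.91, 2.04, and 1.80 cM values; dividing by the number of intervals instead would give slightly larger figures.

```python
# Locus counts by marker class, as reported for the enhanced map.
loci_by_class = {"eSSR": 1122, "gSSR": 495, "SRAP": 121, "BAC-end": 7, "gene": 45}

# (loci, map length in cM) per subgenome, as reported.
subgenomes = {"At": (820, 1675.6), "Dt": (970, 1750.2)}

total_loci = sum(loci_by_class.values())
total_cm = sum(length for _, length in subgenomes.values())

def mean_spacing(n_loci, length_cm):
    """Average intermarker distance as map length divided by locus count."""
    return length_cm / n_loci

# SSRs amplifying one, two, or three loci each.
new_loci = 650 * 1 + 101 * 2 + 11 * 3

print(total_loci, round(total_cm, 1), round(mean_spacing(total_loci, total_cm), 2))
for name, (n, length) in subgenomes.items():
    print(name, round(mean_spacing(n, length), 2))
print(new_loci)
```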
Notably, the D- and A-genome species-derived eSSRs were preferentially tagged in their corresponding homologous subgenome in the tetraploid map, whereas the polymorphic eSSRs from G. hirsutum were evenly distributed between the At and Dt subgenomes and across chromosomes (Table 2). Among 660 loci amplified by G. raimondii-derived eSSRs, 257 loci (141 from fiber and 116 from leaf ESTs) were assigned to A-subgenome chromosomes and 403 loci (206 from fiber and 197 from leaf ESTs) were assigned to D-subgenome chromosomes, with a ratio of tagged loci of At:Dt = 1:1.6. Some linkage groups were greatly saturated due to the addition of loci identified using G. raimondii-derived eSSRs. For example, 53 new loci were added to D5, 40 to D12, 36 to D8, 35 to A5, and 34 to both D2 and D7. The polymorphic eSSRs from G. arboreum mapped 95 loci to the A-subgenome and 64 loci to the D-subgenome, with a ratio of tagged loci of At:Dt = 1.5:1. However, the polymorphic eSSRs from G. hirsutum contributed 150 loci to the A-subgenome and 153 loci to the D-subgenome, with a ratio of tagged loci of At:Dt = 1:1. A similar phenomenon was also observed in the previous mapping of G. arboreum- and G. hirsutum-derived eSSRs (HAN et al. 2004, 2006).
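The At:Dt ratios quoted above follow directly from the locus counts. The sketch below recomputes them and confirms that the per-source counts add up to the 502 A-subgenome and 620 D-subgenome eSSR loci reported for the whole map; the variable names are illustrative.

```python
# (At loci, Dt loci) tagged by eSSRs from each source genome, as reported.
tagged = {
    "G. raimondii (D)": (257, 403),
    "G. arboreum (A)": (95, 64),
    "G. hirsutum (AD)": (150, 153),
}

def dt_per_at(at, dt):
    """Number of Dt loci tagged per At locus."""
    return dt / at

for source, (at, dt) in tagged.items():
    print(f"{source:18s} At={at:3d} Dt={dt:3d} Dt/At={dt_per_at(at, dt):.2f}")

at_total = sum(at for at, _ in tagged.values())
dt_total = sum(dt for _, dt in tagged.values())
print(at_total, dt_total, at_total + dt_total)
```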
More duplicated loci were integrated into the 13 homeologous chromosome pairs in tetraploid cotton. In this map, the duplicated loci identified by 132 SSR primer pairs bridged all 13 expected homeologous At/Dt chromosome pairs. Ten duplicated loci were in the A1 and D1 homeologous chromosomes, 2 in A2 and D2, 7 in A3 and D3, 5 in A4 and D4, 14 in A5 and D5, 14 in A6 and D6, 9 in A7 and D7, 16 in A8 and D8, 12 in A9 and D9, 7 in A10 and D10, 18 in A11 and D11, 9 in A12 and D12, and 9 in A13 and D13 (Figure 2 and supplemental Table S2 at http://www.genetics.org/supplemental/).
Two post-polyploidization reciprocal translocations of A2/A3 and A4/A5 in the At subgenome were also further confirmed by several homologous loci, such as NAU2994, NAU3875, BNL3590, and JESPR101 in A2 and D3; NAU1070 and NAU3439 in A3 and D2; NAU569, NAU667, NAU2376, NAU3824, BNL4030, and JESPR65 in A5 and D4; and NAU3649 in A4 and D5 (Figure 2).
Of 885 newly produced discrete loci, 780 (87.7%) loci fit and 105 (12.3%) deviated from Mendelian 1:1 inheritance. Of the 105 loci with distorted segregation, 50 showed an excess of heterozygotes and 55 showed an excess of homozygotes. Notably, most of the distorted loci integrated in the map were clustered in several chromosomal regions. Two distorted intervals were found in the A7 and D7 homeologous chromosome pair: 17 consecutive loci spanning 15.4 cM in A7 and 16 loci spanning 13.2 cM in D7, each located near the distal region of its chromosome. All were skewed toward the heterozygotes (Figure 2), which indicated that Hai7124 alleles were preferentially transmitted in these intervals.
We mapped 1122 eSSR loci homologous to ESTs, 121 SRAP loci with target coding sequences in the genome (LI and QUIROS 2001), and 45 genes in the presently revised map containing 1790 loci. The newly developed map contained 71.96% functional marker loci, in which 87.11% were eSSR loci. Furthermore, 1122 eSSR loci were identified mainly by 993 eSSRs developed from 975 ESTs belonging to the transcriptomes of different cDNA libraries from G. arboreum, G. raimondii, and G. hirsutum. The chromosome tagging information of these ESTs is shown in Table 2. In this tetraploid map, 502 eSSR loci were tagged in the A-subgenome, with 74, 55, and 45 loci on the A5, A11, and A8 chromosomes, respectively, and 620 eSSR loci were tagged in the D-subgenome, with 83, 63, and 60 loci on the D5, D12, and D8 chromosomes, respectively. Because most ESTs homologous to the mapped eSSR loci were from fiber ESTs, further exploring the relationship between these EST loci and fiber developmental genes and their potential usages in QTL mapping of fiber qualities may prove to be interesting.
Putative functions of the products of ESTs containing SSR:
The revised map contains 1122 mapped markers homologous to 975 ESTs from the A-, AD-, and D-transcriptomes. To explore the potential utility of the eSSR markers for use in the research of cotton genome structure and functional distribution, the 975 ESTs were used to search for similar protein sequences in the UniProt database (BLASTX). The functional information and chromosome location of the 1122 eSSR loci homologous to ESTs are presented online as supplemental Table S2 (http://www.genetics.org/supplemental/). Using the best hits found by BLASTX (<1e-10), an inferred putative gene ontology annotation was found for nearly 50% of the tagged sequences. Of the 475 known functional ESTs, 247 were associated with genes belonging to biological process, 324 with cellular component, and 290 with molecular function. Of the 247 known biological-process annotations, two main types were associated with physiological and cellular processes, 81.4 and 76.1%, respectively. Of the 324 annotations belonging to the cellular-component category, 93.2% were associated with cells and 83.3% with organelles. Of the 290 ESTs that belong to the molecular-function category, the functions of the largest portions were catalysis (41.7%) and binding (40.0%) (Figure 3). Many ESTs were assigned functions in two or more categories, with 109 associated with all three major gene ontology categories, 47 with biological processes and cellular components, 73 with biological processes and molecular functions, and 48 with cellular components and molecular functions. The number of ESTs with only one known function of a biological process, cellular component, or molecular function was 18, 120, and 60, respectively.
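The overlap counts in this paragraph describe the seven regions of a three-set Venn diagram, so the per-category totals and the overall total can be recovered by summing regions. A brief sketch of this bookkeeping, with illustrative abbreviations BP, CC, and MF for the three ontology categories:

```python
# ESTs in each exclusive region of the three-category Venn diagram,
# keyed by the set of gene ontology categories assigned (counts from the text).
regions = {
    ("BP", "CC", "MF"): 109,
    ("BP", "CC"): 47,
    ("BP", "MF"): 73,
    ("CC", "MF"): 48,
    ("BP",): 18,
    ("CC",): 120,
    ("MF",): 60,
}

def category_total(category):
    """Number of ESTs annotated with the given category in any combination."""
    return sum(n for cats, n in regions.items() if category in cats)

annotated = sum(regions.values())
for cat in ("BP", "CC", "MF"):
    print(cat, category_total(cat))
print("annotated ESTs:", annotated, f"({100 * annotated / 975:.1f}% of 975)")
```

The totals recovered this way (247, 324, 290, and 475 overall, or about 48.7% of the 975 ESTs) agree with the figures reported above.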
Further investigation of the chromosomal distribution of ESTs with known molecular function showed that most loci of known function were found in the A5 and D5 homeologous groups. Many ESTs in the two linkage groups appeared to be involved with transcription, including transcription factor activity, DNA binding, RNA binding, ethylene-responsive element binding, GTP binding, ATP binding, and calmodulin binding. Moreover, some important functional genes associated with fiber development such as E6 and the fiber proteins Fb37 and Fb28, as well as cellulose synthase, were also tagged in the pair of homeologous groups.
Next, 475 known functional ESTs were searched using the KEGG database to determine whether they had a role in metabolism, which showed that only 39 belonged to a known metabolic pathway (supplemental Table S3 and supplemental Figure S1 at http://www.genetics.org/supplemental/). A representative sample of the major metabolic pathways consisted of 21 ESTs located on 13 chromosomes that were responsible for carbohydrate/energy metabolism, 11 ESTs located on 6 chromosomes responsible for amino acid metabolism, 6 ESTs located on 5 chromosomes responsible for lipid metabolism, 6 ESTs located on 4 chromosomes responsible for folding, sorting, and degradation, and 3 ESTs located on 3 chromosomes responsible for signal transduction. Interestingly, 5 ESTs responsible for carbohydrate/energy metabolism, 4 for amino acid metabolism, and 2 for folding, sorting, and degradation were simultaneously found on the A7 chromosome.
Many studies have revealed higher polymorphism levels in genomic SSR markers than in SSRs from transcribed regions of DNA, i.e., eSSRs. Using 20 eSSRs and 22 gSSRs to genotype the A- and B-genomes of wheat, the eSSRs produced a 25% polymorphism rate whereas the gSSRs produced a 53% polymorphism rate (EUJAYL et al. 2002). This is also true in Gossypium, where polymorphism between G. hirsutum and G. barbadense is as high as 49 and 56%, respectively, for gSSR markers (REDDY et al. 2001; NGUYEN et al. 2004). One exception is that only a 21% polymorphism rate was observed between G. hirsutum and G. barbadense for SSR markers developed from BAC ends (FRELICHOWSKI et al. 2006). A roughly similar polymorphism rate was observed between G. hirsutum and G. barbadense for eSSR markers. HAN et al. (2004, 2006) identified 23.3 and 18.2% polymorphic rates between TM-1 and Hai7124 from G. arboreum- and G. hirsutum-derived eSSRs, respectively. Similarly, PARK et al. (2005) detected a 19.8% polymorphic rate using G. arboreum-derived eSSRs. In this study, a 23.9% polymorphic rate between TM-1 and Hai7124 was identified from eSSRs developed from two cDNA libraries in G. hirsutum. However, a nearly 48% polymorphism rate between G. hirsutum and G. barbadense was observed using G. raimondii-derived eSSRs, which is much higher than that in previously published data. The set of G. raimondii-derived primers with a high polymorphic frequency between G. hirsutum and G. barbadense has been used to construct the saturated genetic map; these primers are also useful for elucidating the role of the D-genome in the origin and evolution of tetraploid cotton species.
Why do G. raimondii-derived eSSRs have much higher polymorphism than those derived from G. arboreum and G. hirsutum? From an evolutionary standpoint, the AD-tetraploid species (2n = 4x = 52) originated from an interspecific hybridization event between A- and D-genome diploid Gossypium species. The A- and D-genome diploids are estimated to have diverged from a common ancestor between 6 and 11 million years ago (WENDEL 1989). G. arboreum is an Old World cultivated cotton species and G. hirsutum is a New World cultivated cotton species, whereas G. raimondii is wild and cannot produce spinnable fiber. Studies have shown that the expression of duplicated genes in tetraploid species at the transcriptional level may have three fates: (1) silencing of one of the duplicated copies (WENDEL 2000; ADAMS et al. 2003); (2) molecular interactions mediated by concerted evolutionary processes leading to a rapid sequence conversion of homologous loci, homology-specific sequence elimination, and extensive genomic rearrangements (WENDEL et al. 1995; ADAMS et al. 2003); or (3) independent evolution of the duplicated copies in allopolyploids (CRONN et al. 1999; SMALL and WENDEL 2000).
Incorporating EST data from different cotton species and tissues in molecular genetic studies can allow a preliminary analysis of phylogenetic evolution. The eSSR markers employed in this study were developed in our laboratory from seven libraries: GA_Ea (G. arboreum developing fibers; 7-10 dpa); GH_7235 (G. hirsutum acc. 7235 developing fibers; 5-25 dpa); GH_Xuzhou 142 (G. hirsutum cv. Xuzhou 142; 0- to 5-dpa ovules, 5-10 dpa, and 3- to 22-dpa fibers); GH_TMO (G. hirsutum acc. TM-1; -3- to 3-dpa immature ovules); GR_Ea (G. raimondii whole seedlings with the first true leaves); and GR_Eb (G. raimondii bolls; -3-dpa flower buds to +3-dpa bolls). All SSR searches from the above EST sequences used the same cut-off values for primer design.
A comparison of the polymorphism rates between G. hirsutum and G. barbadense for eSSRs derived from different genomes yielded the highest polymorphism rates (47.9% from fiber development tissue and 47.8% from the first true leaf tissue) for G. raimondii-derived eSSRs. Even when the eSSRs were all from fiber developmental tissue of different cotton species, the polymorphism rate from the D-genome species was higher than that from the A- and AD-genome cultivated species. Further comparison of eSSR distributions among A-, AD-, and D-transcriptomes showed that the three genome species were similar in their abundance of common motifs. Trinucleotide and hexanucleotide motif repeats were most common at 74.55% for AD-, 68.43% for A-, and 55.57% for D-genome species, followed by the dinucleotide motif at 31.42% for D-, 18.68% for A-, and 17.15% for AD-genome species.
The repeat frequencies of tetranucleotide and petanucleotide were at a low level in the three species (Figure 4). The most common motif in the three genome species was AT/TA for dinucleotide, AAG/TTC for trinucleotide, and AAAN/TTTN, AAAAN/TTTTN, and AAAAAN/TTTTTN for tetranucleotide, petanucleotide, and hexanucleotide, respectively (Figure 5). However, differences in the type of abundant motifs were observed in the three genome species; i.e., D-genome species had fewer trinucleotide and hexanucleotide motif repeats and more dinucleotide repeats than A- and AD-genome species. Because trinucleotide and hexanucleotide motifs could stably reside in the coding region and suppress frameshift mutations (VARSHNEY et al. 2005) and AT dimeric repeats have been found in the untranslated region of many species (MORGANTE et al. 2002; SOOK et al. 2005), the differences in the type of abundant motifs may imply that different transcriptomes can potentially function as factors regulating gene expression in their individual genome species, leading to different expression characteristics of A- and D-subgenomes in tetraploid genomes. The different transcribed sequences from G. raimondii might undergo relaxed selection in their corresponding paralogous regions, evolve into silent or nonsynonymous sites, or code for genes unrelated to fiber development. This hypothesis is strongly supported by previous reports. A-genome diploid and AD-tetraploid cottons each produce spinnable fibers (FRYXELL 1979). Many AA-subgenome ESTs associated with fiber development are selectively enriched in G. hirsutum (YANG et al. 2006). Purifying transcripts from the diploid A-genome and tetraploid A-subgenome revealed that tagging efficiencies in a cultivated tetraploid genetic map were relatively low. On the other hand, the D-subgenome harbored greater nucleotide and allelic diversity than did the A-subgenome in both species of G. hirsutum and G. 
barbadense on the basis of a comparison of duplicated paralogous Adh loci (SMALL and WENDEL 2002; SMALL et al. 1999), which also suggested that differential evolutionary pressures act on the two D-subgenomes. On the basis of the above analysis, the transcriptional products from diploid D-genome species coupled with less selection pressure in the tetraploid genome could lead to a higher frequency of recombination in their paralogous sites in allotetraploid cotton. Using EST information from diploid G. raimondii combined with expression and gene diversity studies in allotetraploids might provide new information for understanding the evolution of allotetraploid species.
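The motif-class tallies and the frameshift argument above can be illustrated with a small sketch. This is not the authors' SSR-mining pipeline: the minimum repeat thresholds in `MIN_REPEATS`, the function names, and the example sequence are all assumptions chosen for illustration. The code finds perfect tandem repeats with unit lengths of 2 to 6 nt, counts them by motif class, and checks whether gaining or losing repeat units would keep a coding sequence in frame.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per unit length (NOT the paper's criteria).
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}
CLASS_NAMES = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(unit):
    """True if unit is not a tandem repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip e.g. ATAT, which is really an AT repeat
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def motif_class_counts(ssrs):
    """Tally SSRs by motif class (di, tri, tetra, penta, hexa)."""
    return Counter(CLASS_NAMES[len(unit)] for _, unit, _ in ssrs)


def preserves_frame(unit, delta_repeats):
    """True if adding or removing delta_repeats units changes length by a multiple of 3."""
    return (len(unit) * delta_repeats) % 3 == 0


# Made-up example sequence: one AT repeat and one AAG repeat.
example = "CC" + "AT" * 7 + "GGC" + "AAG" * 5 + "T"
ssrs = find_ssrs(example)
print(ssrs)
print(motif_class_counts(ssrs))
print(preserves_frame("AAG", 1), preserves_frame("AT", 1))
```

In this sketch a single-unit change in a trinucleotide or hexanucleotide repeat always preserves the reading frame, whereas a single-unit change in a dinucleotide repeat does not, which is the reasoning behind expecting tri- and hexanucleotide repeats in coding regions.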
The tagging results for the A-, D-, and AD-genome-derived eSSRs also showed that eSSRs derived from A-genome species were preferentially tagged in the A-subgenome, and that eSSRs from D-genome species were preferentially tagged in the D-subgenome in the tetraploid linkage map. However, eSSRs derived from AD-genome species were evenly tagged in the A- and D-subgenomes of the tetraploid linkage map. These findings indicated that genes evolved at different rates in A- and D-genome species even though the A- and D-genome types have a common genetic origin with Gossypium. However, in AD-genome species formed by a polyploidization event between the diploid A- and D-genomes, duplicated functional genes from the A- and D-subgenomes appear to be independently expressed at the same abundance, as suggested by the At:Dt = 1:1 tagging efficiency of allotetraploid transcriptional products from fiber development stages. Even though D-genome species cannot produce spinnable fiber, many studies have suggested that the D-subgenome of cultivated tetraploid species carries important genes that are most likely regulators of fiber morphogenesis and fiber properties (JIANG et al. 1998; KOHEL et al. 2001; PATERSON et al. 2003; PARK et al. 2005; SHEN et al. 2005; ULLOA et al. 2005). Data obtained from the tagging of A-, AD-, and D-genome markers may provide new insights into polyploid evolution and a foundation for elucidating the role of the D-genome in tetraploid cotton species in the future.
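One way to check whether a set of tagged loci is consistent with an At:Dt = 1:1 ratio is a chi-square goodness-of-fit test with one degree of freedom. The sketch below is an illustration only, not the analysis reported here; the function name and the locus counts are hypothetical placeholders rather than values from the map.

```python
import math


def chi_square_1to1(at_count, dt_count):
    """Chi-square goodness-of-fit test of observed counts against a 1:1 ratio.

    Returns (chi2, p_value) for 1 degree of freedom.
    """
    total = at_count + dt_count
    if total == 0:
        raise ValueError("no loci to test")
    expected = total / 2
    chi2 = ((at_count - expected) ** 2 + (dt_count - expected) ** 2) / expected
    # For 1 d.f., the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of loci tagged in the At and Dt subgenomes.
print(chi_square_1to1(50, 50))   # perfectly even tagging
print(chi_square_1to1(30, 70))   # strongly skewed toward Dt
```

A large p-value means the counts do not depart detectably from 1:1; a small one indicates preferential tagging in one subgenome.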
ABDURAKHMONOV, I. Y., A. A. ABDULLAEV, S. SAHA, Z. T. BURIEV, D. ARSLANOV et al., 2005 Simple sequence repeat marker associated with a natural leaf defoliation trait in tetraploid cotton. J. Hered. 96: 644–653.
ADAMS, K. L., R. CRONN, R. PERCIFIELD and J. F. WENDEL, 2003 Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. USA 100: 4649–4654.
ARPAT, A. B., M. WAUGH, J. P. SULLIVAN, M. GONZALES, D. FRISCH et al., 2004 Functional genomics of cell elongation in developing cotton fibers. Plant Mol. Biol. 54: 911–929.
AYERS, N. M., A. M. MCCLUNG, P. D. LARKIN, H. F. J. BLIGH, C. A. JONES et al., 1997 Microsatellite and single nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of US rice germplasm. Theor. Appl. Genet. 94: 773–781.
BLENDA, A., J. SCHEFFLER, B. SCHEFFLER, M. PALMER, J. M. LACAPE et al., 2006 CMD: A Cotton Microsatellite Database resource for Gossypium genomics. BMC Bioinformatics 7: 132.
CAMON, E., M. MAGRANE, D. BARRELL, V. LEE, E. DIMMER et al., 2004 The Gene Ontology Annotation (GOA) database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 32: D262–D266.
CRONN, R. C., R. L. SMALL and J. F. WENDEL, 1999 Duplicated genes evolve independently after polyploid formation in cotton. Proc. Natl. Acad. Sci. USA 96: 14406–14411.
ENDRIZZI, J. E., E. L. TURCOTTE and R. J. KOHEL, 1985 Genetics, cytology, and evolution of Gossypium. Adv. Genet. 23: 271–375.
EUJAYL, I., M. E. SORRELLS, M. BAUM and P. WOLTERS, 2002 Isolation of EST-derived microsatellite markers for genotyping the A and B-genomes of wheat. Theor. Appl. Genet. 104: 399–407.
FRELICHOWSKI, JR., J. E., M. B. PALMER, D. MAIN, J. P. TOMKINS, R. G. CANTRELL et al., 2006 Cotton genome mapping with new microsatellites from Acala Maxxa BAC-ends. Mol. Genet. Genomics 275: 479–491.
FRYXELL, P. A., 1979 The Natural History of the Cotton Tribe, pp. 37–47. Texas A&M University Press, College Station, TX.
FRYXELL, P. A., 1992 A revised taxonomic interpretation of Gossypium L. (Malvaceae). Rheedea 2: 108–165.
HAN, Z. G., W. Z. GUO, X. L. SONG and T. Z. ZHANG, 2004 Genetic mapping of EST-derived microsatellites from the diploid Gossypium arboreum in allotetraploid cotton. Mol. Genet. Genomics 272: 308–327.
HAN, Z. G., C. B. WANG, X. L. SONG, W. Z. GUO, J. Y. GUO et al., 2006 Characteristics, development and mapping of Gossypium hirsutum derived EST-SSR in allotetraploid cotton. Theor. Appl. Genet. 112: 430–439.
JIANG, C. X., R. J. WRIGHT, K. M. EL-ZIK and A. H. PATERSON, 1998 Polyploid formation created unique avenues for response to selection in Gossypium (Cotton). Proc. Natl. Acad. Sci. USA 95: 4419–4424.
KANEHISA, M., and S. GOTO, 2000 KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28: 27–30.
KASHI, Y., D. KING and M. SOLLER, 1997 Simple sequence repeats as a source of quantitative genetic variation. Trends Genet. 13: 74–78.
KOHEL, R. J., 1973 Genetic nomenclature in cotton. J. Hered. 64: 291–295.
KOHEL, R. J., J. YU, Y-H. PARK and G. R. LAZO, 2001 Molecular mapping and characterization of traits controlling fiber quality in cotton. Euphytica 121: 163–172.
KOSAMBI, D. D., 1944 The estimation of map distance from recombination values. Ann. Eugen. 12: 172–175.
LACAPE, J. M., T. B. NGUYEN, S. THIBIVILLIERS, B. BOJINOV, B. COURTOIS et al., 2003 A combined RFLP-SSR-AFLP map of tetraploid cotton based on a Gossypium hirsutum x Gossypium barbadense backcross population. Genome 46: 612–626.
LI, G., and C. F. QUIROS, 2001 Sequence-related amplified polymorphism (SRAP), a new marker system based on a simple PCR reaction: its application to mapping and gene tagging in Brassica. Theor. Appl. Genet. 103: 455–461.
MEI, M., N. H. SYED, W. GAO, P. M. THAXTON, C. W. SMITH et al., 2004 Genetic mapping and QTL analysis of fiber-related traits in cotton (Gossypium). Theor. Appl. Genet. 108: 280–291.
MORGANTE, M., M. HANAFEY and W. POWELL, 2002 Microsatellites are preferentially associated with non-repetitive DNA in plant genomes. Nat. Genet. 30: 194–200.
NGUYEN, T. B., M. GIBAND, P. BROTTIER, A. M. RISTERUCCI and J. M. LACAPE, 2004 Wide coverage of the tetraploid cotton genome using newly developed microsatellite markers. Theor. Appl. Genet. 109: 167–175.
PARK, Y. H., M. S. ALABADY, M. ULLOA, B. SICKLER, T. A. WILKINS et al., 2005 Genetic mapping of new cotton fiber loci using EST-derived microsatellites in an interspecific recombinant inbred line cotton population. Mol. Genet. Genomics 274: 428–441.
PATERSON, A. H., C. BRUBAKER and J. F. WENDEL, 1993 A rapid method for extraction of cotton (Gossypium spp.) genomic DNA suitable for RFLP or PCR analysis. Plant Mol. Biol. Rep. 11: 122–127.
PATERSON, A. H., Y. SARANGA, M. MENZ, C. JIANG and R. J. WRIGHT, 2003 QTL analysis of genotype x environmental interactions affecting cotton fiber quality. Theor. Appl. Genet. 106: 384–396.
QURESHI, S. N., S. SAHA, R. V. KANTATY and J. N. JENKINS, 2004 EST-SSR: a new class of genetic markers in cotton. J. Cotton Sci. 8: 112–123.
REDDY, O. U. K., A. E. PEPPER, I. Y. ABDURAKHMONOV, S. SAHA, J. N. JENKINS et al., 2001 The identification of dinucleotide and trinucleotide microsatellite repeat loci from cotton G. hirsutum L. J. Cotton Sci. 5: 103–113.
RÖDER, M. S., V. KORZUN, K. WENDEHAKE, J. PLASCHKE, M. H. TIXIER et al., 1998a A microsatellite map of wheat. Genetics 149: 2007–2023.
RÖDER, M. S., V. KORZUN, B. S. GILL and M. W. GANAL, 1998b The physical mapping of microsatellite markers in wheat. Genome 41: 278–283.
RONG, J. K., C. ABBEY, J. E. BOWERS, C. L. BRUBAKER, C. CHANG et al., 2004 A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics 166: 389–417.
SAHA, S., M. KARACA, J. N. JENKINS, A. E. ZIPF, U. K. REDDY et al., 2003 Simple sequence repeats as useful resources to study transcribed genes of cotton. Euphytica 130: 355–364.
SHEN, X. L., W. Z. GUO, X. F. ZHU, Y. L. YUAN, J. Z. YU et al., 2005 Molecular mapping of QTLs for qualities in three diverse lines in Upland cotton using SSR markers. Mol. Breed. 15: 169–181.
SMALL, R. L., and J. F. WENDEL, 2000 Phylogeny, duplication, and intraspecific variation of Adh sequences in new world diploid cotton (Gossypium L., Malvaceae). Mol. Phyl. Evol. 16: 73–84.
SMALL, R. L., and J. F. WENDEL, 2002 Differential evolutionary dynamics of duplicated paralogous Adh loci in allotetraploid cotton (Gossypium). Mol. Biol. Evol. 19: 597–607.
SMALL, R. L., J. A. RYBURN and J. F. WENDEL, 1999 Low levels of nucleotide diversity at homeologous Adh loci in allotetraploid cotton (Gossypium L.). Mol. Biol. Evol. 16: 491–501.
SONG, X. L., K. WANG, W. Z. GUO, J. ZHANG and T. Z. ZHANG, 2005 A comparison of genetic maps constructed from haploid and BC1 mapping populations from the same crossing between Gossypium hirsutum L. x G. barbadense L. Genome 48: 378–390.
SOOK, J., A. ALBERT, J. CHRISTOPHER, T. JEFF and M. DORRIE, 2005 Frequency, type, distribution and annotation of simple sequence repeats in Rosaceae ESTs. Funct. Integr. Genomics 5: 136–143.
TALIERCIO, E., R. D. ALLEN, M. ESSENBERG, N. KLUEVA, H. NGUYEN et al., 2006 Analysis of ESTs from multiple Gossypium hirsutum tissues and identification of SSRs. Genome 49: 306–319.
UDALL, J. A., J. M. SWANSON, K. HALLER, R. A. RAPP, M. E. SPARKS et al., 2006 A global assembly of cotton ESTs. Genome Res. 16: 441–450.
ULLOA, M., and W. R. MEREDITH, JR., 2000 Genetic linkage map and QTL analysis of agronomic and fiber quality traits in an intraspecific population. J. Cotton Sci. 4: 161–170.
ULLOA, M., W. R. MEREDITH, JR., Z. W. SHAPPLY and A. L. KAHLER, 2002 RFLP genetic linkage maps from F2.3 populations and a joinmap of Gossypium hirsutum L. Theor. Appl. Genet. 104: 200–208.
ULLOA, M., S. SAHA, J. N. JENKINS, W. R. MEREDITH, J. C. MCCARTY et al., 2005 Chromosomal assignment of RFLP linkage groups harboring important QTLs on an intraspecific cotton (Gossypium hirsutum L.) joinmap. J. Hered. 96: 132–144.
VAN OOIJEN, J. W., and R. E. VOORRIPS, 2001 Joinmap Version 3.0: Software for the Calculation of Genetic Linkage Maps. CPRO-DLO, Wageningen, The Netherlands.
VARSHNEY, R. K., A. GRANER and M. E. SORRELLS, 2005 Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 23: 48–55.
WANG, C. B., W. Z. GUO, C. P. CAI and T. Z. ZHANG, 2006 Characterization, development and exploitation of EST-derived microsatellites in Gossypium raimondii Ulbrich. Chin. Sci. Bull. 51: 557–561.
WANG, K., X. L. SONG, Z. G. HAN, W. Z. GUO, J. Z. YU et al., 2006 Complete assignment of the chromosomes of Gossypium hirsutum L. by translocation and fluorescence in situ hybridization mapping. Theor. Appl. Genet. 113: 73–80.
WENDEL, J. F., 1989 New World tetraploid cottons contain Old World cytoplasm. Proc. Natl. Acad. Sci. USA 86: 4132–4136.
WENDEL, J. F., 2000 Genome evolution in polyploids. Plant Mol. Biol. 42: 225–249.
WENDEL, J. F., A. SCHNABEL and T. SEELANAN, 1995 Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proc. Natl. Acad. Sci. USA 92: 280–284.
WILKINS, T. A., and A. B. ARPAT, 2005 The cotton fiber transcriptome. Physiol. Plant. 124: 295–300.
WILKINS, T. A., A. B. ARPAT and B. SICKLER, 2005 Cotton fiber genomics: developmental mechanisms. Pflanzenschutz-Nachrichten Bayer 58: 119–139.
YANG, S. S., F. CHEUNG, J. J. LEE, M. HA, N. E. WEI et al., 2006 Accumulation of genome-specific transcripts, transcription factors and phytohormonal regulators during early stages of fiber cell development in allotetraploid cotton. Plant J. 47: 761–775.
ZHANG, J., Y. T. WU, W. Z. GUO and T. Z. ZHANG, 2000 Fast screening of microsatellite markers in cotton with PAGE/silver staining. Cotton Sci. Sin. 12: 267–269.
ZHANG, J., W. Z. GUO and T. Z. ZHANG, 2002 Molecular linkage map of allotetraploid cotton (Gossypium hirsutum L. x Gossypium barbadense L.) with a haploid population. Theor. Appl. Genet. 105: 1166–1174.
ZHANG, Z. S., Y. H. XIAO, M. LUO, X. B. LI, X. Y. LUO et al., 2005 Construction of a genetic linkage map and QTL analysis of fiber-related traits in upland cotton (Gossypium hirsutum L.). Euphytica 144: 91–99.
Communicating editor: J. A. BIRCHLER














