A 3347-Locus Genetic Recombination Map of Sequence-Tagged Sites Reveals Features of Genome Organization, Transmission and Evolution of Cotton (Gossypium)
Junkang Rong (a), Colette Abbey (b), John E. Bowers (a), Curt L. Brubaker (c,d), Charlene Chang (b), Peng W. Chee (a,e), Terrye A. Delmonte (b), Xiaoling Ding (b), Juan J. Garza (b), Barry S. Marler (a), Chan-hwa Park (a), Gary J. Pierce (a), Katy M. Rainey (a), Vipin K. Rastogi (b), Stefan R. Schulze (a), Norma L. Trolinder (f), Jonathan F. Wendel (c), Thea A. Wilkins (g), T. Dawn Williams-Coplin (a), Rod A. Wing (h), Robert J. Wright (b,g), Xinping Zhao (b), Linghua Zhu (b), and Andrew H. Paterson (a,b)
a Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602,
b Department of Soil and Crop Science, Texas A&M University, College Station, Texas 77843,
c Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011,
d Commonwealth Scientific and Industrial Research Organization, Canberra, Australia,
e Coastal Plains Experiment Station, University of Georgia, Athens, Georgia 30602,
f Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas 79409,
g Department of Agronomy and Range Science, University of California, Davis, California 95616
h Department of Plant Sciences, University of Arizona, Tucson, Arizona 85721
Corresponding author: Andrew H. Paterson, University of Georgia, 111 Riverbend Rd., Room 228, Athens, GA 30602. E-mail: paterson{at}uga.edu
Communicating editor: J. A. BIRCHLER
| ABSTRACT |
|---|
We report genetic maps for diploid (D) and tetraploid (AtDt) Gossypium genomes composed of sequence-tagged sites (STS) that foster structural, functional, and evolutionary genomic studies. The maps include, respectively, 2584 loci at 1.72-cM (~600 kb) intervals based on 2007 probes (AtDt) and 763 loci at 1.96-cM (~500 kb) intervals detected by 662 probes (D). Both diploid and tetraploid cottons exhibit negative crossover interference; i.e., double recombinants are unexpectedly abundant. We found no major structural changes between Dt and D chromosomes, but confirmed two reciprocal translocations between At chromosomes and several inversions. Concentrations of probes in corresponding regions of the various genomes may represent centromeres, while genome-specific concentrations may represent heterochromatin. Locus duplication patterns reveal all 13 expected homeologous chromosome sets and lend new support to the possibility that a more ancient polyploidization event may have predated the A-D divergence of 6–11 million years ago. Identification of SSRs within 312 RFLP sequences plus direct mapping of 124 SSRs and exploration for CAPS and SNPs illustrate the "portability" of these STS loci across populations and detection systems useful for marker-assisted improvement of the world's leading fiber crop. These data provide new insights into polyploid evolution and represent a foundation for assembly of a finished sequence of the cotton genome.
THE cotton genus, Gossypium L., is an excellent system for examining many fundamental questions relating to genome evolution, plant development, and crop productivity. Gossypium is composed of ~45 diploid and 5 allopolyploid species that occur naturally throughout the arid and semiarid regions of Africa, Australia, Central and South America, the Indian subcontinent, Arabia, the Galápagos, and Hawaii.
Allotetraploid cottons are all indigenous to the New World and unite the Old World A genome with the New World D genome in an A-genome cytoplasm.
6–11 million years ago (MYA) and have been reunited in a common tetraploid nucleus ~1.1–1.9 MYA.
Particularly interesting questions concerning Gossypium are the genetics underlying productivity and the quality of the world's leading natural fiber. World cotton commerce of ~$20 billion annually is made possible by an unusual feature of a few members of this taxon.
The joining in a common nucleus of A and D genomes with very different evolutionary histories appears to have created unique avenues for response to selection. Directional selection by humans has consistently produced AD-tetraploid cottons that have yield and/or quality characteristics superior to those of A-genome diploids. Breeding of G. hirsutum (AD1) has emphasized maximum yield, while G. barbadense (AD2) is prized for fibers of superior length, strength, and fineness. Curiously, the D genome, from an ancestor that does not produce spinnable fibers (![]()
Herein, we further elucidate the structure, function, and evolution of the Gossypium genomes. We describe the first cotton map that coalesces into the expected 26 linkage groups (chromosomes), provide new insights into its transmission genetics and genome organization, and reveal new evidence suggesting that an ancient polyploidization event may have predated the A vs. D genome divergence of 6–11 MYA. The genetically anchored sequence-tagged sites (STS) comprising this map will foster many structural, functional, and evolutionary genomic studies relevant to development, evolution, and agriculture.
| MATERIALS AND METHODS |
|---|
Plant materials:
The tetraploid (AtDt) map was constructed on the basis of additional probes applied to the mapping population reported by ![]()
To assign linkage groups to chromosomes, the hypoaneuploid stocks used by ![]()
Probes:
The probes used in this research are summarized in online supplemental Table 1 at http://www.plantgenome.uga.edu/cottonmap.htm and include genomic DNA (gDNA), cDNA, and simple-sequence repeats (SSR). Preparation and origin of most gDNA clones is described by ![]()
Map construction:
Linkage groups were built using MAPMAKER/EXP 3.0 (![]()
~4500-cM genome of cotton. Initial frameworks were nucleated using a small group of markers and the "compare" function. Additional markers were added into the framework with the "try" command in MAPMAKER and a custom computer program written in Microsoft Visual Basic (![]()
Genetic interpretation of the restriction fragment length polymorphism (RFLP) phenotypes and nomenclature followed the system of ![]()
DNA sequencing and sequence data processing:
Plasmids were isolated using an alkaline lysis method modified for a 96-well plate format. Cycle sequencing reactions were performed using the BigDye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA) and MJ Research (Watertown, MA) PTC-10 thermocyclers. Finished cycle sequencing reactions were filtered through Sephadex filter plates directly into Perkin-Elmer (Norwalk, CT) MicroAmp optical 96-well reaction plates. Sequence data processing used a pipeline anchored by a Microsoft Access relational database (RDB), which links sample and experimental information with each chromatogram file. The pipeline begins with the presequencing archival of experimental information such as sample names and specific clone attributes, which are imported into the ABI sequence plate records. Chromatogram files are archived in the RDB and then transferred to a Sun workstation for processing using Phred (![]()
| RESULTS |
|---|
Genetic maps:
The tetraploid cotton map is composed of 2584 loci in 26 linkage groups (Table 1), a 1879-locus (nearly threefold) increase compared to our previously published map of 705 loci (![]()
The diploid D-genome map is composed of 763 loci detected by 662 probes and spans 1493.3 cM in 13 linkage groups (Table 1). This is a 153% increase (467 loci) over the prior version of this map, which included 306 loci. Ten loci were omitted from this map for similar reasons as the tetraploid map. Despite the significant increase in locus number, the total length of the map is essentially the same. This map is based on a total of 1929 detected crossovers corresponding to 1942 potentially distinct map locations. We have mapped markers at 505 (26.0%) of these locations. Overall, the average distance between consecutive loci is 1.96 cM, varying among chromosomes from 1.58 cM (D9) to 3.11 cM (D10). The biggest gap is 14.7 cM (D5) and only nine intervals are >10 cM. Detailed features of the map are in online supplemental Table 4 at http://www.plantgenome.uga.edu/cottonmap.htm.
The D genome is shorter than the tetraploid At and Dt subgenomes as a whole in genetic length (Table 1, Fig 1). Chromosome length varies over a somewhat narrower range in the diploid D genome than in the tetraploid, from 77.9 cM (D3) to 174.5 cM (D7). Markers were distributed more sparsely in the D genome than in the tetraploid genome (1.96 vs. 1.72 cM).
Recombinational interference:
Recombinational interference was assessed by comparing the frequency of occurrence of "double crossover" genotypes (i.e., aa-ab-aa; bb-ab-bb) to "adjacent crossover" genotypes (i.e., aa-ab-bb; bb-ab-aa) as a function of the size of the interval that contains the two crossovers required to produce each genotype. In the absence of interference, these two different classes of genotypes would be equally probable; however, as a whole double crossovers were significantly (P < 0.01) more abundant than adjacent crossovers in both At and Dt subgenomes (563 vs. 383, 462 vs. 370). Virtually all of this difference was accounted for by cases in which the two loci were separated by <10 cM (Fig 2). In intervals >20 cM, the two classes of genotypes occur in a different frequency in the At and Dt subgenomes. In the At genome, the frequency of double crossovers is still generally greater than that of adjacent crossovers (except in the 20- to 29.9-cM interval) and the difference reaches significance in the 40- to 49.9- and 60- to 69.9-cM intervals. On the contrary, in the Dt genome, the two types of crossover are equally abundant in the 20- to 29.9-cM interval, and adjacent crossovers are more abundant in larger intervals.
In the diploid population, in total, double crossovers were slightly more abundant than adjacent crossovers (271 vs. 242), but this small difference was not statistically significant. However, over distances of 0–10 cM, the frequency of double crossovers is significantly higher than that of adjacent crossovers, consistent with the tetraploid mapping population (Fig 2). Like the Dt genome, over distances >20 cM, adjacent crossovers are generally more abundant than double crossovers, but the difference reaches significance only in the 20- to 29.9-cM interval (Fig 2).
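The comparison of the two genotype classes can be illustrated with a simple goodness-of-fit calculation. The sketch below assumes a 1-d.f. chi-square test against the 1:1 expectation described above; the text does not name the test that was actually applied, and the function name `equal_frequency_test` is hypothetical. The counts are the totals reported for the At and Dt subgenomes and for the D diploid.

```python
import math


def equal_frequency_test(double_co, adjacent_co):
    """Chi-square test (1 d.f.) of a 1:1 ratio between two genotype classes.

    Returns (chi2, p). For 1 d.f. the upper-tail probability of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    expected = (double_co + adjacent_co) / 2
    chi2 = ((double_co - expected) ** 2 + (adjacent_co - expected) ** 2) / expected
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Totals of double vs. adjacent crossover genotypes reported in the text.
reported = {"At": (563, 383), "Dt": (462, 370), "D": (271, 242)}

for genome, (double_co, adjacent_co) in reported.items():
    chi2, p = equal_frequency_test(double_co, adjacent_co)
    print(f"{genome}: chi2 = {chi2:.2f}, P = {p:.4g}")
```

Under this assumed test, the At and Dt totals fall below P = 0.01 while the diploid total does not, which agrees with the significance statements in the text.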
Patterns of DNA marker locus distribution:
DNA markers were unevenly distributed over the chromosomes of each map (Fig 1). To analyze this statistically, we partitioned each linkage group into intervals of 10 cM in length, except that the last interval in each group was either 15 or 5 cM to accommodate the varying lengths of the linkage groups. On the basis of the total number of loci per linkage group, the Poisson probability distribution function was applied to identify bins that contained significant (P < 0.01) excesses or deficiencies of various classes of markers.
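The binning procedure can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes one-sided Poisson tail probabilities, an expected count per bin proportional to bin length, and a final bin that absorbs the remainder of the linkage group. The function names and the example positions are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def upper_tail(k, lam):
    """P(X >= k): probability of seeing k or more loci in a bin."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def lower_tail(k, lam):
    """P(X <= k): probability of seeing k or fewer loci in a bin."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def classify_bins(positions_cm, group_length_cm, bin_cm=10.0, alpha=0.01):
    """Count loci per bin and label each bin 'rich', 'poor', or 'normal'."""
    # Assumption: round to the nearest whole number of bins, so the last
    # bin is somewhat longer or shorter than bin_cm.
    n_bins = max(1, int(group_length_cm / bin_cm + 0.5))
    counts = [0] * n_bins
    for pos in positions_cm:
        counts[min(int(pos // bin_cm), n_bins - 1)] += 1

    labels = []
    for i, count in enumerate(counts):
        start = i * bin_cm
        end = group_length_cm if i == n_bins - 1 else start + bin_cm
        # Expected loci in this bin if loci were spread uniformly.
        lam = len(positions_cm) * (end - start) / group_length_cm
        if upper_tail(count, lam) < alpha:
            labels.append("rich")
        elif lower_tail(count, lam) < alpha:
            labels.append("poor")
        else:
            labels.append("normal")
    return counts, labels


# Hypothetical 100-cM linkage group: two loci in most bins, twelve in 40-50 cM.
example = [b * 10 + off for b in range(10) if b != 4 for off in (2, 7)]
example += [40 + 0.5 * i for i in range(12)]
counts, labels = classify_bins(example, 100.0)
print(list(zip(counts, labels)))
```

With 30 loci over 10 bins the expected count is 3 per bin, so only the bin holding 12 loci is flagged as marker rich in this made-up example.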
A total of 65 intervals (shaded regions) comprising 49 clusters (16 composed of two consecutive intervals; see online supplemental Table 5a at http://www.plantgenome.uga.edu/cottonmap.htm) were marker rich, with one to three clusters on each chromosome except tetraploid chromosomes 1 and 25 and diploid linkage groups D3, D6, D7, D8, and D10. Many of the marker-rich regions corresponded in location between the tetraploid subgenomes and/or between the diploid and tetraploid maps. The Dt and D maps corresponded most closely: among eight concentrations of probes in the D-genome map, seven (87.5%) correspond to concentrations in the Dt map (vs. an overall 11.7% of intervals that are marker rich). The single D-genome marker-rich interval that is incongruous with the Dt counterpart corresponds to a marker-rich interval in the At genome. Among the 18 marker-rich regions in the Dt genome, 14 (78%) correspond to 14 marker-rich regions in the At genome (vs. an overall 12.7% of intervals that are marker rich). In five cases, a single At or Dt (3 and 2, respectively) marker-rich region corresponds to each of two unlinked marker-rich regions. In a single case, two genetically linked marker-rich regions that are separated by a normal region correspond on homeologous chromosomes (A01 and chromosome 18). Among the marker-rich regions that do not correspond between At and Dt, one corresponds between D and Dt, one between At and D, three are unique to Dt, and nine are unique to At.
A total of 17 intervals comprising 12 clusters were marker poor, with 9 on nine different Dt chromosomes, 3 on two different At chromosomes, and none on the D diploid (see online supplemental Table 5 at http://www.plantgenome.uga.edu/cottonmap.htm). While the 6.5% of Dt intervals that are marker poor appears significant, the 1.3% of At intervals that are marker poor is only nominally above the 1% false-positive rate. Only one case of correspondence among marker-poor intervals was found, between At chromosome 5 (cM 210–220) and Dt chromosome 22 (cM 70–80).
Distribution of dominant loci:
A total of 1069 dominant loci were detected in the tetraploid map, including 520 loci segregating as the presence of alleles from G. hirsutum and 549 from G. barbadense. No cluster of dominant loci specific for one parent was located on any chromosome. Among eight main probe sources, the probes from two libraries (Coau and pVNC) show significantly more dominant alleles from G. barbadense than from G. hirsutum: 63 vs. 37 and 16 vs. 5, respectively.
SSR and single-nucleotide polymorphism (SNP)-containing loci:
A total of 1749 cDNA, 710 gDNA, 124 SSR loci, and one isozyme were mapped in the tetraploid, and 410 cDNA and 353 gDNA in the diploid D genome. The distribution of cDNA and gDNA loci appear to be similar in both tetraploid (r = 0.434) and diploid (r = 0.505) maps. In addition to 124 SSR loci that were mapped directly, we also identified 312 additional SSR-containing sequences (defining an SSR as six or more repeats of a dinucleotide or longer repeat units covering 15 or more base pairs; see online supplemental Table 2 at http://www.plantgenome.uga.edu/cottonmap.htm) within 1884 sequences of gDNA and cDNA probes mapped as RFLPs. The SSR-containing loci were distributed across the map in a pattern similar to that of the entire probe set (r = 0.547).
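The SSR criterion quoted above can be approximated with a regular-expression scan. The sketch below rests on one reading of that definition, namely that an array must have at least six tandem copies of a unit of two or more nucleotides and must also span at least 15 bp; the original wording admits other readings. The function name `find_ssrs`, the upper limit of six nucleotides per unit, and the test sequences are assumptions for illustration only.

```python
import re


def find_ssrs(seq, min_repeats=6, min_span=15, max_unit=6):
    """Return (start, motif, repeat_count) for each tandem array found.

    Assumed criterion: >= min_repeats copies of a 2..max_unit nt motif,
    covering >= min_span bp in total.
    """
    seq = seq.upper()
    hits = []
    for unit in range(2, max_unit + 1):
        # One captured motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each array is reported only once.
            if any(
                unit % d == 0 and motif == motif[:d] * (unit // d)
                for d in range(1, unit)
            ):
                continue
            span = match.end() - match.start()
            if span >= min_span:
                hits.append((match.start(), motif, span // unit))
    return hits


print(find_ssrs("TTT" + "GA" * 8 + "CCC"))   # one (GA)8 array, 16 bp
print(find_ssrs("GA" * 6))                   # 12 bp only, below min_span
```

Homopolymer runs such as (A)30 are deliberately excluded here because the quoted definition starts at dinucleotides.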
For a sampling of simple-sequence repeat motifs, (GA)15, (CA)15, (AT)15, and (A)30, we roughly estimated the number of copies present in the cotton genome that are composed of 10 or more repeat units (and thus tend to be more polymorphic than shorter arrays). Specifically, we end-labeled synthetic oligonucleotides with 32P and screened ~20,400 genomic clones (~11.3% of the genome) from a partial Sau3AI library of G. barbadense "K101" with average insert length of 15 kb (Table 2), using a hybridization stringency that should detect only arrays of 10 or more repeat units (adjusted to accommodate different binding affinities of oligonucleotides; Table 2). Subcloning and sequencing of a sampling (CMS loci) showed that arrays of 9–29 units were detected. Assuming that each positive clone contained only one simple-sequence repeat array, the number of such long arrays in the genome was estimated as 1800, 538, 4500, and 9000 for the four motifs, respectively. On the basis of a genome size of 2700 Mb, one of these four arrays should occur an average of once in 170 kb. Considering only the GA and CA elements, which are more reliable in PCR amplification, we find one array per 1155 kb (or ~2 cM).
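The arithmetic behind these estimates can be checked directly. The sketch below uses only figures stated in the paragraph (clone number, mean insert length, genome size, and the four estimated array counts); the variable and function names are hypothetical.

```python
GENOME_SIZE_KB = 2_700_000      # 2700 Mb genome size used in the text
CLONES_SCREENED = 20_400        # approximate number of clones screened
MEAN_INSERT_KB = 15             # average insert length of the library

# Genome-wide estimates of long arrays reported for each motif.
estimated_arrays = {"GA": 1800, "CA": 538, "AT": 4500, "A": 9000}


def library_coverage(clones, insert_kb, genome_kb):
    """Fraction of the genome represented by the screened clones."""
    return clones * insert_kb / genome_kb


def kb_per_array(array_count, genome_kb):
    """Average spacing between arrays, assuming a uniform distribution."""
    return genome_kb / array_count


coverage = library_coverage(CLONES_SCREENED, MEAN_INSERT_KB, GENOME_SIZE_KB)
all_spacing = kb_per_array(sum(estimated_arrays.values()), GENOME_SIZE_KB)
ga_ca_spacing = kb_per_array(
    estimated_arrays["GA"] + estimated_arrays["CA"], GENOME_SIZE_KB
)

print(f"library coverage: {coverage:.1%}")               # 11.3%
print(f"any of the four arrays: one per {all_spacing:.0f} kb")   # 170 kb
print(f"GA or CA arrays: one per {ga_ca_spacing:.0f} kb")        # 1155 kb
```

The same coverage fraction is what scales the number of positive clones up to a genome-wide estimate, under the stated assumption of one array per positive clone.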
We also explored levels and patterns of SNP variability in a sampling of genetically mapped sequence-tagged sites among a small selection of tetraploid cottons. For diploids, PCR amplification products might be sequenced directly; however, for a recently formed polyploid such as cotton, most PCR amplification products are mixtures of sequences from two or more divergent loci and require further purification before sequencing. To accommodate this, after PCR amplification of each of four genotypes (G. barbadense accession "K101" and cultivar Pima S6 and G. hirsutum race palmeri unnamed accession and cultivar "Acala Maxxa") with each of 24 primer pairs (individually, using conditions determined to be optimal for each primer pair), we diluted the samples to equimolar concentrations, then pooled the 24 samples for each genotype, purified with Sephadex G-50 columns, and cloned using the pGEM-T Easy vector system (Promega, Madison, WI) following the manufacturer's protocols. From the pooled cloned products, a total of 768 clones (192 per genotype, or ~5x the coverage of each homeologous locus) were sequenced from one direction, with 664 (86.5%) sequences, or an average of ~3 per homeologous locus in each genotype, meeting our (Q20) quality criteria. All primer sets were represented by at least one sequence, but with a range of 1–65 clones per primer pair (not shown). Hence, the savings in cost and labor associated with cloning and sequencing of individual amplifications from each genotype was partly sacrificed to uneven sampling of different loci.
Comprehensive analysis was performed on a subset of 12 primer pairs (Table 3) spanning 5409 nucleotides (nt) in each of the two subgenomes, which had relatively complete representation among the genotypes. For 10 of the 12 primer pairs, sufficient information was available to infer the subgenomic affinity of SNPs (on the basis of correspondence to diploid sequences): 6 of these 10 pairs showed SNPs or indels that distinguished between subgenomes. The minimal rate of variation per nucleotide of 1.06% between subgenomes (estimated considering indels as a single mutational event regardless of their length) was substantially greater than the level of variation between species (0.35%) or genotypes within species (0.37 and 0.14%). The vast majority (72.3%) of mutational events were SNPs, although a total of 12 indels were found, ranging from 1 to 37 nt in length.
Assignment of linkage groups and morphological markers to subgenomes and chromosomes:
Coalescence of the map into 26 linkage groups (from 41; ![]()
A total of 20 cotton chromosomes have now been identified, adding to our prior work (![]()
A few incongruities between various data types suggest the need for reassignment of several chromosome names. First, the identification of chromosome 12 herein and elsewhere required us to revisit the identity of its inferred homeolog. The chromosome that we now (see above) know to be chromosome 12 had previously been shown to be homeologous to what we thought was chromosome 22 (![]()
While our chromosomal assignments are closely congruent with one recent study (![]()
In addition to those described above, several additional morphological traits were mapped to chromosomes. On the basis of previously described populations (![]()
χ2 tests using stringent statistical criteria (P > 0.001, to accommodate the large genome of cotton), segregation of petal color (the classical Y1 gene; ![]()
Patterns of DNA marker locus duplication:
Among the 2007 different probes mapped in the tetraploid cotton genome, a total of 427 detected 2 loci, 58 detected 3 loci, 10 detected 4 loci, and 1 detected 6 loci. Duplicated loci are not randomly distributed (Fig 3; see also clickable annotated version at http://www.plantgenome.uga.edu/cottonmap.htm). In tetraploid cotton, loci on a particular At-subgenome chromosome almost invariably showed duplications that were concentrated on 1 or 2 Dt-subgenome chromosomes. For example, 21 of 43 probes mapped on chromosome 1 have counterparts on chromosome 15. The duplicated loci of the remaining 22 probes were distributed across 13 different chromosomes with a maximum of only 4 on any single chromosome. In total, 10 Dt-genome chromosomes have a much higher frequency of duplicate loci (in all cases numbering 15 or more) on one of the At-genome chromosomes than on any other chromosomes. The remaining 3 Dt-genome chromosomes (chromosome 14, chromosome 22, and LG D08) have a higher frequency of duplicated loci (6 or more) on 2 At-genome chromosomes than on any other chromosomes. On the basis of the Poisson distribution, 5 or more loci duplicated on a pair of chromosomes deviates significantly from a random distribution (P = 0.03). In most cases, the linear order of the duplicated loci on the pairs of chromosomes is quite consistent (Fig 1 and Fig 3), except for the subset in which we deduced structural changes (see below). Putatively homeologous chromosomes (or segments) are presented with putatively homeologous loci connected by lines (Fig 1).
To investigate the correspondence of diploid and tetraploid chromosomes, 257 common probes were mapped on both populations. For any single linkage group in the D diploid map, corresponding loci were heavily concentrated on two homeologous chromosomes of tetraploid cotton (Fig 1 and Fig 4; see also a clickable annotated version at http://www.plantgenome.uga.edu/cottonmap.htm). For example, among the 40 loci mapped on diploid chromosome D1, counterparts of 29 were located on homeologous tetraploid chromosomes 7 (16 loci) and 16 (13 loci). The remaining 11 loci were distributed over nine other chromosomes, with no chromosome having >2 loci. The linear order of the common probes is similar between tetraploid and diploid chromosomes. Putatively corresponding diploid and tetraploid chromosomes (or segments) are presented with putatively corresponding loci connected by lines (Fig 1).
Comparative transmission and organization of homologous and homeologous groups:
Chromosomes of the At subgenome averaged ~10% longer than those of the Dt subgenome, and ~55% longer than those of the D diploid (Table 2). Since very similar numbers of markers mapped to the At and Dt subgenomes, the shorter recombinational length may explain the slightly higher marker density on the Dt genome than on the At genome (1.65 vs. 1.77 cM/locus). All D chromosomes are shorter than their corresponding At homeologs and eight are shorter than their Dt homeologs, with the remaining five (D1, D3, D7, D11, and D12) roughly equivalent to their Dt homeologs (chromosome 16, chromosome 17, LG D02, chromosome 20, and chromosome 22; Fig 1). Five At chromosomes (chromosome 3, chromosome 4, chromosome 7, chromosome 10, and chromosome 12) were much longer than homeologous Dt chromosomes. Four At chromosomes (chromosome 5, LG A01, LG A02, and LG A03) are close in length to their homeologous Dt counterparts and only four At chromosomes were shorter than Dt chromosomes.
Chromosome structural changes:
Above it was shown that three Dt-genome chromosomes (chromosome 14, chromosome 22, and LG D08) have homeologous relationships with two At chromosomes. Nonoverlapping sets of loci on chromosome 22 and LG D08 have counterparts on different regions of chromosome 4 and chromosome 5, respectively. By contrast, chromosome 22 and LG D08 have a simple one-to-one homeologous relationship with their corresponding diploid D chromosomes, D12 and D9. This is consistent with the finding (![]()
Duplication within individual subgenomes:
In addition to loci duplicated on homeologous chromosomes, many loci were duplicated on chromosomes that were nonhomeologous and/or within the same subgenome. Among 679 pairs of duplicate loci detected with 499 probes in the tetraploid map (Fig 3), 289 pairs were on putatively homeologous chromosomes. Among the 389 pairs on nonhomeologous chromosomes, 160 pairs involved both subgenomes, 120 were on pairs of At chromosomes only, and 109 were on pairs of Dt chromosomes only. Unlike the duplicate loci on homeologous chromosomes, the nonhomeologous duplicate loci were scattered over many chromosomes. For example, 67 probes on LG A03 detected duplicated loci on 18 chromosomes, with 31 on the homeologous chromosome LG D02, and the remaining 36 scattered over 17 chromosomes. However, close study of nonhomeologous duplications reveals that 6 were duplicated on LG A03, 5 on chromosome 7, and 5 on chromosome 26. On the basis of the Poisson distribution test, these numbers are significantly higher than expected to occur by chance. On the tetraploid map, 19 pairs of chromosomes shared 4 or more common loci, including 6 on At, 10 on both At and Dt, and 3 on Dt. In addition, 10 individual chromosomes contained 4 or more pairs of duplicate probes (Fig 4).
A total of 82 probes (12.4%) detected 137 pairs of duplicate loci in the diploid D genome (Fig 5; see also clickable annotated version at http://www.plantgenome.uga.edu/cottonmap.htm) despite that genome having a smaller total number of mapped loci (763) than the tetraploid. The frequency of mapped duplicated loci is higher in the D diploid map than in either the At or the Dt subgenomes, due to a generally higher level of DNA polymorphism in the D diploid cross (![]()
The D-genome duplicate loci, like the intrasubgenomic duplications in the tetraploid, were not randomly distributed (Fig 5). Nine pairs of chromosomes each share four or five common probes, significantly higher than the random expectation. In addition, another 29 pairs of chromosomes contain two or three common markers each, most of which are closely distributed on the chromosomes. As a result, each chromosome can be divided into several blocks according to the clustering of duplicate loci. Frequently, the blocks on one chromosome show a syntenic relationship with blocks on other chromosomes. For example, D1 can be divided into four regions, the top one corresponding with the middle region of D12, the middle one with the middle regions of D8 and D7, and the bottom one with D6 (Fig 6). Although a clear syntenic relationship was observed in these cases, the linear order of the duplicated loci between homeologous blocks was not identical, especially regarding the top region of D1. In this case, the differences in linear order can be explained by the inversion of the top region (Gate1BF04-Coau2J14) of this chromosome.
Conservation of intragenomic duplication also can be detected in some regions of the tetraploid genome. A typical example was the high-marker-density region on the corresponding regions of chromosome 7, LG A03, and D02 (Fig 7). In diploid cotton, four probes (AEST009, pAR0141, pAR0024, and pAR09E07) located on the region of D1 spanning 18.2 cM detected five loci on D7 spanning 31.8 cM. In the corresponding region of the tetraploid map, the counterparts of four probes (pAR0024, pAR0319, Gate3CH01, and Unig25H02) from 98.5 to 117.8 cM of LG D02 were concentrated in the region from 102.4 to 119.8 cM of chromosome 7. Four probes (G1045, pAR0024, Gate3BB10, and Unig22B11) mapped on LG A03 from 96.0 to 144.5 cM were also located in this region of chromosome 7, except for Unig26B04 that was 36.8 cM away from the closest duplicate marker on LG A03 and 57.5 cM on chromosome 7. All this information shows that these high-marker-density regions are homeologous fragments among the chromosomes D1, D7, 7, A03, and D02, and they retain much similarity since divergence from the same ancestor.
Some probes have duplicated loci on the same chromosome. In the tetraploid map, 73 probes, or 11.0% of the total number of duplicated probes, were duplicated on the same chromosome, with as many as seven duplications on a single chromosome (chromosome 18). In the diploid map, 26 (18.8%) of the duplicated probes show two or more loci on the same chromosome, with up to three on a single chromosome. In both cases, the level of intrachromosomal duplication is higher than the random expectations of 4 and 8.25% for the AD and Dt genomes, respectively. Many of these duplications were present in tandem in all three genomes, suggesting an origin that predates the divergence of the A and D genomes.
The frequency of duplicated loci generally increased with locus number per chromosome (r = 0.829) and linkage group length (r = 0.811). A few exceptions to this trend were found. For example, chromosome 22, the second shortest chromosome (63 loci, 95.4 cM) in the tetraploid genome, has the highest frequency (27, or 42.9%) of duplicate loci, with corresponding loci distributed over 16 nonhomeologous chromosomes, not including 27 on homeologous chromosomes 4 and 5.
Marker sequence annotation:
Multiple local alignment searches using the programs blastn and tblastx were used for sequence annotation against publicly available databases of the National Center for Biotechnology Information (NCBI) as of April 20, 2003. The default matrix BLOSUM62 and an E-value cutoff of 1 × 10⁻⁶ were used in all BLAST searches. The NCBI database was subdivided into several taxon-specific groups to allow for the efficient determination of not only the best overall match, but also the best match among closely related species, excluding unannotated EST and genomic survey sequence database entries. Additional analyses included the use of hidden Markov models to classify sequence data by protein sequence signature. The program InterProScan (ZDOBNOV et al. 2002) was used to search and compare the translated cotton sequences against several protein databases (Pfam, SMART, and ProDom) and gene ontology (![]()
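Applying an E-value cutoff and keeping the best match per probe can be sketched as below. This is not the authors' pipeline: it assumes hits are available in the 12-column tab-delimited layout produced by current BLAST+ releases (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), and the function name `best_hits` and the probe and subject identifiers are hypothetical.

```python
def best_hits(lines, max_evalue=1e-6):
    """Keep, for each query, the highest-scoring hit at or below max_evalue.

    Each line is expected to hold 12 tab-separated fields with the E-value
    in column 11 and the bit score in column 12 (an assumed layout).
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the significance cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up hits for two hypothetical probes.
example_hits = [
    "probeA\tsubj1\t95.0\t200\t10\t0\t1\t200\t5\t204\t2e-30\t150",
    "probeA\tsubj2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-3\t40",
    "probeA\tsubj3\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-50\t300",
    "probeB\tsubj4\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t20",
]
print(best_hits(example_hits))
```

Running the same filter separately on each taxon-specific subset would give the best match among closely related species as well as the best overall match.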
| DISCUSSION |
|---|
This genetically anchored set of sequence-tagged sites, composed of 3347 loci (2584 on the tetraploid map and 763 on the diploid map), provides transferable genetic markers suitable for a wide range of investigations in structural, functional, and evolutionary genomics. A genetically anchored STS framework provides a foundation for physical mapping and ultimately for assembling a robust finished sequence of the cotton genome. The present tetraploid map equates to an average interval of ~606 kb between genetic markers on the basis of a consensus estimate of genome size of ~2700 Mbp. The map is presently being used to anchor high-coverage bacterial artificial chromosome (BAC) libraries for G. hirsutum, G. barbadense, and G. raimondii on the basis of hybridizing genetically mapped probes to the BACs (see http://www.plantgenome.uga.edu). The G. raimondii BAC library is slated for fingerprinting to 10x depth, permitting the resulting "contigs" to be extended further. By selective BAC end sequencing, a robust, genetically anchored physical map is expected to coalesce.
Although the map was mostly created using the RFLP method and has been applied to several goals by this technology (e.g., ![]()
Rates of intraspecific DNA sequence variation within amplicons derived from the genetically mapped sequence-tagged sites are somewhat higher than the (typically ~0.1%) rates found in human or other taxa, indicating that SNP discovery and application in cotton is feasible. Several approaches can be used to discover and detect SNPs in the STS loci. The scope of SNP detection systems has been extensively reviewed elsewhere. Many labs with a need for small numbers of specific markers may find the CAPS method attractive (![]()
Nonrandom patterns of DNA marker distribution provide clues regarding interesting and important features of cotton genome organization. On most chromosomes, at least one significant concentration of loci occurs, possibly corresponding to the centromeric regions. Virtually all marker-rich regions corresponded between the D and Dt genomes, and most also corresponded with the At genome, suggesting that these may be the locations of many of the cotton centromeres. In several cases, the breakpoints of structural rearrangements between the A and D subgenomes locate squarely in these regions, consistent with the widespread observation that chromosomal inversion breakpoints often lie at or near centromeres. A total of three marker-rich regions are unique to Dt and nine are unique to At, generally consistent with the much larger quantity of repetitive DNA in the A genome (![]()
Patterns of DNA marker duplication in the tetraploid cotton genome are especially informative. The detailed delineation of homeologous relationships among the chromosomes of tetraploid cotton, and between the chromosomes of tetraploids and diploids, is important from both a basic and an applied standpoint. Tetraploid cotton, containing At and Dt subgenomes, was derived from a naturally occurring cross between two diploids with A and D genomes, respectively, 1-2 million years ago.
In addition to the homeologous duplications, many sequences are also duplicated within each subgenome. While the observed proximal (intrachromosomal) bias of intragenomic duplication suggests that some of it may be accounted for by retrotransposition, we suggest that the majority of intragenomic duplication may have another basis. In the diploid D genome, 12.4% of probes detected multiple polymorphic loci, which appear to fall into syntenic and even colinear regions. This finding, together with the recent report of intragenomic similarities in Giemsa-banding patterns (![]()
| ACKNOWLEDGMENTS |
|---|
We thank many colleagues in the Paterson lab for technical and moral support and David Stelly for helpful comments. This work was funded by the U.S. Department of Agriculture (91-37300-6570; 97-35300-5305), the National Science Foundation (DBI-9872630; 0211700), Georgia and Texas Cotton Commissions, and Georgia and Texas Agricultural Experiment Stations.
Manuscript received July 30, 2003; Accepted for publication September 23, 2003.
| LITERATURE CITED |
|---|
ASHBURNER, M., C. A. BALL, J. A. BLAKE, H. BUTLER, and J. M. CHERRY et al., 2001 Creating the gene ontology resource: design and implementation. Genome Res. 11:1425-1433.
BEASLEY, J. O., 1940 The production of polyploids in Gossypium. J. Hered. 31:39-48.
BOWERS, J. E., C. ABBEY, S. ANDERSON, C. CHANG, and X. DRAYE et al., 2003 A high-density genetic recombination map of sequence-tagged sites for Sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses. Genetics 165:367-386.
BRUBAKER, C. L., A. H. PATERSON, and J. F. WENDEL, 1999 Comparative genetic mapping of allotetraploid cotton and its diploid progenitors. Genome 42:184-203.
EDWARDS, G. A., and M. A. MIRZA, 1979 Genomes of the Australian wild species of cotton. II. The designation of a new G genome for Gossypium bickii. Can. J. Genet. Cytol. 21:367-372.
ENDRIZZI, J., E. L. TURCOTTE, and R. J. KOHEL, 1984 Qualitative genetics, cytology, and cytogenetics, pp. 81-129 in Cotton, edited by R. J. KOHEL and C. F. LEWIS. ASA/CSSA/SSSA, Madison, WI.
ENDRIZZI, J., E. L. TURCOTTE, and R. J. KOHEL, 1985 Genetics, cytology, and evolution of Gossypium. Adv. Genet. 23:272-375.
EWING, B., L. HILLIER, M. C. WENDL, and P. GREEN, 1998 Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8:175-185.
FRYXELL, P. A., 1979 The Natural History of the Cotton Tribe. Texas A&M University Press, College Station, TX.
FRYXELL, P. A., 1992 A revised taxonomic interpretation of Gossypium L. (Malvaceae). Rheedea 2:108-165.
GALAU, G. A., and T. A. WILKINS, 1989 Alloplasmic male-sterility in AD allotetraploid Gossypium hirsutum upon replacement of its resident cytoplasm with that of D-species Gossypium harknessii. Theor. Appl. Genet. 78:23-30.
HARLAND, S. C., 1929 The genetics of cotton. I. The inheritance of petal spot in New World cottons. J. Genet. 20:365-385.