Genome duplication is thought to be central to the evolution of morphological complexity, and some polyploids enjoy a variety of capabilities that transgress those of their diploid progenitors. Comparison of genomic sequences from several tetraploid (AtDt) Gossypium species and genotypes with putative diploid A- and D-genome progenitor species revealed that unidirectional DNA exchanges between homeologous chromosomes were the predominant mechanism responsible for allelic differences between the Gossypium tetraploids and their diploid progenitors. Homeologous gene conversion events (HeGCEs) gradually subsided, declining to rates similar to random mutation during radiation of the polyploid into multiple clades and species. Despite occurring in a common nucleus, preservation of HeGCE is asymmetric in the two tetraploid subgenomes. At-to-Dt conversion is far more abundant than the reciprocal, is enriched in heterochromatin, is highly correlated with GC content and transposon distribution, and may silence abundant A-genome-derived retrotransposons. Dt-to-At conversion is abundant in euchromatin and genes, frequently reversing losses of gene function. The long-standing observation that the nonspinnable-fibered D-genome contributes to the superior yield and quality of tetraploid cotton fibers may be explained by accelerated Dt to At conversion during cotton domestication and improvement, increasing dosage of alleles from the spinnable-fibered A-genome. HeGCE may provide an alternative to (rare) reciprocal DNA exchanges between chromosomes in heterochromatin, where genes have approximately five times greater abundance of Dt-to-At conversion than does adjacent intergenic DNA. Spanning exon-to-gene-sized regions, HeGCE is a natural noninvasive means of gene transfer with the precision of transformation, potentially important in genetic improvement of many crop plants.
GENOME duplication is a potentially rich source of genes with new (Stephens 1951; Ohno 1970) or modified functions (Lynch and Conery 2000), and is thought to be central to the evolution of morphological complexity (Freeling and Thomas 2006). Genome doubling may confer advantages to a polyploid (Comai 2005), via mechanisms such as increased gene dosage, “intergenomic heterosis” conferred by multiple alleles in a polyploid nucleus, or the evolution of novel gene functions (neofunctionalization) (Stephens 1951; Ohno 1970). Over time, duplicated genes may evolve subdivisions of ancestral functions (subfunctionalization) (Lynch and Force 2000) that render them interdependent. Subfunctionalization may sometimes lead to neofunctionalization (He and Zhang 2005).
Polyploids have been suggested to enjoy a variety of capabilities that transgress those of their diploid progenitors. For example, polyploids have been suggested to adapt better than diploids to environmental extremes, based on both their geographic distribution (Muntzing 1936; Love and Love 1949; Stebbins 1950; Grant 1971) and an inferred abundance of paleopolyploidizations near the Cretaceous–Tertiary extinction (Fawcett et al. 2009). A variety of morphological, physiological, and gene expression changes have been associated with polyploidy. Experimental data are available to evaluate the causal role of such changes in specific adaptations of polyploids in only a few cases; these data offer some support, but more are needed (as recently reviewed in Madlung 2013).
Angiosperms (flowering plants) are an outstanding model for studying consequences of genome duplication salient to higher eukaryotes. All angiosperms are paleopolyploid (Bowers et al. 2003; Jiao et al. 2011), and their abundance of multiple independent genome duplications (Paterson et al. 2010) provides “natural replicates” for a variety of investigations. Study of the genes from three rounds of ancient whole genome duplications in Arabidopsis reveals a short phase of function relaxation followed by diversifying selection (Guo et al. 2013). The ability to study multiple independent genome duplications in a lineage also permits inference of the temporal orders and rates at which different duplication-associated events/mechanisms occur. Their larger genome sizes and smaller effective population sizes than those of microbes that have experienced genome duplication, such as yeast (Gu et al. 2003; Christoffels et al. 2004; Scannell et al. 2006) and Paramecium (Aury et al. 2006), make angiosperms more appropriate for studying consequences of genome duplication in higher eukaryotes (Lynch et al. 2001; Lynch 2006).
The ability to “synthesize” newly polyploid plants by artificial crosses and chromosomal manipulation using colchicine has revealed striking immediate reactions of genomes to duplication. These reactions include loss and restructuring of low-copy DNA sequences (Song et al. 1995; Feldman et al. 1997; Ozkan et al. 2001; Shaked et al. 2001; Kashkush et al. 2002; Ozkan et al. 2002), activation of genes and retrotransposons (O’Neill et al. 2002; Kashkush et al. 2003), gene silencing (Chen and Pikaard 1997a, 1997b; Comai et al. 2000; Lee and Chen 2001), and subfunctionalization of gene expression patterns (Adams et al. 2003, 2004). Gene silencing in the allopolyploid hybrid between Arabidopsis thaliana and Cardaminopsis arenosa is arguably related to defense response against transposons (Comai et al. 2000). Changes of 24-nt siRNA and DNA methylation levels in Arabidopsis hybrids are greatest at loci for which the two parents differ substantially (Groszmann et al. 2011; Greaves et al. 2011). Chromosome rearrangement and reactivation of transposable elements are well known when plants are under “genomic stress,” which includes formation of polyploids (McClintock 1983). Instability of hybrid genomes has been attributed to bursts of transposition in both animals and plants; cytosine demethylation and deacetylation of lysine residues on histones may be responsible (Fontdevila 2005).
However, to learn whether immediate reactions of genomes to duplication provide raw material for the beginnings of adaptation or are merely symptoms of imminent extinction, it is necessary to investigate naturally formed polyploids that have survived the test of time. The extinction hypothesis seems generally more likely, given that unreduced gametes are produced more or less continuously by organisms but only a tiny fraction result in successful lineages. For example, dramatic early-generation mutations in synthetic Brassica napus (Pires et al. 2004) are not paralleled in naturally occurring forms (Rana et al. 2004).
A particularly intriguing example of possible advantages associated with polyploidy comes from the cotton genus, Gossypium, in which two diploids and two tetraploids have each been independently domesticated for production of the same product, seedborne epidermal fibers. A-genome diploids native to Africa, and Mexican D-genome diploids diverged ∼5–10 MYA (Senchina et al. 2003). They were reunited ∼1–2 MYA by trans-oceanic dispersal to the New World of a maternal A-genome propagule resembling Gossypium herbaceum (Wendel 1989), hybridization with a native D-genome species resembling G. raimondii, and chromosome doubling. The nascent AtDt allopolyploid spread throughout the American tropics and subtropics, diverging into at least three subclades and five species, with two of those species (G. hirsutum and G. barbadense) being independently domesticated.
In India, where scientific improvement programs are active for both ploidies, tetraploid (“AtDt” genome) cottons consistently have substantially higher yield and superior fiber qualities than A-genome diploids (Anonymous 1997). Remarkably, the majority of genetic variation among tetraploid cottons has been ascribed to chromosomes from the D-genome diploid progenitor that does not produce spinnable fiber, suggesting that postpolyploidy selection for superior fiber yield and quality of tetraploid cottons has preferentially operated upon the Dt genome (Jiang et al. 1998; Rong et al. 2007).
In the present study, sequencing and careful comparison of several tetraploid Gossypium species and genotypes and representatives of their putative progenitor genomes reveals that homeologous gene conversion events (HeGCEs) account for the vast majority of allelic differences between polyploid cottons and their diploid progenitors. High survivorship of alleles that were converted shortly after polyploid formation is suggested to reflect both a rapid rate of conversion at that time and also adaptive significance of many resulting alleles. A second cadre of converted alleles are closely associated with domestication, suggesting a mechanism by which chromosomes from the D-genome diploid progenitor that does not produce spinnable fiber may have come to account for the majority of genetic variation in fiber characteristics among tetraploid cottons. In partial summary, these data suggest that HeGCEs are an early and important mechanism by which genomes adapt to the duplicated state and may also contribute to plant domestication and crop improvement.
Read mapping and single nucleotide variation detection
Sequences of cotton D-genome v2 (G. raimondii), A-genome (G. herbaceum), and a tetraploid genome (G. hirsutum, Acala ‘Maxxa’) are from Paterson et al. (2012). G. herbaceum is sequenced with read depth of 32× and Acala ‘Maxxa’ with read depth of 82×. Sequences of three additional tetraploid genomes (G. hirsutum GA120R1B3 30×, G. hirsutum race yucatanese 15×; and G. mustelinum 46×) are from Illumina sequencing of paired-end libraries. Reads are aligned to the reference genome using Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009). Single nucleotide variants (SNVs) between D- and A-genomes are called with Samtools/bcftools (Li et al. 2009; Li 2011) using reads with mapping quality >30 and base quality >30. Raw SNVs between progenitor genomes are further filtered by keeping those with read depth between 4 and 60. Raw SNVs for the tetraploid species are further filtered by keeping those with read depth between 7 and twice the average effective read coverage. After aligning reads from the tetraploid genome to the reference genome, alleles are assigned to each subgenome by referring to the parental alleles.
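The read-depth windows described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Snv` record, the function names, the chromosome label, and the treatment of the bounds as inclusive are all assumptions made for illustration, and the depth values stand in for what a variant caller would report.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Snv(NamedTuple):
    """Hypothetical minimal SNV record (not a real VCF parser)."""
    chrom: str
    pos: int
    depth: int  # reads covering the site after quality filtering


# Depth window quoted in the text for SNVs between the progenitor genomes.
PROGENITOR_BOUNDS: Tuple[int, int] = (4, 60)


def tetraploid_bounds(mean_effective_coverage: float) -> Tuple[float, float]:
    """Depth window for tetraploid SNVs: 7 up to twice the average coverage."""
    return 7, 2 * mean_effective_coverage


def filter_by_depth(snvs: Iterable[Snv], min_depth: float, max_depth: float) -> List[Snv]:
    """Keep SNVs with min_depth <= depth <= max_depth (inclusive bounds assumed)."""
    return [s for s in snvs if min_depth <= s.depth <= max_depth]


if __name__ == "__main__":
    calls = [Snv("Chr01", 100, 3), Snv("Chr01", 200, 4), Snv("Chr01", 300, 61)]
    print(filter_by_depth(calls, *PROGENITOR_BOUNDS))
```

The upper bound is intended to exclude sites with anomalously deep coverage, such as collapsed repeats, while the lower bound guards against undersampling.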
Detection of converted alleles
SNVs from the two progenitor genomes are identified by aligning reads from the A-genome to the reference D-genome. For each of the SNV sites, the orthologous alleles in the tetraploid genomes are sorted into subsets corresponding to the respective parental subgenomes (Supporting Information, Figure S6). A locus that differs between the diploid progenitors but for which a tetraploid shows only the allele from one progenitor (i.e., is monomorphic) is regarded as converted if it meets the following criteria. Assuming that alleles at a locus follow a binomial distribution when sampling tetraploid DNA, read coverage of 7× is necessary to keep the false positive rate under 0.0078. To remove false positives caused by undersampling, we filtered out monomorphic alleles with read coverage under 7. False-positive converted alleles can also be derived from deletion of a progenitor allele in a subgenome. In this event, read coverage of deleted sites would average half that of the sites that show both parental alleles. Average mapped read coverage for the four tetraploid cotton species studied is shown in Figure S7. Reduction of read coverage by half is not observed in either of the conversion categories, compared to sites that show both parental alleles.
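The coverage threshold can be checked with a short calculation. The sketch below assumes that each read at a truly two-allele site is equally likely to carry either parental allele (probability 0.5, an assumption of this illustration), so the chance that all n reads show one specified allele is 0.5 to the power n. At n = 7 this equals 0.0078125, which agrees with the quoted rate up to rounding. The function name is hypothetical.

```python
def prob_monomorphic_by_chance(depth: int, p_allele: float = 0.5) -> float:
    """Chance that all `depth` reads carry one specified allele
    at a site where both parental alleles are actually present."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return p_allele ** depth


if __name__ == "__main__":
    # Show how quickly the false positive rate falls as coverage increases.
    for depth in range(4, 9):
        print(depth, prob_monomorphic_by_chance(depth))
```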
To rule out the possibility that inferred conversions are due to the At subgenome being more divergent than the Dt from the reference genome, the effect of relaxing the value of editing distance (−n flag in BWA) was investigated. If sequence divergence were a major factor, the magnitude of conversion bias would be reduced after relaxing the editing distance. The default value for −n flag is 0.04. The estimated mean nucleotide divergence between A- and D-genomes is ∼0.64%, ranging from 1.13e-5 to 0.012. Theoretically, the default value is sufficiently large to allow the mapping of At-genome reads even for most diverged regions. Figure S8A shows the proportion of allele changes resulting from relaxing the value of −n flag to 0.8. The magnitude of conversion bias is not decreased, but increased. At-to-Dt converted alleles became the most common class, presumably due to an increase of mapped reads from the Dt subgenome after allowing more mismatches. That is, the relative frequency of At alleles declined under the SNP calling threshold because more Dt-subgenome reads aligned to the reference. To test this, we reduced the frequency threshold of calling a heterozygous SNP (Figure S8B). We observed an increase of unchanged alleles and decrease of both converted types. As expected, the log ratio of the two conversion types also increased (Figure S9). We also tried reads with mapping quality from 1 to 30 with increments of 5; all of the tests show similar results. We used reads with mapping quality >20 for this study.
We further investigated possible artifacts of read alignment by assessing the sequence divergence between the A- and D-genomes near HeGCE sites. We divided the genome into nonoverlapping 10-kb bins. For each bin, we calculated the number of At-to-Dt converted alleles and the nucleotide divergence between the A- and D-genomes. The nucleotide divergence in each bin is normalized by the proportion of sites covered by reads to remove the variation in read coverage. The Pearson correlation coefficient of nucleotide divergence and At-to-Dt conversions is 0.089, which indicates only a weak positive correlation (Figure S10A). To compare with other types of allele changes, we also examined the correlation of nucleotide divergence with Dt-to-At converted sites and with unchanged sites. The other two types also show similar levels of correlation (r = 0.0613 for Dt-to-At converted alleles; r = 0.0979 for unchanged alleles) (Figure S10, B and C).
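The binning and correlation steps can be sketched as follows. This is an illustration under stated assumptions rather than the code used here: coordinates are taken to be 0-based, normalization is interpreted as dividing per-site divergence by the covered fraction of the bin, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Sequence

BIN_SIZE = 10_000  # nonoverlapping 10-kb bins, as in the text


def count_per_bin(positions: Iterable[int], bin_size: int = BIN_SIZE) -> Counter:
    """Count sites per bin; 0-based coordinates are assumed."""
    return Counter(pos // bin_size for pos in positions)


def normalized_divergence(n_diff: int, n_covered: int,
                          bin_size: int = BIN_SIZE) -> Optional[float]:
    """Per-bin divergence divided by the proportion of sites covered by reads."""
    if n_covered == 0:
        return None  # no usable coverage in this bin
    raw_divergence = n_diff / bin_size
    covered_fraction = n_covered / bin_size
    return raw_divergence / covered_fraction


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In use, the per-bin conversion counts and the per-bin normalized divergences would be paired by bin index and passed to `pearson_r`, skipping bins for which `normalized_divergence` returns `None`.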
Evaluation of gene function impact
Gene function impact of allele changes is measured relative to the cotton reference gene models as described (Paterson et al. 2012), by the following four categories: (a) Altered translation initiation site − allele changes in the first codon of a coding sequence that leads to an amino acid other than methionine; (b) altered splicing sites − allele changes disrupting the “GT–AG” conserved sites flanking introns; (c) introduced stop codons − allele changes introducing premature stop codons into the normal protein coding sequence; and (d) altered stop codons − allele changes altering stop codon to encode an amino acid.
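The four categories above can be expressed as a small classifier. The sketch below is illustrative only: it assumes the standard genetic code (stop codons TAA, TAG, and TGA; start codon ATG), takes codons and intron-flanking dinucleotides as already extracted from a gene model, and uses hypothetical function names. It does not reproduce the annotation procedure of Paterson et al. (2012).

```python
from typing import Optional

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code
START_CODON = "ATG"


def classify_codon_change(ref_codon: str, alt_codon: str,
                          is_first_codon: bool = False) -> Optional[str]:
    """Assign a codon-level allele change to category (a), (c), or (d), else None."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref == alt:
        return None
    if is_first_codon and ref == START_CODON:
        # (a) the first codon no longer encodes methionine
        return "altered translation initiation site"
    if ref not in STOP_CODONS and alt in STOP_CODONS:
        # (c) a premature stop appears inside the coding sequence
        return "introduced stop codon"
    if ref in STOP_CODONS and alt not in STOP_CODONS:
        # (d) the stop codon now encodes an amino acid
        return "altered stop codon"
    return None


def disrupts_splice_site(donor: str, acceptor: str) -> bool:
    """(b) True if the intron no longer starts with GT and ends with AG."""
    return donor.upper() != "GT" or acceptor.upper() != "AG"
```

For example, a change from TGG to TGA within a coding sequence would fall in category (c), whereas a synonymous change such as GCT to GCC falls in none of the four categories.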
Estimation of the size of biased-conversion tracks
Due to the large variance of measurement of the length of conversion tracks in base pairs (bp), we use the number of continuous At-to-Dt conversion alleles as a measure. We searched for continuous tracks of conversion alleles both genome-wide and in highly biased conversion regions (HC). If we assume random mutation follows a Poisson distribution, the length of continuous mutations should follow an exponential distribution. Both genome-wide and HC conversion tracks show longer continuous tracks than expected (Figure S2). Conversion tracks in the HC show an excess of long ones (four to six alleles) and a deficiency of shorter ones (fewer than three alleles) relative to the genome-wide set. From the distribution, the average length of conversion tracks across the genome is 3.75 continuous alleles. Genome-wide, the average distance spanned by three continuous conversion alleles is 320.3 bp and by four is 412.6 bp. The estimated average length of conversion tracks across the genome is (320.3/3 + 412.6/4) × 3.75/2 = 394 bp. In HC, the average length of conversion tracks is 4.32 continuous alleles, with the average distance spanned by four continuous At-to-Dt conversion alleles being 600.8 bp and by five being 647.3 bp. The estimated average length of conversion tracks in HC is (600.8/4 + 647.3/5) × 4.32/2 = 604 bp.
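The arithmetic above can be reproduced with a short sketch. The estimate averages the base pairs per allele from the two run sizes that bracket the mean run length, then multiplies by that mean. The function names and the site labels ("conv", "same") are hypothetical; the run-finding helper is only one simple way to identify continuous tracks in an ordered list of site labels.

```python
from itertools import groupby
from typing import List, Sequence


def run_lengths(labels: Sequence[str], target: str) -> List[int]:
    """Lengths of uninterrupted runs of `target` in an ordered list of site labels."""
    return [sum(1 for _ in group) for key, group in groupby(labels) if key == target]


def estimate_track_bp(span_a_bp: float, n_a: int,
                      span_b_bp: float, n_b: int,
                      mean_alleles: float) -> float:
    """Average bp per allele over two run sizes, scaled by the mean run length."""
    bp_per_allele = (span_a_bp / n_a + span_b_bp / n_b) / 2
    return bp_per_allele * mean_alleles


if __name__ == "__main__":
    # Genome-wide: runs of three and four alleles, mean run of 3.75 alleles.
    print(round(estimate_track_bp(320.3, 3, 412.6, 4, 3.75)))
    # HC: runs of four and five alleles, mean run of 4.32 alleles.
    print(round(estimate_track_bp(600.8, 4, 647.3, 5, 4.32)))
```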
Unidirectional DNA exchanges between homeologous loci were the predominant mutational mechanism in the nascent Gossypium polyploid
DNA recombination, typically by reciprocal exchanges between homologous chromosomes (“crossing over”), is a central element of eukaryotic transmission genetics and is also implicated in repair of highly deleterious DNA double-strand breaks (DSBs). Reciprocal exchanges are often accompanied by tracts of unidirectional, local DNA exchanges known as “gene conversion.” Most models proposed to account for homologous DSB repair (synthesis-dependent strand annealing, classic double-strand break repair, and break-induced replication, although not single strand annealing) (Helleday et al. 2007), also predict the occurrence of tracts of unidirectional gene conversion, even when reciprocal crossing over occurs.
Building on rich evidence of concerted evolution in tandemly repeated sequences such as ribosomal RNA genes (Eickbush and Eickbush 2007) and multigene families such as primate olfactory receptor genes (Sharon et al. 1999), we recently showed gene conversion to have occurred in the past ∼400,000 years between duplicated rice genes that diverged from a common ancestor 70 million years ago (MYA) (Wang et al. 2009). While this is an extreme case, nonrandom similarity between duplicated genes widely distributed across other genomes (Chapman et al. 2006; Wang et al. 2007) suggests the phenomenon to be widespread.
During the 1–2 MY since its formation, unidirectional DNA exchanges between homeologous chromosomes have greatly outnumbered random mutations in AtDt tetraploid cotton. Mapping of 38× Illumina coverage from the A-genome species G. herbaceum to the reference D-genome v2 (G. raimondii) (Paterson et al. 2012) revealed 2,145,177 SNVs between the two, with 60% remaining unchanged in an AtDt tetraploid, G. hirsutum cultivar Acala ‘Maxxa.’ Among the 40% of changed sites, 25% now have only D-genome alleles and 10.6% have only A-genome alleles, with only ∼4.4% having new mutations. SNVs inferred (by methods previously described) (Paterson et al. 2012) to confer striking changes of gene function are even more biased, with 45% of sites changed, 34.2% to D-genome alleles and 8% to A-genome alleles (Figure 1A).
To infer the levels and patterns of occurrence of gene conversions following polyploid cotton formation (Figure 1B), we used a parsimony-based method. For comparison with Acala ‘Maxxa,’ we resequenced another G. hirsutum cultivar (GA120R1B3) from a different US production region (i.e., separated by <200 years), a wild G. hirsutum (race yucatanese) separated by ∼4500 years, and G. mustelinum, the tetraploid cotton species most divergent from G. hirsutum, separated by ∼1 million years. If a converted allele is shared by two lineages, we assume that the event occurred in the common ancestor rather than independently in each lineage. Indeed, most conversions were present in all four genomes (85.03–87.54%; Figure 1B), outnumbering omnipresent random mutations by approximately eightfold (χ² = 23,599 and 87,473 for Dt and At, respectively; P < 0.001). Parsimony implies that these HeGCEs occurred prior to speciation in the nascent Gossypium polyploid.
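The parsimony rule can be illustrated with a minimal sketch that places each converted allele on the smallest clade containing every genome that carries it. The nested topology follows the divergence times given above; the clade labels, genome labels, and function name are hypothetical, and the sketch ignores secondary loss or reversion, which a fuller analysis would need to consider.

```python
from typing import Iterable

# Nested clades, smallest first, following the divergence times in the text.
CLADES = [
    ("cultivated G. hirsutum ancestor", {"Maxxa", "GA120R1B3"}),
    ("G. hirsutum ancestor", {"Maxxa", "GA120R1B3", "yucatanese"}),
    ("tetraploid ancestor", {"Maxxa", "GA120R1B3", "yucatanese", "mustelinum"}),
]
ALL_GENOMES = CLADES[-1][1]


def place_event(carriers: Iterable[str]) -> str:
    """Place a converted allele on the smallest clade holding all of its carriers."""
    carrier_set = set(carriers)
    if not carrier_set:
        raise ValueError("at least one carrier genome is required")
    if not carrier_set <= ALL_GENOMES:
        raise ValueError("unknown genome label")
    if len(carrier_set) == 1:
        # Seen in one genome only: treated as lineage specific.
        return "lineage-specific: " + next(iter(carrier_set))
    for name, members in CLADES:
        if carrier_set <= members:
            return name
    raise AssertionError("unreachable: carriers are a subset of ALL_GENOMES")


if __name__ == "__main__":
    print(place_event(["Maxxa", "GA120R1B3", "yucatanese", "mustelinum"]))
```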
When the polyploid Gossypium lineage diverged into multiple species, HeGCE appears to have been abating (Figure 1B). This is inferred based on the observation that HeGCEs account for only ∼62% (4921) of polymorphisms among the divergent species G. hirsutum and G. mustelinum, ∼47% between wild and cultivated forms of G. hirsutum, and 33% between G. hirsutum cultivars from different production regions (Figure 1B and Figure S1A). Consideration of the genomic distribution of HeGCEs in these various taxa suggests that it abates sooner in heterochromatin than in euchromatin (Figure S1B).
Asymmetric evolution of the polyploid cotton subgenomes
Despite inhabiting a common nucleus, the evolution of the cotton At and Dt subgenomes differs in several striking ways. First, At-to-Dt conversion is enriched in heterochromatin (Figure 2). Euchromatin, localized in the terminal regions of cotton chromosomes, shows largely similar rates of the two conversion types (log-likelihood ratio ∼0). Occasional 1-Mb bins with Dt-to-At biased conversion (log-likelihood ratio <0) unanimously reside in euchromatin.
As in many other eukaryotes (Duret and Galtier 2009) cotton gene conversion is GC biased, and this bias is closely related to the divergent evolution of the At and Dt subgenomes. The D-genome has higher GC content than the A-genome in the heterochromatin where At-to-Dt conversion is enriched, and the A-genome has higher GC content than the D-genome in the euchromatin where Dt-to-At conversion is enriched (Figure 2). The high correlation of tetraploid cotton gene conversion with the GC ratio between the two progenitor genomes (R2 = 0.506) (Figure 3A), may explain the 30% more A-to-G and T-to-C mutations in the D-genome than the A-genome since their divergence (Rong et al. 2012).
To investigate genomic features related to HeGCEs, we compared regions with highly (HC)- or little (LC)-biased conversion. Across the genome, the log-ratio of At-to-Dt vs. Dt-to-At conversions per 1-Mb bin averages 0.80 (SD = 0.46). Log ratios above 1.72 (mean + 2 SD) indicate largely At-to-Dt conversion (HC), and below −0.12 (mean − 2 SD) indicate Dt to At (LC). HC overwhelmingly locates in heterochromatin and LC in euchromatin. Heterochromatic At-to-Dt conversion is highly correlated with transposon distribution (R2 = 0.684) (Figure 3B) and enriched for potential DNA methylation sites suitable for transposon silencing (Slotkin and Martienssen 2007). Long terminal repeat (LTR), particularly gypsy-type, retrotransposons are overwhelmingly enriched in HC [>70% of all types of transposable elements, and 10.02% of total length (in base pairs) of HC vs. 2.97% of LC] (Figure 4A). A,T-to-G,C conversion, providing potential DNA methylation sites, is more frequent in transposable element (TE) than non-TE regions across the genome (Figure 4B). Further, in HC, A,T-to-G,C significantly outnumber G,C-to-A,T conversions for At to Dt (50.04 vs. 34.69%), but not Dt to At (39.95 vs. 41.26%). In LC, A,T-to-G,C conversions are similar to G,C to A,T (43.97 vs. 38.13%) for At to Dt but differ for Dt to At (47.35 vs. 32.52%) (Table 1).
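The HC and LC cutoffs follow directly from the reported mean and standard deviation of the per-bin log ratio. The sketch below recomputes them and classifies a bin from a precomputed log ratio; the function name and the "intermediate" label are assumptions for illustration, and the base of the logarithm is left to the upstream calculation because it is not restated here.

```python
MEAN_LOG_RATIO = 0.80  # mean log ratio of At-to-Dt vs. Dt-to-At per 1-Mb bin
SD_LOG_RATIO = 0.46    # standard deviation of the same quantity

HC_CUTOFF = MEAN_LOG_RATIO + 2 * SD_LOG_RATIO  # about 1.72
LC_CUTOFF = MEAN_LOG_RATIO - 2 * SD_LOG_RATIO  # about -0.12


def classify_bin(log_ratio: float) -> str:
    """Label a 1-Mb bin from its precomputed conversion log ratio."""
    if log_ratio > HC_CUTOFF:
        return "HC"
    if log_ratio < LC_CUTOFF:
        return "LC"
    return "intermediate"


if __name__ == "__main__":
    print(round(HC_CUTOFF, 2), round(LC_CUTOFF, 2))
```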
In heterochromatic regions where reciprocal DNA exchanges are rare, some conversions might simply be neutral or slightly deleterious relics not yet purged due to inefficient selection, as is true of retroelement insertions (Paterson et al. 2009) and other rearrangements (Bowers et al. 2005). Longer persistence of harmful mutations in HC than LC may explain an increasing ratio (in HC) of nonsynonymous to synonymous mutations across the phylogenetic tree of the four Gossypium species studied (Figure 4C). Indeed, the overall ratio of nonsynonymous to synonymous mutations is larger in heterochromatin than euchromatin (Fisher’s exact test, P = 1.334e-6), and heterochromatin conversion tracks are longer than the genome-wide average (Figure S2). All detected conversion tracks in the genome and HC region are listed in Table S1.
Gene-altering conversions are of widespread importance
Strong evidence suggests gene-altering conversions to be of widespread functional importance, perhaps serving as an alternative to reciprocal DNA exchanges to form new allele combinations in heterochromatic regions. Heterochromatic Dt-to-At conversions are approximately five times more frequent in genes than in adjacent intergenic DNA (Figure 4D; Fisher’s exact test, P = 0.0001) and At-to-Dt conversions are also gene enriched (P = 0.0157). Conversions more frequently restored gene functions in cotton heterochromatin than in euchromatin (Fisher’s exact test, P = 0.0008). Among 59 HC and 206 LC alleles that experienced striking mutations in the A-genome since its divergence from the F-genome [using the D-genome as outgroup (16), including premature stop codons (32 HC and 80 LC), splice site alterations (22 and 89), translation initiation alterations (3 and 19), and stop codon losses (2 and 18)], At-to-Dt conversion found in Acala ‘Maxxa’ restored function to 45.8% (27) of HC vs. only 22.3% (46) of LC SNVs. Recent evidence of widespread gene conversion in the centromere cores of maize (Shi et al. 2010) and Arabidopsis (Yang et al. 2012) further supports the importance of this mechanism.
Whole genome comparison of four naturally occurring tetraploid cottons and representatives of their progenitor genomes reveals extensive DNA exchanges accumulated during the past 1–2 MY since polyploid formation. Remarkably, the two constituent “subgenomes” of tetraploid cotton have experienced very different evolution while residing within a common nucleus, with more than twice as many conversions of At-to-Dt alleles as the reciprocal. The bias is unlikely to be caused by incomplete lineage sorting. Genetic diversity within both A- and D-genomes is estimated to be ∼10 times less than the diversity between the two genomes (Van Deynze et al. 2009). Comparison of At and Dt subgenomes of tetraploid cotton to their corresponding diploid progenitors shows small and comparable sequence divergence (Cronn et al. 2002). G. raimondii is quite narrow genetically, so there would not be much lineage sorting on the D-side (Cronn et al. 2002). The A-genome is a little more heterogeneous when considering the two A-genome species (G. arboreum and G. herbaceum) (Cronn et al. 2002). Given these considerations and keeping in mind the 5–7 MY of divergence between the A- and D-genome diploids (Senchina et al. 2003), it seems improbable that there would be segregation for many common alleles.
A key advantage of the cotton system is that polyploids have survived in the wild for 1 MY or more, effectively ruling out the hypothesis that HeGCEs are a symptom of a dysfunctional genome destined for extinction. Extensive gene conversion in the centromere cores of maize and Arabidopsis has been described using mapping populations, in which only the first several generations are observed (Shi et al. 2010; Yang et al. 2012). On the contrary, some studies show that noncrossover gene conversion is relatively rare compared to crossover-associated gene conversion in A. thaliana hybrids (Lu et al. 2012; Wijnker et al. 2013). Cytogenetic studies of recently and naturally formed polyploid species, Tragopogon miscellus and resynthesized B. napus reveal a prolonged phase of genomic instability coupled with chromosome rearrangement and translocation (Xiong et al. 2011; Chester et al. 2012). However, in each of these systems and other synthetic or recent polyploids, it is hard to assert whether the sorts of rapid responses to polyploidy that are observed (chromosome rearrangement and translocation) are the beginnings of adaptation or symptoms of pending extinction.
Frequent conversion of mostly heterochromatic At alleles by GC-rich Dt DNA may have helped to silence abundant A-genome-derived retrotransposons, perhaps stabilizing the early polyploid. It seems intuitive that D-genome alleles, from the progenitor native to the New World habitat of the polyploid, may confer many adaptations that are lacking from the Old World A-genome. However, this explanation of enrichment for At-to-Dt conversions is not consistent with the strong heterochromatic bias observed for these conversions. The approximately two times larger physical size of the A- than the D-genomes is largely due to retrotransposons, mostly in heterochromatin (Paterson et al. 2012) and some having spread to the Dt genome following polyploidy (Zhao et al. 1998). Bursts of retrotransposition following hybridization (McClintock 1983; Fontdevila 2005) can cause many DSBs (Gasior et al. 2006) that are fatal to cells if not repaired (Van Gent et al. 2001). More quickly than random mutations, GC-biased gene conversion may have provided the At genome with (Dt derived) targets for DNA methylation-based transposon silencing. Transcripts from At-derived retrotransposons may enter the RNAi pathway for RNA-dependent DNA methylation (Slotkin et al. 2009; Groszmann et al. 2011). An existing example similar to this process is the Drosophila P cytotype female that produces a repressor protein and piRNA to inhibit P-element transposition in gametes (Brennecke et al. 2008). Correlation between DNA methylation and gene conversion implies that the two are mechanistically related (Colot et al. 1996) by an as-yet-unknown process. In primates, biased gene conversion is shown as a major force for stabilizing constitutively methylated CpG islands (Cohen et al. 2011).
Meiotic recombination tends to be concentrated in small regions on chromosomes known as “recombination hotspots.” Recombination is initiated by introduction of double-strand breaks to the hotspot alleles. When a DSB occurs at a hotspot that is heterozygous with an inactive (“cold”) hotspot allele, the hotspot alleles are replaced with cold alleles by gene conversion during DNA repair. The process will cause a rapid loss of the recombination hotspot in the genome. The existence of recombination hotspots is therefore considered a “hotspot-conversion paradox” (Boulton et al. 1997). The paradox predicts small numbers of hotspots in the region with a high conversion rate. Consistent with this prediction, the high rate of conversion in the heterochromatin region identified in this study might be partially explained by the paradox.
The ability to “copy” existing alleles via HeGCEs may expedite the ability of polyploids to evolve new or more exaggerated phenotypes, by achieving allele dosages that exceed those of their progenitors. For example, Dt-to-At conversion steadily declines following polyploid formation, but accelerates during cotton domestication and improvement (Figure 1B and Figure S1). This may provide a mechanism to explain the long-standing irony that so many QTL for yield and quality of tetraploid cotton map to chromosomes derived from an ancestor (D) that lacks spinnable fibers (Jiang et al. 1998). This genetic process may complement paramutations that copy epigenetic information. In Arabidopsis hybrids, for example, methylation levels of one parental allele change to match the other (Greaves et al. 2011). Our findings show that nonreciprocal exchange of both genetic and epigenetic information may be important to the integrity of hybrid genomes.
HeGCEs may have practical value in crop improvement. The length of cotton conversion tracts averages 455 bp, ranging from 279 to 3650 bp (see Methods), somewhat longer than in mammals (Chen et al. 2007) and often spanning entire exons or occasional genes. Cotton and many other neopolyploid crops are genetically depauperate, and occasional crosses with exotic or synthetic polyploids for crop improvement may be prone to bursts of transposition, DSBs, and conversion. A fascinating area for further study is whether occasional “successes” in extracting valuable alleles from exotic germplasm (e.g., Campbell et al. 2011) might have occurred by gene conversion rather than crossing over. Introgression by gene conversion might solve the widespread challenge of extracting desirable alleles from exotic germplasm while leaving behind nearby undesirable ones, i.e., with the precision of transformation but by a natural noninvasive means.
While their early evolution involved extensive intergenomic exchange, modern tetraploid cottons show strict disomy (Kimber 1961). The gradual decline of intergenomic conversion during the course of polyploid cotton evolution (Figure 1B) may have been due in part to fewer DSBs as the nascent polyploid was stabilized, perhaps by silencing of abundant A-genome-derived retrotransposons. The evolutionary success of a newly formed polyploid may require a delicate balance between genomic-stress induced novel variation (McClintock 1983) and stabilization via such quantitative factors as we suggest or qualitative factors such as the ph1 locus enforcing pairing specificity of wheat (Griffiths et al. 2006).
The G. raimondii genome sequence is in the National Center for Biotechnology Information (NCBI) with BioProject accession PRJNA171262. Other sequences are available at the NCBI short read archive for G. herbaceum (accession F1-1, SRA061660), G. hirsutum (accession GA120R1B3, SRA068148), G. hirsutum (race yucatanese, SRA068479), and G. mustelinum (SRA068485).
The authors appreciate financial support from the US National Science Foundation (IIP-0917856, IIP-1127755 to A.H.P.); United States Department of Agriculture [Agricultural Research Service (ARS)-58-6402-7-241, 58-6402-1-644, and 58-6402-1-645 to D.G.P. and ARS 6402-21310-003-00 to B.E.S.); Bayer Crop Science; the Consortium for Plant Biotechnology Research; and Cotton, Inc. (A.H.P.).
Communicating editor: J. Schimenti
- Received May 11, 2014.
- Accepted June 5, 2014.
- Copyright © 2014 by the Genetics Society of America