A large region of suppressed recombination surrounds the sex-determining locus of the self-fertile fungus Neurospora tetrasperma. This region encompasses nearly one-fifth of the N. tetrasperma genome and suppression of recombination is necessary for self-fertility. The similarity of the N. tetrasperma mating chromosome to plant and animal sex chromosomes and its recent origin (<5 MYA), combined with a long history of genetic and cytological research, make this fungus an ideal model for studying the evolutionary consequences of suppressed recombination. Here we compare genome sequences from two N. tetrasperma strains of opposite mating type to determine whether structural rearrangements are associated with the nonrecombining region and to examine the effect of suppressed recombination for the evolution of the genes within it. We find a series of three inversions encompassing the majority of the region of suppressed recombination and provide evidence for two different types of rearrangement mechanisms: the recently proposed mechanism of inversion via staggered single-strand breaks as well as ectopic recombination between transposable elements. In addition, we show that the N. tetrasperma mat a mating-type region appears to be accumulating deleterious substitutions at a faster rate than the other mating type (mat A) and thus may be in the early stages of degeneration.
THE elimination of recombination can have a dramatic effect on the evolutionary trajectory of a genomic region. Without recombination, selection acts on linked genetic complexes rather than independent genetic elements. Theory predicts the accumulation of deleterious mutations and selfish genetic elements in the absence of recombinational purging and a reduced ability to fix adaptive mutations (Charlesworth and Charlesworth 2000; Charlesworth et al. 2005).
The genetic consequences of suppressed recombination have been best studied in the sex chromosomes of outcrossing eukaryotes, e.g., plants, insects, and mammals, because the initial step in the formation of sex chromosomes is posited to be a cessation of recombination across a genomic region that includes the sex-determining locus (Charlesworth et al. 2005). The suppression of recombination across such a region will be selected for if it creates linkage between the sex-determining locus and other genes that are sexually antagonistic in that their functions are beneficial to only one of the sexes. The nonrecombining region can be formed from the spread of recombinational suppressors or structural changes to the chromosome that prevent synapsis. In either case, studies in mammals, birds, and plants have shown evidence that present-day nonrecombining regions are composed of multiple discrete blockage events that occurred at different time points in the evolutionary history of the taxon in question [termed “evolutionary strata” by Lahn and Page 1999 (Charlesworth et al. 2005)].
Because large, nonrecombining, sex-linked regions appear to have evolved independently across a diverse array of taxonomic groups, comparing the evolution of such regions across disparate taxa will make it possible to understand the evolutionary events associated with their formation as well as the genomic consequences of suppressed recombination.
In several species of fungi, the properties of the chromosomal region surrounding the mating-type locus have been studied extensively because of their similarities to the sex chromosomes of other organisms (Fraser and Heitman 2004, 2005). The mating-type locus of Cryptococcus neoformans occurs within an ∼100-kb region where recombination is suppressed due to multiple chromosomal rearrangements (Lengeler et al. 2002; Fraser et al. 2007). The evolutionary history of this region includes the accumulation of transposable elements as well as gene conversion, gene loss, and pseudogenization (Fraser et al. 2004; Metin et al. 2010). The mating-type region of Ustilago hordei resides within a 500-kb region that is characterized by suppressed recombination and chromosomal rearrangements (Lee et al. 1999) and the mating-type chromosomes of the fungus Microbotryum violaceum are heteromorphic and contain a region of suppressed recombination that has been roughly estimated to be 1000 kb in size (Votintseva and Filatov 2009). DNA polymorphism within M. violaceum populations has been compared between the recombining and nonrecombining portions of the genome; however, nucleotide variation within the nonrecombining region did not stand out from the rest of the genome due to overall low levels of polymorphism and high linkage disequilibrium (Votintseva and Filatov 2011), consistent with previous results showing that M. violaceum is predominantly selfing (Giraud et al. 2005).
The sex-determining locus of Neurospora tetrasperma (termed mat) is surrounded by a region of suppressed recombination that is ∼7-fold larger than that of M. violaceum and 70-fold larger than that of C. neoformans (Menkis et al. 2008). From the work presented here, we now know that it includes ∼2000 genes and spans a distance of ∼7.8 Mb, which represents 80% of the mating-type chromosome and approximately one-fifth of the N. tetrasperma genome. The formation of the region of suppressed recombination was part of a series of evolutionary events that allowed this species to become self-fertile and it is relatively young compared to most vertebrate sex chromosomes [< ∼4.5 MYA compared to ∼160 MYA for the XY system in marsupial and placental mammals (Veyrunes et al. 2008) and the ZW system in snakes (O’Meally et al. 2010) and >120 MYA for the ZW system in birds (Mank and Ellegren 2007)]. Due to the large size and recent origin of the nonrecombining region, N. tetrasperma has emerged as an important model for the study of early sex chromosome evolution (Merino et al. 1996; Gallegos et al. 2000; Jacobson 2005; Menkis et al. 2008, 2010) alongside other organisms with relatively young sex chromosomes such as medaka (Kondo et al. 2006), papaya (Liu et al. 2004), and several species of Drosophila (Charlesworth et al. 2005; reviewed in Fraser and Heitman 2005).
Sex in N. tetrasperma is controlled by two mating-type idiomorphs, mat a and mat A. The majority of isolates collected from nature are heterokaryotic: haploid nuclei of opposite mating type can be found in a single fungal individual, allowing the individual to be self-fertile (Raju 1992). The maintenance of self-fertility via the packaging of two nuclei of opposite mating type into a single sexual spore is known as pseudohomothallism and is usually accompanied by strict regulation of recombination between the mating-type locus and the centromere. In N. tetrasperma, a crossover in this region during meiosis can result in two nuclei of the same, rather than opposite, mating type being packaged into the sexual spore, producing progeny that are not self-fertile (Raju and Perkins 1994; Merino et al. 1996; Gallegos et al. 2000). The suppression of recombination in this region is therefore thought to have evolved to ensure the correct packaging of nuclei of opposite mating type into the sexual spore, thereby maintaining the heterokaryotic, self-fertile condition (Merino et al. 1996; Gallegos et al. 2000).
The only other pseudohomothallic ascomycete where recombination has been studied genetically is Podospora anserina, which, as a close relative, has mating-type genes that share homology and gene order with those of Neurospora (Marcou et al. 1979; Raju and Perkins 1994; Coppin et al. 1997). This fungus exhibits a contrasting approach for regulating recombination between the mating-type locus and the centromere whereby a single obligate crossover occurs and additional crossovers are suppressed (Marcou et al. 1979). The mechanism by which this occurs is unknown but, as in N. tetrasperma, it acts to ensure the correct packaging of nuclei of opposite mating type into the sexual spore (Raju and Perkins 1994).
In N. tetrasperma, the heterokaryon occasionally breaks down via the production of vegetative or sexual spores containing nuclei with only one of the two mating types (Raju 1992). The haploid, homokaryotic individual that grows from such a spore can mate with another homokaryotic individual of opposite mating type to restore the heterokaryotic condition (Raju 1992). Thus, the reproductive strategy of N. tetrasperma is likely to include repeated rounds of selfing with an occasional outcrossing event (Powell et al. 2001). Repeated selfing within a heterokaryon is also supported by previous work showing that allelic differences between mat a and mat A nuclei were confined to the nonrecombining region of the mating chromosome (Merino et al. 1996).
A considerable body of work has built upon the initial observations of Howe and Haysman (1966) that recombination was reduced on the N. tetrasperma mating-type chromosome. Raju (1992) and Raju and Perkins (1994) determined the steps during ascus development that are required for correct packaging of mat a and mat A nuclei into a single ascospore. Merino et al. (1996) and Gallegos et al. (2000) confirmed that recombination is suppressed across the majority of the mating-type chromosome and Gallegos et al. visualized the nonrecombining interval as an anomalous unpaired region visible during pachytene. Jacobson (2005) reciprocally introgressed mating-type chromosomes between N. tetrasperma and N. crassa. His results suggested that the N. tetrasperma mat a chromosome may be collinear with the N. crassa mat A chromosome while the N. tetrasperma mat A chromosome may be structurally rearranged. However, he also found evidence for the existence of genetic, rather than structural, modifiers of recombination and was unable to determine the relative contributions of these two phenomena with respect to the suppression of recombination in this region. More recently, Menkis et al. (2008) sequenced 35 genes spanning the mating-type chromosome from each of the two N. tetrasperma mat a and mat A strains used in this study. On the basis of sequence divergence between these genes, the authors predicted the existence of two evolutionary strata: one large pericentric region encompassing most of the chromosome and another much smaller region distal to the mat locus. In two other studies, Menkis et al. (2009) showed that N. tetrasperma is actually a species complex composed of nine genetically isolated lineages and found evidence of differences in the size of the nonrecombining region among these lineages (Menkis et al. 2010). Together, these results suggest that there may be important differences in the region of suppressed recombination around the mating-type locus among the different N. tetrasperma lineages (Menkis et al. 2008, 2009, 2010). Whittle and Johannesson (2011) and Whittle et al. (2011b) have investigated patterns of codon usage and nonsynonymous substitution within the region of suppressed recombination in another N. tetrasperma lineage (strain P4492, lineage 1), one different from the strain that was the main subject of Jacobson (2005), Menkis et al. (2008), and the work presented here (strain P581, lineage 6). They found evidence for relaxed purifying selection within this region in the form of substitutions from preferred to nonpreferred codons and, in a branch-specific analysis using PAML, a higher dN/dS ratio along the N. tetrasperma branch for the genes in the nonrecombining region compared to their N. crassa and N. discreta orthologs. However, these analyses assume that the location and number of nonrecombining strata in their lineage of interest (lineage 1; see Menkis et al. 2009) are the same as those reported for lineage 6 (Menkis et al. 2008), despite the previous evidence suggesting that these regions may have evolved independently (Menkis et al. 2008, 2009, 2010). While several other recent studies have investigated the molecular evolution of the strain we study here (Nygren et al. 2011; Whittle et al. 2011c; reviewed in Whittle et al. 2011a), none of these have examined the evolution and structure of the region of suppressed recombination in detail.
More than 40 years of genetic and cytological analyses involving N. tetrasperma, combined with the open question of the degree to which changes in chromosome structure are involved in the suppression of recombination, made this organism an ideal candidate for a genome sequencing project. This sequencing project was undertaken by the Joint Genome Institute and involved the sequencing and assembly of two genomes: the haploid mat A and mat a strains derived from a single heterokaryotic N. tetrasperma isolate. These are the same strains studied in Jacobson (2005) and Menkis et al. (2008) and here we use their genome assemblies to investigate the mechanisms of recombination suppression as well as the evolutionary consequences for the genes residing within the nonrecombining region. We find that the majority of the region of suppressed recombination is covered by several large chromosomal rearrangements. Additionally, we show that the young evolutionary stratum identified by Menkis et al. (2008) is located in a region where the two mating chromosomes are collinear and identify an additional stratum, created by an inversion, on the opposite end of the chromosome. We propose a model for the sequence of events and mechanisms of rearrangement that produced the current orientation and show that the region of suppressed recombination within the mat a strain appears to be in the early stages of degeneration.
Materials and Methods
Genome sequencing and assembly
Both N. tetrasperma strains (FGSC 2508 mat A and FGSC 2509 mat a) were sequenced using a hybrid approach on Roche 454 pyrosequencing and Sanger platforms (Supporting Information, Table S3 and Table S4) and assembled with Newbler. The sequencing projects have been deposited in GenBank under accession nos. AFBT00000000 (mat A) and AFCY00000000 (mat a). The N. tetrasperma mat A assembly was postprocessed to close gaps in silico using JGI gapResolution software for the entire genome and targeted finishing for the mating-type chromosome. Statistics of both assemblies are summarized in Table S2.
Both N. tetrasperma assemblies were annotated using the JGI annotation pipeline with results deposited to the integrated fungal resource MycoCosm (http://jgi.doe.gov/fungi) for further analysis. Genome assembly scaffolds were masked using RepeatMasker (Smit et al. 1996–2010) and tRNAs were predicted using tRNAscan-SE (Lowe and Eddy 1997). Several gene predictors were used on the repeat-masked assembly: (i) ab initio FGENESH (Salamov and Solovyev 2000) and GeneMark (Isono et al. 1994), (ii) homology-based FGENESH+ and Genewise (Birney and Durbin 2000) seeded by BLASTx alignments against GenBank’s database of nonredundant proteins, and (iii) direct mapping of expressed sequence tag (EST)-derived full-length genes to genome assembly. Genewise models were extended where possible, using scaffold data to find start and stop codons. EST BLAT alignments (Kent 2002) were used to extend, verify, and complete the predicted gene models. From the resulting set of models, a nonredundant representative set of best models was selected (Table S6).
All predicted gene models were functionally annotated using SignalP (Nielsen et al. 1997); TMHMM (Melen et al. 2003); InterProScan (Zdobnov and Apweiler 2001); BLASTp (Altschul et al. 1990) against nr; and hardware-accelerated double-affine Smith–Waterman alignments (deCypherSW; http://www.timelogic.com/catalog/758/decyphersw) against SwissProt (Boeckmann et al. 2003), KEGG (Kanehisa et al. 2008), and KOG (Koonin et al. 2004). KEGG hits were used to assign EC numbers (Gasteiger et al. 2003), and Interpro and SwissProt hits were used to map GO terms (Ashburner et al. 2000). Multigene families were predicted with the Markov clustering algorithm (MCL) (Enright et al. 2002) to cluster the proteins, using BLASTp alignment scores between proteins as a similarity metric.
N. tetrasperma FGSC 2508 was grown in Vogel’s liquid media (Vogel 1956) and total RNA was extracted by bead-beating in TRIzol (Invitrogen Life Science Technologies, Carlsbad, CA) with zirconia/silica beads (0.2 g, 0.5-mm diameter; Biospec Products). ESTs were sequenced using 454 pyrosequencing. The sequencing library protocol and sequence processing procedure are described in Swarbreck et al. (2011).
Whole-genome synteny was assessed between the two N. tetrasperma strains as well as between each N. tetrasperma strain and the outgroup N. crassa. Dotplots were created using the program MUMMER (Kurtz et al. 2004) to visualize synteny while more precise identification of rearrangement breakpoints was achieved using a combination of whole-genome orthology map construction with Mercator and whole-genome alignment with MAVID (Dewey 2007).
Identification of orthologs
Orthologs between the two N. tetrasperma strains as well as those between each N. tetrasperma strain and N. crassa were initially identified on the basis of best-reciprocal-BLAST hits and further verified using synteny information from Mercator. This procedure resulted in the identification of a total of 7693 single-copy orthologs between the three species.
Calculation of nonsynonymous and synonymous substitutions per site (Ka and Ks )
Ka/Ks ratios were calculated for each pair of orthologs between the two N. tetrasperma strains as well as between each N. tetrasperma strain and the outgroup N. crassa, using the modified Yang–Nielsen method in the program KaKs_Calculator (Zhang et al. 2006). It has previously been shown that the way in which the total number of synonymous sites are counted can bias the calculation of Ks (Bierne and Eyre-Walker 2003). For this reason, we also calculated the Ka and Ks values for all pairwise orthologs as in Bierne and Eyre-Walker (2003) using only fourfold degenerate sites for Ks and the physical site definition for both Ka and Ks and found similar results (Figure S2). For consistency, all Ks results reported here are from the modified Yang–Nielsen method in the program KaKs_Calculator.
Identification of pseudogenes
We used the tfasty program within the FASTA sequence comparison package (Pearson et al. 1997) to perform translated searches (allowing for frameshifts and premature stop codons) against the N. tetrasperma genome sequence. As queries, we used the subset of N. crassa proteins from within the genomic region of suppressed recombination for which we were unable to find N. tetrasperma orthologs. We used a custom Perl script to parse the results to identify matches that showed frameshifts and/or nonsense mutations. We additionally extracted the genomic sequence for candidate pseudogenes and used the program Exonerate (Slater and Birney 2005) to create genomic DNA/protein sequence alignments to confirm the presence of nonsense mutations or frameshifts.
Codon adaptation index
We used the N. tetrasperma mat A ESTs and N. crassa Illumina RNA-Seq data (Ellison et al. 2011) to identify the top 100 most highly expressed genes in each species. We used these genes to compute codon usage tables for each species using the cusp application in the EMBOSS package (Rice et al. 2000). We then used the EMBOSS application cai to calculate the codon adaptation index for each N. tetrasperma and N. crassa gene. All codon adaptation index (CAI) analyses were performed twice, once with the N. tetrasperma codon usage table and once with the N. crassa codon usage table. The results were equivalent in all cases and the reported P-values are those from the N. crassa codon usage table. We defined substitutions to preferred codons as a change to a synonymous codon whose frequency of usage in the top 100 most highly expressed genes was at least 10-fold greater than that of the ancestral codon. Similarly, we defined substitutions to unpreferred codons as a change to a synonymous codon whose frequency of usage in the top 100 most highly expressed genes was at least 10-fold less than that of the ancestral codon.
Repetitive elements were identified de novo using the program RepeatModeler (Smit and Hubley 2008–2010). The results of RepeatModeler were added to a fungal-specific repeat library downloaded from RepBase (Jurka 2000) and repetitive elements were identified on the basis of sequence homology using this library and the program RepeatMasker (Smit et al. 1996–2010). Centromeric regions in Neurospora can be easily identified because they are composed almost entirely of transposable element remnants. The numbers of repetitive elements per kilobase for the chromosomal segment within the N. tetrasperma nonrecombining region and in the homologous region in N. crassa were calculated after excluding centromeric regions so that these measures would not be confounded by differences between assemblies in the number of gaps in these repeat-dense regions.
Repeat-induced point mutation index
Repeat-induced point mutation (RIP) indexes were calculated for 500-bp sliding windows across the genome for both N. tetrasperma strains, using a custom Perl script and the composite index described in Lewis et al. (2009).
CAI and Ka:
This test was performed on both N. tetrasperma strains separately. Twenty-six genes (the number of genes within the relocated region) were drawn randomly from the set of all genes within the nonrecombining region. Each gene’s CAI was subtracted from that of its ortholog in N. crassa and the median of the differences was calculated for the random sample and compared to that of the real data (i.e., the genes within the relocated region). This procedure was repeated 10,000 times and the P-value was calculated as the proportion of random samples (of the 10,000 permutations) that had a value greater than that of the real data.
Ka permutation tests were performed similarly except the median Ka of randomly sampled N. tetrasperma mat A and mat a ortholog pairs was compared to that of the genes within the relocated region.
We compared each N. tetrasperma strain separately to N. crassa. We took the total number of transposons within each genome and shuffled their locations. We then counted the number of shuffled transposons that landed within the boundaries of the nonrecombining region in N. tetrasperma and within the homologous region in N. crassa. We subtracted the N. crassa sum from the N. tetrasperma sum and compared this difference to the true difference. The P-value was calculated as the proportion of permutations (of 10,000 total) where the permuted difference was larger than the true difference.
All permutation tests were performed using custom Perl scripts.
To determine whether chromosomal rearrangements could be responsible for the suppression of recombination, we created a whole-genome alignment between the mat a and mat A strains of N. tetrasperma as well as between each N. tetrasperma strain and the outgroup species N. crassa. We visualized large-scale synteny between these pairs using dotplots (Figure 1). Synteny is strongly conserved across all genome pairs, with the exception of those involving the N. tetrasperma mat A mating-type chromosome. Knowing that the N. tetrasperma mat a mating-type chromosome is collinear with that of N. crassa, we concluded that a series of chromosomal rearrangements had occurred on the N. tetrasperma mat A chromosome.
Focusing on the mating-type chromosome and the comparison between the two N. tetrasperma strains, we found that there have been two large inversions (∼5.3 and 1.2 Mb), a smaller inversion (68 kb), and an apparent translocation of an ∼143-kb segment from one end of the chromosome to the other (Table 1 and Figure 2). Consistent with previous cytological data, the mating-type chromosome ends are collinear and the rearrangements encompass contiguous regions within the central portion of the chromosome; however, none of the rearrangement events include the mat locus itself. We propose that the evolution of the mat A chromosome can be explained by two overlapping inversions that, together, resulted in the movement of the 143-kb segment from one end of the chromosome to the other (Figure 2). We therefore hereafter refer to this segment as the “relocated” region. The alternative explanation for this movement, that a 143-kb genomic segment was excised from one end of the mat chromosome and reinserted into the other, appears less likely for reasons explained below.
Mechanisms of rearrangement
Ectopic recombination between transposable elements is generally believed to be a common mechanism for the generation of chromosomal inversions (Casals and Navarro 2007) and its role in creating chromosomal rearrangements has been experimentally verified in yeast (Argueso et al. 2008). However, a study by Ranz et al. (2007) found no evidence for ectopic recombination in the majority of fixed inversions between species within the Drosophila melanogaster group. Instead, the authors found short duplications of nonrepetitive sequence at the breakpoints of most of the inversions they identified and proposed a novel mechanism for the generation of inversions involving staggered single-strand breaks followed by nonhomologous end joining.
We compared the two breakpoints associated with each inversion shown in our model to determine whether there was evidence for either ectopic recombination via transposable elements or the staggered break model of Ranz et al. (2007). Interestingly, we find evidence for both types of rearrangement mechanisms, adding credence to the novel mechanism proposed by Ranz et al. while also supporting the notion that ectopic recombination is a common mechanism underlying chromosomal rearrangements. It is important to note that there is another class of rearrangement mechanisms that occur in slipped, stalled, or collapsed DNA replication forks. These mechanisms include serial replication slippage (Chen et al. 2005), microhomology-mediated break-induced replication (MMBIR) (Hastings et al. 2009), and fork stalling and template switching (FoSTeS) (Lee et al. 2007). We cannot eliminate the possibility that such mechanisms created the rearrangements we observe in N. tetrasperma. However, previously described rearrangements that have been associated with these mechanisms showed a complex combination of double or triple deletions, insertions, and inversions (Mani and Chinnaiyan 2010), which are not seen here.
At the breakpoints of the 1.2-Mb inversion, we find in mat A a short duplication of a 50-bp segment that is unique in the N. tetrasperma mat a genome. The orientation of this duplication is consistent with those found in Drosophila (Ranz et al. 2007) and with our model of the rearrangement events. In N. tetrasperma, after the 1.2-Mb inversion, the duplicated segments would have been in an inverted orientation as in Drosophila (Ranz et al. 2007). The subsequent 5.3-Mb inversion, which contained the left duplication, would have then returned the leftmost duplication to the same orientation as the right duplication (Figure 2).
The 5.3-Mb inversion is flanked by inverted Mariner transposable elements. The amino acid sequences of the transposase ORFs of the two elements are ∼28% identical and ∼59% similar to that of the Pogo family Mariner-3_AN from Aspergillus nidulans (Kapitonov and Jurka 2003). Both transposase ORFs have multiple stop codons, suggesting that they are no longer active.
These sequence features also support the inclusion of the relocated region within both inversions, providing additional support for our model of how this region moved from one end of the chromosome to the other (Figure 2). We found no such sequence features supporting the alternative model, in which the inversions occurred independently from the relocation.
The small 68-kb inversion could also have resulted from ectopic recombination. Although the sequences located at the breakpoints of this inversion have no homology to any known transposable elements, both flanks contain several microsatellite regions and regions of low sequence complexity, creating several short blocks of microhomology that could facilitate ectopic recombination.
Relative ages of rearrangement events
We used the whole-genome alignment between the two N. tetrasperma strains to locate 192,225 nucleotide differences between them, >99% (190,728) of which are located within the boundaries of the nonrecombining region on the mating-type chromosome. These two strains are homokaryons that were derived from a single heterokaryotic strain. The lack of nucleotide differences across most of the genome is most likely due to repeated inbreeding of the self-fertile heterokaryotic strain (Merino et al. 1996).
If nucleotide differences have accumulated on the mating-type chromosome because of recombination suppression, the number of neutral differences should be proportional to the amount of time since the formation of the nonrecombining region. If the rearrangement events that we observed occurred at different evolutionary time points, they should have different levels of neutral nucleotide divergence with the oldest event being the most divergent.
As a measure of neutral divergence between the two N. tetrasperma strains, we calculated the number of synonymous substitutions per site (Ks) for each gene within each rearrangement event (Figure 3). The large 5.3-Mb inversion was the rearrangement event that allowed N. tetrasperma to become self-fertile by suppressing recombination along the majority of the chromosomal region between the mat locus and the centromere. We have therefore compared the distribution of Ks values for genes within the large inversion to those distributions for each of the other rearrangement events.
We found no significant difference between Ks values for the 5.3-Mb inversion and the 1.2-Mb inversion, nor is there a significant difference between Ks values for the 5.3-Mb inversion and the mat proximal block (the 218-kb chromosomal region between the mat locus and the leftmost rearrangement event; see Figure 3), suggesting that the suppression of recombination in the 5.3-Mb and 1.2-Mb inversions occurred at the same evolutionary time and that these inversions also suppressed recombination in the mat proximal block (Bonferroni-corrected Mann-Whitney U (MWU) tests: P = 0.6 and P = 1, respectively; Figure 3).
We found two chromosomal regions whose Ks values were much smaller compared to those from the 5.3-Mb inversion: the mat distal block (the 850-kb chromosomal region between the recombining left chromosome arm and the mat locus; see Figure 3) and the 68-kb inversion (Bonferroni-corrected MWU tests: P = 1.1e-15 and P = 4.4e-05, respectively; Figure 3). The suppression of recombination in these regions most likely occurred after the rearrangement event that created the large inversion.
Surprisingly and counter to our model of the order of rearrangement events, we found that the Ks values for genes within the relocated region were much larger than those for genes within the 5.3-Mb inversion (Bonferroni-corrected Mann-Whitney U (MWU) test: P = 4.33e-06; Figure 3). One interpretation of this result is that recombination was suppressed in this region much earlier than that in the 5.3-Mb inversion. The median Ks values between the N. tetrasperma mat A and mat a strains for the genes within the relocated region are ∼2.7 times greater than those for the next most divergent regions. If the substitution rate was the same between these chromosomal regions, the suppression of recombination within the relocated region would have had to occur at an evolutionary time point that was more than twofold earlier than the other rearrangement events. However, the divergence between N. tetrasperma mat A and its close relative N. crassa is only ∼1.3-fold greater than the divergence between the two N. tetrasperma strains (median Ks between N. tetrasperma and N. crassa for genes within the 5.3- and 1.2-Mb inversions, 0.070; median Ks for the same genes between the N. tetrasperma strains, 0.054). This sequence of events would place the time point of the relocation well before the divergence of the species. Given that only one of the species now exhibits the relocation, this scenario is very unlikely.
Investigating the large sequence divergence of the relocated region
The alternative interpretation for the large sequence divergence of the relocated region is that it has been experiencing an elevated substitution rate relative to the rest of the nonrecombining region. However, because we used only synonymous codon positions to calculate divergence, the elevated rate of substitution would have to pertain to nucleotide substitutions that did not change the amino acid, rather than an elevated rate of protein evolution. One possibility is that the elevated synonymous substitution rate is due to relaxed selection for codon usage. Another is that some of the annotated genes within this region are actually pseudogenes and thus, because they are completely unconstrained by purifying selection, are accumulating nucleotide substitutions at a faster rate than synonymous positions. To address the possibility that the divergence measures for this region are biased by pseudogenes, we repeated the analysis on a subset of genes from the region for which we had evidence of expression from the mat A ESTs. The increased divergence was also present in the subset of genes showing evidence of expression, suggesting that the potential inclusion of pseudogenes in our analysis was not responsible for the increased divergence we observed. Obviously, such a filter is not infallible because a minority of pseudogenes have been shown to be transcribed on the basis of profiling of mRNA/ESTs from a variety of organisms [between 2% and 15% of pseudogenes studied in rice (Zou et al. 2009), Arabidopsis (Zou et al. 2009), humans (Zheng et al. 2007), and yeast (Lafontaine and Dujon 2010)]. Assuming Neurospora also follows this general pattern, such a filter, although incomplete, would nevertheless eliminate the majority of pseudogenes from this region.
To test the hypothesis that relaxed selection for codon usage is occurring in the relocated region, we calculated the codon usage in the 100 most highly expressed genes in N. crassa. We used these values to calculate the CAI for each one-to-one ortholog between N. crassa and the two N. tetrasperma strains. CAI values range from 0 to 1 with higher values indicating more codon usage bias (Sharp and Li 1987). To control for any bias resulting from using N. crassa highly expressed genes to identify favored codons, we repeated this analysis with codon usage information gleaned from the 100 most highly expressed genes in N. tetrasperma mat A (as determined by EST coverage) and obtained similar results (Figure S1).
We compared the CAI values for genes from each N. tetrasperma strain to their ortholog in N. crassa. We found that the genes within the chromosomal region that was relocated in N. tetrasperma mat A have a median CAI that is lower than that of their N. crassa orthologs (one-sided permutation test: N. tetrasperma mat A, P = 0.0016; using only the subset of genes with ESTs, P = 0.0023) (Figure 4). This situation is not seen for the other rearrangement events; to the contrary, the genes within the 68-kb inversion appear to be evolving higher codon usage bias in both N. tetrasperma strains [one-sided permutation test: P = 0.026 (mat A), P = 0.003 (mat a); Figure 4]. Across the 7693 ortholog pairs that we identified in the N. tetrasperma and N. crassa genome comparisons, we found slightly more pairs where the N. crassa ortholog had a larger CAI value, for both N. tetrasperma mating types [one-sided binomial test: P = 0.0001 (mat A), P = 4.871e-08 (mat a); Table 2].
Given that most amino acid-changing mutations are likely to be deleterious, it is reasonable to expect that relaxed selection across a genomic region would result in an increase in the number of nonsynonymous substitutions per site (Ka). To minimize inflation of Ka due to the inclusion of pseudogenes or misannotated genes in the analysis, we used only genes that had evidence of expression in the form of ESTs.
We found that the genes located within the relocated region of the N. tetrasperma mat A strain as well as their orthologs in N. tetrasperma mat a both have a median Ka that is greater than that of the nonrecombining region as a whole (one-sided permutation test: N. tetrasperma mat A, P = 0.005; N. tetrasperma mat a, P = 0.013; Figure 5). Together, the elevated Ka and the reduced CAI of the genes in the relocated region suggest that it has experienced reduced purifying selection in the N. tetrasperma lineage since its divergence from N. crassa.
Evidence for asymmetrical degeneration within the nonrecombining region
The origin of the N. tetrasperma region of suppressed recombination is ∼37 times younger than that of the XY sex chromosomes of marsupial and placental mammals (∼4.5 MYA vs. ∼166 MYA) (Veyrunes et al. 2008). Unlike mammals, it is possible for a functionally diploid (heterokaryotic) N. tetrasperma individual to become haploid (homokaryotic). These haploid individuals grow and reproduce via mitotic spores in the laboratory and there is evidence of outcrossing between them in the wild (Menkis et al. 2009; Powell et al. 2001). These observations suggest that there should be selection to maintain the function of both copies of the sex-linked genes and one would therefore expect the nonrecombining regions to be maintained intact. However, the two strains that we examine here are almost identical across their genomes except for the nonrecombining region, implying a long history of inbreeding, and previous studies have observed high instances of sexual dysfunction when N. tetrasperma strains are outcrossed in the laboratory (Jacobson 1995; Saenz et al. 2001). These results suggest that the degree of outcrossing in nature may be limited, in which case selection against degeneration within the nonrecombining regions of both mating types may be reduced.
One signal of degeneration that has been found previously in N. tetrasperma (Whittle and Johannesson 2011; Whittle et al. 2011b), as well as in regions of reduced recombination in many other organisms (Kliman and Hey 1993; Betancourt and Presgraves 2002; Bachtrog 2003; Liu et al. 2004; Peichel et al. 2004; Nicolas et al. 2005; Presgraves 2005; Haddrill et al. 2007; Marais et al. 2008; Zhou et al. 2008; Betancourt et al. 2009), is the accumulation of deleterious alleles. The elimination of recombination across a genomic region will, in essence, lower the effective population size of that region, making fixation more likely for slightly deleterious mutations and less likely for slightly beneficial ones. Given that most amino acid-changing substitutions are deleterious, suppression of recombination should result in an increased proportion of nonsynonymous substitutions compared to synonymous substitutions (Ka/Ks) across the nonrecombining region (Charlesworth and Charlesworth 2000).
The previous results from a different lineage within the N. tetrasperma species complex found a higher Ka/Ks ratio for a set of N. tetrasperma genes located within the nonrecombining region but not for a set of genes from outside of this region, consistent with the genes within the nonrecombining region being under relaxed purifying selection (Whittle and Johannesson 2011; Whittle et al. 2011b). To determine whether there is evidence supporting this prediction for the genes within the nonrecombining region of this N. tetrasperma lineage, we calculated Ka and Ks for all ortholog pairs between each N. tetrasperma strain and N. crassa and compared the Ka/Ks ratios for the genes within the nonrecombining region to those for the genes outside it.
Consistent with the prediction of Charlesworth and Charlesworth (2000) and the previous results from another N. tetrasperma lineage (Whittle and Johannesson 2011; Whittle et al. 2011b), we found that the genes within the nonrecombining region of the mat a strain have a significantly higher median Ka/Ks ratio than the genes outside of the nonrecombining region, but this pattern did not hold for the genes from the mat A strain (MWU test: P = 0.001 and P = 0.6, respectively; Figure 6 and Figure S2). One explanation for this is that the mat a region may be accumulating deleterious alleles at a faster rate than the mat A region. To further explore this possibility, we compared the accumulation of deleterious mutations within the nonrecombining region in the form of pseudogenes, codon usage, and nonsynonymous substitutions between the two N. tetrasperma mating types. For the pseudogene analysis, we identified candidate pseudogenes from the set of N. crassa predicted proteins for which we were unable to find an ortholog in the N. tetrasperma set of predicted peptides. Confidently identifying pseudogenes can be difficult because misannotations in the N. crassa genome that incorrectly identify start sites or incorrectly predict ORFs that are not actually transcribed can make the homologous region in N. tetrasperma appear to be a pseudogene. For these reasons, we required a candidate pseudogene in N. tetrasperma to have either a frameshift or a nonsense mutation, full-length homology to functional genes in both N. crassa and N. discreta, and evidence of expression in N. crassa. Applying these conservative criteria, we found a total of 10 candidate pseudogenes: 2 with nonsense and/or frameshift mutations in both N. tetrasperma genomes, 5 with such mutations only in the N. tetrasperma mat a genome, and 3 appearing only in the N. tetrasperma mat A genome (Table S1). While it is notable that we observed more pseudogenes on the mat a chromosome compared to mat A, the sample size (10 pseudogenes in total) is not large enough to confidently conclude that this represents evidence of asymmetrical degeneration.
As an additional approach to assess evidence of asymmetrical degeneration in one of the two N. tetrasperma sex-linked regions, we used N. crassa as an outgroup to assign the nucleotide substitutions that we identified within the nonrecombining region to one of the two N. tetrasperma lineages. After normalizing by the total number of nucleotide substitutions, we compared the frequencies of each nonsynonymous substitution (at the codon level) between the two N. tetrasperma strains and found that nonsynonymous substitutions have occurred at higher frequencies on the N. tetrasperma mat a lineage (one-sided, paired MWU test: P = 2.749e-15; Figure 7A), suggesting that the N. tetrasperma mat a strain may be accumulating deleterious substitutions at a higher rate than the mat A strain.
We also used the codon usage table mentioned previously to identify synonymous changes involving the substitution of a more preferred codon to a less preferred codon and vice versa. After normalizing by the total number of synonymous substitutions within each lineage, we found a tendency for substitutions in N. tetrasperma mat a that involve a change to an unpreferred codon to have occurred at higher frequencies, although this difference is not significant at α = 0.05 (one-sided, paired MWU test: P = 0.072; Figure 7B). We also found that substitutions involving a change to a preferred codon have occurred at lower frequencies compared to those in N. tetrasperma mat A (one-sided, paired MWU test: P = 0.039; Figure 7B). Together, these results suggest that the N. tetrasperma mat a nonrecombining region may be in the early stages of degeneration and are consistent with observations that, within a heterokaryotic N. tetrasperma individual, mat A nuclei outnumber mat a nuclei during growth and early sexual development (H. Johannesson, personal communication).
Suppression of recombination in other systems has often been accompanied by the accumulation of transposable elements. We identified de novo repetitive elements as well as those with homology to known fungal elements in both N. tetrasperma strains and used a permutation test to determine whether the nonrecombining region is enriched for transposons relative to the rest of the genome. Interestingly, the mat A strain has significantly more transposons in the genomic region where recombination is suppressed compared to the rest of the genome (P = 0.0004), but the mat a strain does not (P = 0.30) (Figure 8).
In this study we have used two high-quality genome assemblies, representing the nuclei of opposite mating type derived from a single heterokaryotic strain, to discover a series of three inversions within the N. tetrasperma region of suppressed recombination. The location of these rearrangements and the collinearity of the chromosome ends are consistent with the cytology, genetic map data, and sequence divergence data from previous studies that have investigated the region of suppressed recombination. The identification of these rearrangements answers the question raised by Jacobson (2005) of the relative influence of structural vs. genetic modifiers in maintaining this nonrecombining region: while we show that structural rearrangements encompass most of the nonrecombining region, the mat a and mat A chromosomes are collinear in the chromosomal region surrounding the mat locus (Figure 3). This region includes the more recent evolutionary stratum identified in Menkis et al. (2008) (the ∼850-kb mat distal block) as well as the mat proximal block (the ∼218-kb chromosomal segment between the mating-type locus and the first chromosomal rearrangement). Genetic evidence indicates that this segment is within the nonrecombining region, and the presence of sequence divergence within this region (Figure 3) implies that it has long remained so, but it is unclear why recombination is suppressed. However, the mat a and mat A loci do not share sequence homology and, in addition, we found that within each strain, the mat locus is flanked by regions that have been subject to Repeat Induced Point mutation (RIP) (Galagan and Selker 2004).
RIP is a genome-defense mechanism that subjects recently duplicated sequences to numerous G:C to A:T point mutations and is thought to have evolved to suppress the amplification of transposable elements. A side effect of this defense mechanism is the elimination of young, highly similar, gene paralogs (Galagan and Selker 2004). In the sex chromosomes of mammals and Drosophila, intrachromosomal gene conversion between such young duplicates has been hypothesized to play a role in rescuing Y-linked genes from degeneration (Rozen et al. 2003; Connallon and Clark 2010). Because RIP constrains gene duplication, we do not expect a similar phenomenon to be occurring in N. tetrasperma.
It is possible that the RIPed regions, together with the idiomorphic mat loci, create islands of sequence divergence that disrupt synapsis in this region and thus cause the suppression of recombination to extend past the distal portion of the mat locus. Other nonstructural mechanisms that have been shown to suppress recombination include genetic modifiers such as the rec genes of N. crassa (Catcheside 1975), overall high levels of sequence divergence (rather than several discrete islands) (Hunter et al. 1996), and DNA methylation (Maloisel and Rossignol 1998). Additionally, because this region is located at the end of the nonrecombining interval, it may simply be that there have been fluctuations in the exact location of the boundary of the nonrecombining region over evolutionary time. At this time, we are unable to differentiate between these possibilities.
By combining our analysis of the relative timing of the rearrangement events with the analysis of sequence features associated with the boundaries of these events, we have formulated a cohesive model for the structural evolution of the N. tetrasperma mating-type chromosome (Figure 2). On the basis of sequence divergence, the last rearrangement event (a 68-kb inversion) occurred more recently than the first two and therefore represents a second evolutionary stratum. These results, along with those from other fungal phyla (Fraser et al. 2004; Idnurm et al. 2008), suggest that inversion-mediated suppression of recombination on mating chromosomes and the expansion of such nonrecombining regions as discrete strata have occurred independently in many fungal lineages and may be a common property of the evolution of sex in fungi, similar to the sex chromosomes of other taxa (Charlesworth et al. 2005; Fraser and Heitman 2005).
An additional similarity between the N. tetrasperma nonrecombining region and the sex chromosomes from other systems is that it may be under reduced purifying selection. The nonrecombining region in the N. tetrasperma mat A strain has accumulated an excess of transposons compared to the rest of genome and there is evidence that the N. tetrasperma mat a strain may be in the early stages of degeneration (Figures 6–8). Our finding that, across the genome, there is a slight but significant tendency for N. crassa genes to have a higher CAI value compared to the orthologs in both N. tetrasperma mating types (Table 2) is consistent with what would be expected given the evidence for extensive selfing within the N. tetrasperma heterokaryon. However, this result contrasts with that reported by Whittle et al. (2011c). In a comparison of genome-wide codon usage between the same N. tetrasperma mat A strain we study here and the outcrossing species N. discreta, the authors found that N. tetrasperma has a higher frequency of optimal codon usage. Whittle et al. hypothesize that this result may be due to N. discreta having an effective population size that is smaller than that of N. tetrasperma, despite the fact that N. discreta is not self-fertile. While estimates of effective population size have been inferred for two populations of N. crassa (Ellison et al. 2011), none exist for N. discreta. Further population-level work involving the comparison of effective population size between these three species would shed light on this inconsistency.
It is unclear why we observe an accumulation of transposons only on the mat A chromosome. One explanation is that we identified fewer transposons in the mat a strain because its genome assembly has a higher proportion of sequence bases in gaps compared to the mat A strain (Table S2). It is possible that the additional gaps in the mat a assembly are caused by a reduced ability to assemble repetitive elements due to differences in sequence coverage (Sanger vs. 454 as well as single end vs. paired end; Table S3 and Table S4) between the two genome assemblies. Alternatively, we speculate that the additional transposable elements in the N. tetrasperma mat A strain could be the result of “genome shock” (McClintock 1984): a single burst of activity specific to the mat A nucleus due to the release of transposon suppression during the reorganization of the mating-type chromosome. Of course, it is equally plausible that the relative timing of the chromosome reorganization and the transpositions was reversed, such that increased transposon activity enabled the 5.3-Mb inversion that led to self-fertility in N. tetrasperma, thereby linking the additional transposon copies to the new chromosome orientation.
Our finding that the mat a chromosome is accumulating deleterious alleles at a faster rate than the mat A chromosome is also somewhat unexpected and stands in contrast to previous theory (Bull 1978). N. tetrasperma individuals can grow as haploid homokaryons that are capable of outcrossing and acting as a maternal or paternal parent. Recessive deleterious alleles would not be sheltered under these conditions and, if N. tetrasperma exists often as a homokaryon in nature, there should be selection to maintain both copies of the sex-linked genes. However, there is evidence that outcrossing may be limited in nature (Jacobson 1995), in which case degeneration would be reasonable because most individuals would not leave their heterokaryotic, sheltered state. Additionally, asymmetric evolution between sex-determining chromosomes, at least with respect to differences in chromosome size, has been observed in other haploid organisms such as the liverwort Marchantia polymorpha (Yamato et al. 2007) and the fungus M. violaceum (Hood 2002). In neither of these systems, unfortunately, have the chromosomes of both mating types been sequenced, making it impossible to determine whether the size difference is due to asymmetric gain or loss (i.e., degeneration) of genetic elements.
While Whittle et al. (2011b) report an asymmetry between the mat a and mat A chromosomes in terms of the frequency of substitutions to preferred codons in lineage 1 of the N. tetrasperma species complex, a different lineage from that studied here, they do not report whether the deviation is statistically significant and a reanalysis of their results shows that it is not (Fisher’s exact test: P = 0.285; Table S5). Additionally, there does not appear to be a similar asymmetry with respect to nonsynonymous substitutions in lineage 1 (Whittle and Johannesson 2011). That we are able to observe such an asymmetry in both nonsynonymous substitutions and codon usage between the sex-linked regions in this N. tetrasperma lineage may be due to the larger sample of genes we use here (∼1300 genes from within the nonrecombining region compared to 168 in Whittle and Johannesson 2011 and 228 in Whittle et al. 2011b) or to differences between the lineage studied in Whittle and Johannesson (2011; Whittle et al. 2011b) (lineage 1) and that studied here (lineage 6).
Future work comparing the structural rearrangements identified here to the other N. tetrasperma lineages would address the hypothesis of independent origins of this region of suppressed recombination and lead to further insight into its evolutionary history. The examination of sequence data from N. tetrasperma populations within these lineages would be useful for assessing the effect of reduced recombination on nucleotide diversity. Additionally, allele-specific gene expression experiments would be an ideal approach to determine whether there is evidence that the genes within the mat A nonrecombining region are evolving increased expression to compensate for the increase in deleterious substitutions that are occurring within the mat a nonrecombining region.
We thank Deborah Charlesworth for comments on a previous version of this manuscript and Brian Charlesworth for helpful discussion. This work was supported by National Science Foundation grant DEB-0516511 (to J.W.T.), National Institutes of Health–National Institute of General Medical Sciences grant R01RGM081597 (to J.W.T.), and the Chang-Lin Tien Graduate Fellowship (to C.E.E.). The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231.
Available freely online through the author-supported open access option.
Supporting information is available online at http://www.genetics.org/content/suppl/2011/07/12/genetics.111.130690.DC1.
- Received May 16, 2011.
- Accepted June 25, 2011.
- Copyright © 2011 by the Genetics Society of America
Available freely online through the author-supported open access option.