Homosporous ferns have extremely high chromosome numbers relative to flowering plants, but the species with the lowest chromosome numbers show gene expression patterns typical of diploid organisms, suggesting that they may be diploidized ancient polyploids. To investigate the role of polyploidy in fern genome evolution, and to provide permanent genetic resources for this neglected group, we constructed a high-resolution genetic linkage map of the homosporous fern model species, Ceratopteris richardii (n = 39). Linkage map construction employed 488 doubled haploid lines (DHLs) that were genotyped for 368 RFLP, 358 AFLP, and 3 isozyme markers. Forty-one linkage groups were recovered, with average spacing between markers of 3.18 cM. Most loci (∼76%) are duplicated and most duplicates occur on different linkage groups, indicating that as in other eukaryotic genomes, gene duplication plays a prominent role in shaping the architecture of fern genomes. Although past polyploidization is a potential mechanism for the observed abundance of gene duplicates, a wide range in the number of gene duplicates as well as the absence of large syntenic regions consisting of duplicated gene copies implies that small-scale duplications may be the primary mode of gene duplication in C. richardii. Alternatively, evidence of past polyploidization(s) may be masked by extensive chromosomal rearrangements as well as smaller-scale duplications and deletions following polyploidization(s).
The vascular plants of the world consist of two major groups: those that are homosporous and those that are heterosporous. The heterosporous vascular plants are more familiar because the vast majority of them are the well-studied flowering plants (angiosperms). Homosporous vascular plants (∼10,000 species, Smith 1972) differ from heterosporous plants in producing a single kind of spore that germinates into a free-living bisexual gametophyte, and the vast majority of homosporous vascular plants (some 9500 species) are the ferns. All cells of a homosporous fern's bisexual gametophyte are mitotically derived from the single-celled spore from which the gametophyte germinates. Thus the sperm and egg of a single gametophyte are genetically identical, and self-fertilization (intragametophytic selfing) yields a sporophyte that is immediately homozygous at all loci on its homologous chromosomes.
Homosporous ferns are famous for their high chromosome numbers. The average chromosome number in homosporous ferns was estimated to be n = 57.05, as opposed to n = 15.99 in angiosperms (Klekowski and Baker 1966). Angiosperm species with chromosome numbers >n = 14 are generally considered to be polyploid (Grant 1981). If this rule were applied to homosporous ferns, 95% of the species would be considered polyploid (Wagner and Wagner 1980; Grant 1981). Sequence-based data now show that even plants with simple genomes such as Arabidopsis thaliana (e.g., Vision et al. 2000; Blanc et al. 2000, 2003) and rice (Yu et al. 2005) appear to have experienced polyploidization event(s).
Enzyme electrophoresis has been used to test the hypothesis that ferns are polyploids. Multiple enzyme bands observed in an initial electrophoretic study of bracken fern were taken as evidence of multiple gene copies resulting from paleopolyploidy (Chapman et al. 1979). However, subsequent enzyme electrophoretic analyses of homosporous ferns showed that some multibanded enzyme patterns result from segregating diploid Mendelian heterozygosity at single loci (Gastony and Gottlieb 1982) and others from subcellularly compartmentalized isozymes (Gastony and Darrow 1983) well known in angiosperms, conclusions subsequently validated by reanalysis of bracken fern (Wolf et al. 1987). Electrophoretic analysis of naturally occurring fern populations also revealed diploid gene expression of enzyme-encoding loci (Haufler and Soltis 1984; Gastony and Gottlieb 1985). These findings discrediting the paleopolyploid hypothesis in homosporous ferns were subsequently generalized for the homosporous vascular plants as a whole (Haufler and Soltis 1986; Soltis and Soltis 1987, 1988a,b; Soltis et al. 1988). Thus isozyme studies showed that despite their high chromosome numbers, homosporous ferns with the lowest chromosome number in their genus (n = 27–52) have gene expression patterns typical of diploid angiosperms.
This paradoxical combination of high chromosome numbers and diploid gene expression has been explained in two ways: (1) homosporous ferns initially had high chromosome numbers, or (2) duplicated genes resulting from ancestral polyploidization events have been silenced. The former hypothesis cannot be rejected (Duncan and Smith 1978), but it is not generally accepted (Wagner and Wagner 1980; Soltis and Soltis 1987), in part because mechanisms that would generate a large number of chromosomes are not clear. Although chromosomal fission may explain the high chromosome numbers in homosporous ferns, there is no evidence that chromosomes of homosporous species are significantly reduced in size compared to those of heterosporous species (reviewed in Walker 1979, 1983). The alternative hypothesis, that paleopolyploidy has been followed by chromosomal diploidization and silencing of duplicated genes, was clearly stated by Haufler (1987). He suggested that homosporous ferns may have acquired the combination of high chromosome numbers and diploid gene expression through repeated cycles of polyploidization followed by extensive gene silencing. This hypothesis received support from an isozyme demonstration that the duplicated expression of phosphoglucoisomerase in neotetraploid Pellaea rufa progressively diminished to a diploid level in different populations of this homosporous fern (Gastony 1991). Gene silencing was also observed in a sequencing study of chlorophyll a/b binding protein (CAB) genes in the homosporous fern species Polystichum munitum, where several copies of this high-copy-number gene have mutated to inactive states (Pichersky et al. 1990). It has become increasingly clear that silencing of duplicated genes occurs frequently by various mechanisms, including pseudogenization and the regulation of gene expression (reviewed in Osborn et al. 2003 and Adams and Wendel 2005).
If Haufler's hypothesis is correct, multiple (but mostly silenced) copies of genes should be detected in a paleopolyploid genome. Hybridization of low-copy cDNA probes to the genomic DNA of two species of the homosporous fern genus Ceratopteris (one diploid and one tetraploid, as based on the lowest extant chromosome numbers in the genus) demonstrated that most tested probes hybridized to four or more restriction fragments (McGrath et al. 1994). Furthermore, fluorescent in situ hybridization (FISH) studies detected two major rDNA loci and six more weakly hybridizing signals in the genome of the diploid homosporous fern Ceratopteris richardii (McGrath and Hickok 1999). These studies are consistent with the view that the genome of homosporous ferns consists of duplicated genes that arose through past polyploidization events.
Evidence from previous studies, however, is not sufficient to show that duplicated genes are the product of polyploidization events, because several other mechanisms, such as internal duplication of chromosomal segments, cDNA synthesis by reverse transcriptase (Ohshima et al. 1992), or formation of tandem repeats by unequal crossing over, gene conversion, and other illegitimate recombinations (Ohno 1970; Arnheim 1983; Li 1997; Liao 1999), can also explain the phenomenon. Polyploidization is a genome-level process, and it is difficult, if not impossible, to evaluate its occurrence on the basis of studies of individual loci. Construction of a genetic linkage map, such as that reported here, however, not only can reveal the numbers of each duplicated gene copy (paralogue), but also can identify homeologous chromosomes or chromosomal segments. Thus, linkage mapping addresses the question of paleopolyploidy by asking whether gene duplication is a genomewide phenomenon or is restricted to certain genes or chromosomal segments. The strongest evidence of past polyploidization events is obtained when all or most genes have similar numbers of paralogues and occur in the same order along homeologous chromosomes or chromosomal segments (Gaut 2001). On the other hand, segmental duplication or tandem repeat formation is a more parsimonious explanation when copy number varies widely for different genes or when paralogues are located adjacent to each other rather than on homeologous chromosomal segments. Perhaps the clearest evidence of ancient polyploidization was derived from linkage maps of Brassica nigra (Lagercrantz and Lydiate 1996), whose genome was shown to consist largely of three complete sets of genes. These data suggest that B. nigra is an ancient polyploid, possibly a hexaploid (Lukens et al. 2004), although full sets of homeologous chromosomes are no longer recognizable due to chromosomal rearrangements.
Other examples of paleopolyploidy inferred from genetic linkage maps include sorghum (Chittenden et al. 1994; Bowers et al. 2003), maize (Helentjaris et al. 1988), cotton (Reinisch et al. 1994; Rong et al. 2004), and soybean (Shoemaker et al. 1996). All of these are angiosperm (heterosporous) species.
Here, we report the construction of a high-resolution genetic linkage map of the homosporous fern species C. richardii using a large population of doubled haploid lines (DHLs). Our goals are to test the hypothesis of paleopolyploidy in homosporous ferns, as well as to provide permanent genetic resources for this neglected lineage of vascular plants. Restriction fragment length polymorphisms (RFLPs) are used to determine the number of gene copies and their distribution in the genome, whereas amplified fragment length polymorphisms (AFLPs) and isozyme markers are employed to increase saturation of the linkage map. The following two questions are addressed:
Are the majority of genes duplicated? If so, what is the average number of copies for each gene? Do most genes have a similar number of copies, or do they vary widely in copy number?
How are duplicated genes distributed across the genome? Are entire sets of genes duplicated and recognizable as homeologous chromosomal segments, or are paralogues scattered haphazardly across the genome?
MATERIALS AND METHODS
Ceratopteris comprises several herbaceous species that are distributed throughout the tropics and subtropics of the world. It is one of the few fern genera with a nonperennial life history, and plants typically occur in aquatic to semiaquatic habitats such as river banks and ponds (Lloyd 1974). Our study system, C. richardii Brongn., is a putatively diploid species with 39 chromosome pairs that is native to the tropics of Central to South America (Lloyd 1974). The species has recently been promoted as a model system for ferns (Hickok et al. 1987, 1995; Hickok and Warne 1998; http://cfern.bio.utk.edu) because of several desirable features for experimental studies, including relatively short generation time (∼90 days), ease of cultivation in a common greenhouse environment, and ease of mutagenization and screening of a large number of individuals for mutants (Hickok et al. 1995).
Plant materials and DNA extraction:
The parental strains of the mapping population, Hα-PQ45, the paraquat-tolerant mutant (Hickok and Schwarz 1986) of Hn-n from Cuba (Killip 44595, GH) and ΦN8 from Nicaragua (Nichols 1719, GH), were kindly provided by Leslie Hickok (University of Tennessee), and their gametophytes were cross-fertilized to generate the F1 hybrid using Hα-PQ45 as a maternal parent. DHLs were generated by isolating gametophytes grown from spores of the F1 hybrid and allowing them to grow to maturity and self-fertilize in the presence of distilled water. Gametophyte culture and fertilization were carried out according to the C-Fern Manual (Hickok and Warne 1998). All sporophytes were grown under continuous lighting in the greenhouses at Indiana University (Bloomington, IN). Mature spores of each DHL were collected and stored at room temperature and are available upon request to investigators interested in studies using the mapping population. Genomic DNA from sporophytic tissues was extracted by the CTAB method (Doyle and Doyle 1987), purified with CsCl centrifugation (Weising 1995), quantified using SPECTRAmax 190 (Molecular Devices, Sunnyvale, CA), and stored at −80° until used.
Sample preparation for RFLP genotyping:
Five micrograms of genomic DNA from each sample were digested with EcoRI or HindIII (Roche, Indianapolis) according to the manufacturer's instructions. Digested fragments were run on 0.8% agarose gels in 1× TAE at 1 V/cm for 16 hr at room temperature. Depurination, denaturation, neutralization, and neutral capillary transfer to nylon membranes (Roche) were carried out according to standard protocols (Sambrook and Russell 2001). Membranes were UV-crosslinked using the “Autolink” option of Stratalinker (Stratagene, La Jolla, CA) and stored in 2× SSC until used.
Probes were prepared from two sources: (a) a cDNA library of gametophytic tissues packaged in the pCMV·SPORT 6 vector (Invitrogen, Carlsbad, CA) prepared by Jo Ann Banks (Purdue University) and (b) a cDNA library of sporophyll tissues packaged in the Lambda ZAP II vector (Stratagene) prepared by Jeffrey Hill (Idaho State University). Colonies or plaques with unique cDNA inserts were isolated and stored in a 100-μl storage medium according to standard protocols (Sambrook and Russell 2001). Previously, >5000 ESTs had been sequenced from the gametophytic cDNA library (Salmi et al. 2005) and have been deposited in GenBank. Prior to genotyping, these EST sequences were BLASTed against themselves using the NCBI blastn program (Altschul et al. 1990, 1997) to ensure uniqueness of these sequences and were BLASTed against A. thaliana EST sequences using the NCBI tblastx program with the BLOSUM62 matrix. Only unique cDNA clones with significant sequence similarity to A. thaliana ESTs (E < 1e−10) were selected for genotyping. Selected cDNA clones were further screened by test Southern hybridizations with the parents and five DHLs, and only clones with scoreable polymorphic banding patterns were used for genotyping. Because the sequences of sporophytic cDNA clones were not known prior to genotyping, uniqueness of these clones was checked on the basis of the banding patterns from the test screens. Sporophytic ESTs used in this study were subsequently sequenced and deposited in GenBank (nos. DW177051–DW177090). EST clones used as RFLP markers in this study and their putative function based on BLAST search against A. thaliana EST sequences (E < 1e−10) are listed in supplemental Table 1 at http://www.genetics.org/supplemental/.
PCR labeling of probes was carried out in a 100-μl reaction mixture containing 30 mm tricine, 50 mm KCl, 2 mm MgCl2, 5% acetamide, 100 μm of each d[A,C,G]TP, 80 μm dTTP, 20 μm DIG-11-dUTP, alkali-labile (Roche), 3 μl of Taq polymerase, 0.2 mm primers designed for the flanking sequences of the vector inserts, and 2 μl of diluted colonies or plaques that were denatured at 95° for 5 min. PCR reactions were performed in an MJ Research thermal cycler (MJ Research, Watertown, MA) with the following program: initial denaturation for 60 sec at 94° followed by 35 cycles of 30 sec at 94°, 30 sec at 55°, and 2 min at 72°, with a final extension of 5 min at 72°. Labeled probes were purified with a Qiaquick PCR purification kit (QIAGEN, Valencia, CA) or a Montage PCR96 filter plate (Millipore, Billerica, MA) according to the manufacturers' instructions and stored at −20° until used.
Southern hybridization and detection:
All hybridization and washing steps were carried out in plastic containers using a rocker (Labnet International, Woodbridge, NJ) at room temperature unless otherwise indicated. Membranes were briefly rinsed with distilled water and prehybridized for 1–3 hr in 250 ml of hybridization solution [50% deionized formamide, 2% (w/v) blocking reagent (Roche), 5× SSC, 0.1% (w/v) N-lauroylsarcosine sodium salt, 0.02% (w/v) SDS] at 41° (calculated on the basis of the GC content from the cDNA sequences according to the equation in Wetmur 1991) with nylon meshes (Abgene, Rochester, NY) between membranes. Three micrograms of labeled product were typically obtained from the PCR reaction above and were used for hybridization, resulting in a probe concentration of 12 ng/ml of hybridization solution. The labeled and purified probes were denatured at 95° for 5 min and added to the hybridization solution, and membranes were hybridized for 12–16 hr under the same conditions as prehybridization.
After hybridization, membranes were washed twice with 500 ml of low-stringency solution [2× SSC, 0.1% (w/v) SDS] for 15 min each and twice with 500 ml of a high-stringency solution [0.5× SSC, 0.2% (w/v) SDS] for 15 min each at 55°. Our choice of a high-stringency washing procedure was based on a pilot study of 18 probes in which the number of detected fragments on blots was compared under low (washing at 40° with 2× SSC), medium (washing at 45° with 0.5× SSC), and high stringency (washing at 55° with 0.5× SSC). The results indicated that the number of fragments detected was not strongly influenced by washing stringency, at most by one fragment (data not shown).
Membranes were rinsed with 500 ml of washing solution [0.1 m maleic acid, 0.3% (v/v) Tween 20, pH 7.5] and incubated with 250 ml of blocking solution [0.1 m maleic acid, 2% (w/v) blocking reagent (Roche), pH 7.5] for 1–3 hr. A tube of anti-digoxigenin, Fab fragments (Roche) was centrifuged for 5 min at 18,000 × g and 8.3 μl of the antibody solution (1:40,000 dilution) was taken from the surface and added to the blocking solution. After 30 min of incubation, membranes were washed twice with the washing solution for 15 min each and rinsed twice in 500 ml of detection solution (0.1 m Tris–HCl, 0.1 m NaCl, pH 9.5) for 10 min each. Membranes were placed on plastic wrap and 500 μl of CDP-Star, Ready-To-Use (Roche) was evenly distributed to each membrane. Wrapped membranes were exposed to LUMI-FILM chemiluminescent detection film (Roche) for 16 hr and developed using an X-Omat developer (Kodak, Rochester, NY). Probes were stripped from the membranes by incubating twice with 500 ml stripping solution [0.2 m NaOH, 0.1% (w/v) SDS] for 15 min each at 37°, washed twice with 2× SSC solution, and stored in 2× SSC at 4° until the next probe hybridization.
AFLP genotyping was carried out following protocols described in Kim and Rieseberg (1999), except that primers were labeled with 6-FAM, HEX, and NED fluorescent dye. Primer pairs used in selective amplifications are listed in supplemental Table 2 at http://www.genetics.org/supplemental/. Amplified fragments were detected with the 3700 DNA Analyzer using GeneScan-500 [ROX] (Applied Biosystems, Foster City, CA) as a size standard. AFLP electropherograms were generated using Genotyper software (Applied Biosystems). Each AFLP locus was treated as independent of others and manually scored as a dominant marker, either present (1) or absent (0). Loci varied from 50 to 500 bp in size.
Isozyme expression assays were carried out following standard protocols (Soltis et al. 1983). The following enzymes (enzyme name, acronym, and nomenclature according to Wendel and Weeden 1989) were tested for polymorphisms: aconitate hydratase (ACO) (E.C. 4.2.1.3), aspartate aminotransferase (AAT) (E.C. 2.6.1.1), fructose-bisphosphate aldolase (FBA) [= ALD aldolase] (E.C. 4.1.2.13), glucose-6-phosphate dehydrogenase (G6PDH) (E.C. 1.1.1.49), glutamate dehydrogenase (GDH) (E.C. 1.4.1.2), glucose-6-phosphate isomerase (GPI) [= PGI phosphoglucoisomerase] (E.C. 5.3.1.9), hexokinase (HEX) (E.C. 2.7.1.1), isocitrate dehydrogenase (IDH) (E.C. 1.1.1.42), leucine aminopeptidase (LAP) (E.C. 3.4.11.1), malate dehydrogenase (MDH) (E.C. 1.1.1.37), malate dehydrogenase (oxaloacetate-decarboxylating) (ME) (NADP+) [= malic enzyme] (E.C. 1.1.1.40), phosphogluconate dehydrogenase (PGD) (E.C. 1.1.1.44), phosphoglucomutase (PGM) (E.C. 5.4.2.2), shikimate dehydrogenase (SKD) (E.C. 1.1.1.25), and triose-phosphate isomerase (TPI) (E.C. 5.3.1.1). Of those enzymes, AAT, IDH, PGM, and SKD were found to be polymorphic between the parental strains and were genotyped as slow (1) or fast (0) morphs.
Linkage map construction:
Linkage analysis was conducted using MAPMAKER/EXP 3.0 (Lander et al. 1987). Kosambi's mapping function (Kosambi 1944) was used to estimate map distances throughout the analysis. Markers were first grouped into linkage groups using the “group” command with LOD scores of 3.0 and recombination limits of 30 cM. Some markers, particularly those showing transmission ratio distortion (TRD) of ≥20%, potentially form “pseudolinkages” that join independent chromosomal blocks (Cloutier et al. 1997); these were manually removed from subsequent analyses. Marker orders in each linkage group were determined either automatically by the “three-point” plus “order” commands or exhaustively by the “compare” command. Other residual markers were manually placed by the “near” and “try” commands. Missing genotypes were inferred from their flanking markers using a program written in Microsoft Visual Basic. Potential genotyping errors were identified from the genotypes of flanking markers and were checked against the original raw data. The total map length was estimated by two methods. The first estimate (estimate 1) is simply a sum of distances between pairs of adjacent markers in the map. The second estimate (estimate 2) is based on method 4 of Chakravarti et al. (1991), which multiplies the sum of marker distances on each linkage group by the factor (m + 1)/(m − 1), where m is the number of markers on the linkage group. Map coverage, c, was estimated by the formula $c = 1 - e^{-2dn/L}$ (Bishop et al. 1983), where n is the total number of markers and L is the total map length, assuming a random marker distribution. Map coverage indicates the probability that a new marker added to the map would fall within a given distance d from the nearest marker. All statistical analyses in this study were conducted using either Microsoft Excel or SAS 9.1 for UNIX (SAS Institute, Cary, NC). Genotypes and other data in this study are deposited on the public website C-Fern (http://cfern.bio.utk.edu).
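The map-length and coverage calculations described above can be illustrated with a short sketch. This is not the authors' code (they report using MAPMAKER/EXP, Excel, and SAS); the function names are hypothetical, marker positions are assumed to be given in centimorgans along each linkage group, and the toy positions are invented for illustration. Only the values n = 729 and L = 2178.8 cM are taken from the text.

```python
import math

def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def map_length_estimates(groups):
    """Return (estimate 1, estimate 2) for a list of linkage groups.

    Each group is a list of marker positions in cM. Estimate 1 sums the
    distances between adjacent markers (equal to the span of each group);
    estimate 2 scales each group's span by (m + 1)/(m - 1), where m is the
    number of markers on that group.
    """
    est1 = 0.0
    est2 = 0.0
    for positions in groups:
        m = len(positions)
        if m < 2:
            continue  # a single marker contributes no measurable length
        span = max(positions) - min(positions)
        est1 += span
        est2 += span * (m + 1) / (m - 1)
    return est1, est2

def map_coverage(d, n, total_length):
    """Coverage c = 1 - exp(-2dn/L), assuming randomly distributed markers."""
    return 1.0 - math.exp(-2.0 * d * n / total_length)

# Toy linkage groups (hypothetical positions, in cM).
toy_groups = [[0.0, 10.0, 20.0], [0.0, 5.0]]
print(map_length_estimates(toy_groups))

# Coverage using the marker number and estimate 1 reported in the text.
for d in (5, 10):
    print(d, round(map_coverage(d, 729, 2178.8), 3))
```

With n = 729 and L = 2178.8 cM, the coverage formula gives approximately 0.965 at d = 5 cM and 0.999 at d = 10 cM.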
Statistical analysis for clustering of gene copies:
To test the hypothesis of paleopolyploidy, we asked whether sets of duplicated gene copies detected by Southern hybridization are clustered among linkage groups more frequently than expected by chance. Although our method is conceptually similar to a simple χ2-test using a contingency table of 41 linkage groups, our data do not quite meet the standard χ2-test requirements. First, the expected counts of most cells are too small (less than five) to obtain sufficient statistical power. Second, there is a forced symmetry in the table, where the count in cell (i, j) is the same as the count in cell (j, i) because both are the number of duplicates shared between chromosomes i and j. Additional higher-order substructures also exist in the data. Because of these constraints, the data cannot be tested against the standard χ2-distribution, and the appropriate distribution was obtained by simulating data with constraints similar to the observed data. Two methods were used to test for significant deviations from a random distribution of duplicated genes. In the first method (method 1), the observed row and column totals as well as the observed copy numbers for the gene duplicates were retained in all simulated data sets. In the second method (method 2), gene copy numbers were randomly sampled according to the observed distribution and copies were placed into cells according to the relative probability that a duplicate gene pair would fall into that cell. Method 2 did not attempt to preserve the observed row and column totals. Significance of gene copy number distributions (relative to neutral expectations) for each of 1000 simulations was evaluated by the typical χ2-test statistic (using its simulated distribution), $\chi^2 = \sum (O - E)^2 / E$, summed over all cells of the table, where O is the observed number of duplicates in a cell in the original or generated table and E is the expected number (= row total × column total/1100, where 1100 is the total number of gene copies in the data).
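The general idea of evaluating a χ2 statistic against a simulated rather than a theoretical distribution can be sketched as follows. This is a simplified illustration and does not reproduce either method 1 or method 2: it assumes that each probe's copy number is kept fixed, that copies are reassigned to linkage groups at random in proportion to the observed number of copies per group, and that the expected count uses the grand total of the pair table rather than the total number of gene copies. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations

def pair_table(copies_by_probe, n_groups):
    """Symmetric table: cell (i, j) counts duplicate pairs shared by groups i and j."""
    table = [[0] * n_groups for _ in range(n_groups)]
    for groups in copies_by_probe.values():
        for i, j in combinations(groups, 2):
            table[i][j] += 1
            table[j][i] += 1  # forced symmetry; a within-group pair adds 2 to the diagonal
    return table

def chi_square(table):
    """Sum of (O - E)^2 / E with E = row total * column total / grand total."""
    total = sum(map(sum, table))
    if total == 0:
        return 0.0
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / total
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def simulated_p_value(copies_by_probe, n_groups, n_sim=1000, seed=0):
    """Compare the observed statistic with statistics from randomized placements."""
    rng = random.Random(seed)
    observed = chi_square(pair_table(copies_by_probe, n_groups))
    weights = [0] * n_groups
    for groups in copies_by_probe.values():
        for g in groups:
            weights[g] += 1
    hits = 0
    for _ in range(n_sim):
        sim = {probe: rng.choices(range(n_groups), weights=weights, k=len(groups))
               for probe, groups in copies_by_probe.items()}
        if chi_square(pair_table(sim, n_groups)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

# Toy data: probe name -> linkage groups (0-based) carrying its mapped copies.
toy = {"a": [0, 1], "b": [0, 1], "c": [2, 3], "d": [2, 3]}
stat, p = simulated_p_value(toy, n_groups=4, n_sim=200)
print(stat, p)
```

In the toy data every duplicate pair falls on the same two pairs of linkage groups, which is the kind of co-occurrence the test is meant to detect.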
RESULTS
Eighty-five percent of the 1037 RFLP probes screened by Southern hybridization detected multiple restriction fragments. This result is consistent with previous RFLP studies of C. richardii (McGrath et al. 1994; McGrath and Hickok 1999) and implies that most genes in this homosporous fern are present in multiple copies (Figure 1). Because varying the stringency of hybridization and washing conditions had little impact on the number of fragments detected per probe (materials and methods), the high proportion of multiple-copy genes reported here is a robust result.
The number of restriction fragments on a given blot, however, is not equivalent to the number of gene copies because the region homologous to a given probe may sometimes contain one or more restriction sites. We determined the average number of restriction sites within our target genes by using the genomic sequence of four single-copy nuclear genes derived from the C. richardii EST library (GenBank nos. DQ352835–DQ352838), as well as eight previously published nuclear gene sequences from fern species (GenBank nos. M34396, X77813, X98414, AB012629, AB012630, AB012631, AB016151, and AB016231). On the basis of the average gene size (3148.25 bp), proportion of noncoding sequences (52.11%), and GC content of coding (49.90%) and noncoding sequences (43.43%), we estimated the frequency distribution of gene copy numbers, assuming that the number of restriction sites within a given gene follows a Poisson distribution (solid line in Figure 1). The frequency of RFLP probes decayed exponentially with respect to copy numbers, but there is a conspicuously long tail in the distribution, which is represented by the category >13 in Figure 1. This indicates that many genes have numerous gene copies. After correcting for “multiple cuts,” 24.0% of genes were estimated to be single copy, a substantially higher proportion than that of probes detecting a single restriction fragment on blots (14.2%).
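The reasoning behind the “multiple cuts” correction can be illustrated with a minimal sketch. The exact procedure used to obtain the corrected estimate is not given in the text, so the following rests on stated assumptions: bases are treated as independent with frequencies set by GC content, the recognition site of EcoRI is GAATTC, the whole gene (coding plus noncoding) is scanned, and a single-copy gene with k internal sites is taken to yield k + 1 hybridizing fragments. The function names are hypothetical, and the output is not expected to reproduce the reported 24.0% figure.

```python
import math

def site_probability(site, gc):
    """Chance that a random position starts the recognition site, given GC content."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= base_p[base]
    return prob

def expected_sites(site, gene_len, noncoding_frac, gc_coding, gc_noncoding):
    """Expected number of recognition sites in a gene with the given composition."""
    coding_len = gene_len * (1 - noncoding_frac)
    noncoding_len = gene_len * noncoding_frac
    return (coding_len * site_probability(site, gc_coding)
            + noncoding_len * site_probability(site, gc_noncoding))

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Gene size, noncoding proportion, and GC contents are the averages given in the text.
lam = expected_sites("GAATTC", 3148.25, 0.5211, 0.4990, 0.4343)
for k in range(4):
    print(k, "internal sites ->", k + 1, "fragments:", round(poisson_pmf(k, lam), 3))
```

Under these assumptions, the Poisson term for k = 0 is the chance that a single-copy gene appears as a single band, which is why the share of single-band probes underestimates the share of single-copy genes.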
Comparisons with RFLP studies of seed plants indicate that C. richardii has a lower proportion of single-copy genes than most other plants (Table 1). Indeed, only B. juncea (Cheung et al. 1997) and soybean (Glycine subgenus soja; Shoemaker et al. 1996) had fewer single-copy genes. Although these kinds of comparisons can be complicated by differences in methods and criteria of estimation (e.g., different hybridization conditions and genomic vs. cDNA as a source of probe), it seems reasonable to conclude that the gene copy number of C. richardii is higher than that of most seed plants.
Most of the 1037 RFLP probes (91.40%) assayed in the initial test hybridizations detected at least one restriction fragment polymorphism between the two parental strains. The large amount of polymorphism between these strains is consistent with previous studies based on a small number of RFLP probes (McGrath et al. 1994) and RAPD markers (Hickok et al. 1995) and suggests that the parental populations have accumulated substantial genetic differences, although no data currently exist on the timing of their divergence. Substantial divergence of the parental strains was also suggested by the partial sterility of their hybrids (∼30% F1 spore viability).
Despite the large amount of polymorphism in the C. richardii genome, the majority of screened RFLP fragments (71.3%) exhibited codominant inheritance patterns (both parental alleles were detected), suggesting that the location and repertoire of homologous genes is mostly conserved between the parental strains. However, the remaining 28.7% of RFLP markers were inherited in a dominant fashion. There is no significant bias (χ2-test, P > 0.1) in the direction of marker dominance toward one parent or the other, so dominance cannot be accounted for by weak hybridization to ΦN8 alleles (probes were derived from the Hα-PQ45-related strain). Therefore, it appears that there have been significant changes in genome structure between the two parental strains, including the duplication, deletion, and translocation of genes.
The number of AFLP markers per primer pair had a more or less continuous but skewed distribution from 1 to 16 markers (mean, 4.84; mode, 2). Although the majority of AFLP markers were dominant, ∼1% of AFLP marker pairs were codominant or in a tightly linked repulsion phase indicated by highly negative correlations (ρ < −0.99). Interestingly, dominant AFLP markers were derived more frequently from ΦN8 than from Hα-PQ45 (χ2-test, P < 0.05), perhaps implying a higher rate of duplication (or lower rate of deletion) in the former. Also, this differential rate of genome evolution seems to be restricted to noncoding regions because the gene-based RFLP markers did not show biased representation of parental alleles (see above).
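The correlation screen for codominant or repulsion-phase AFLP pairs can be sketched as follows. This is an illustration under assumptions rather than the procedure actually used: scores are taken to be complete 0/1 vectors across the same set of lines, Pearson's correlation is used for ρ, and the marker names and data are invented.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length score vectors (nan if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def repulsion_pairs(scores, threshold=-0.99):
    """Marker pairs whose presence/absence scores are almost perfectly complementary."""
    return [(a, b) for a, b in combinations(sorted(scores), 2)
            if pearson(scores[a], scores[b]) < threshold]

# Toy presence (1) / absence (0) scores for three hypothetical markers in eight lines.
toy_scores = {
    "m1": [1, 0, 1, 0, 1, 1, 0, 0],
    "m2": [0, 1, 0, 1, 0, 0, 1, 1],
    "m3": [1, 1, 0, 0, 1, 0, 1, 0],
}
print(repulsion_pairs(toy_scores))
```

In doubled haploid lines, two bands that behave as alternative alleles of one locus are expected to be present in complementary sets of lines, which is what the strongly negative correlation captures.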
Linkage analysis using 729 markers (368 RFLPs, 358 AFLPs, and 3 isozymes) based on 488 DHLs detected 41 major linkage groups. Although most of these likely correspond 1:1 to the 39 C. richardii chromosomes, in several instances 2 (or more) linkage groups must represent fragments of the same chromosome. Because no description of chromosomes based on molecular data has been published for this species, linkage group numbers were assigned from the largest (1) to the smallest (41) (Figure 2). A large number of markers (154) could not be placed on the map because they exhibited severe transmission distortion, created pseudolinkages between major linkage groups, and/or did not map closely with other markers (see materials and methods). These markers, which sometimes included clusters of tightly linked loci, were not included in subsequent analyses, although some may represent real linkages.
The total map length of C. richardii was estimated at between 2178.8 cM (estimate 1) and 2450.9 cM (estimate 2; see materials and methods). When compared to seed plants with high-density maps, only tetraploid cotton exceeds C. richardii in total map length (Table 2). Despite the large map size, the amount of recombination within linkage groups (10.7–97.7 cM; mean, 59.5 cM) is much smaller than that of seed plants analyzed to date. As a consequence, the ratio of physical to genetic distance (megabases per centimorgan) in C. richardii is very large when compared to that reported for other plants. Although unresolved linkage relationships and unlinked markers may be partly responsible for this pattern, the consistently short map lengths across linkage groups may reflect differences in genome structure between C. richardii and seed plants. That is, the C. richardii genome is characterized by a large number of chromosomes with low rates of recombination, as opposed to the fewer more highly recombinant chromosomes found in other plant groups.
The number of markers per linkage group ranged from 5 to 30.9 (mean, 17.8), with mean spacing between markers of 3.18 cM. Marker coverage was 96.5% at 5-cM distance and 99.9% at 10-cM distance on the basis of the map length estimate 1 (materials and methods), indicating that the map is fairly saturated. Whether the mapped markers were evenly distributed throughout the genome was tested by comparing the frequency distribution of the number of markers observed in every 10-cM interval with the expected frequency distribution based on a random Poisson probability. The frequency distribution of RFLP markers very closely approximated random expectations (Kolmogorov–Smirnov two-sample test, P > 0.5), whereas AFLP markers were seemingly more clustered, although the pattern was not statistically significant (P > 0.1). Clustering of AFLP markers is often reported (e.g., Vuylsteke et al. 1999), presumably because of the overrepresentation of AFLP markers in low-recombination regions such as centromeres.
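The comparison of marker counts per 10-cM interval with a Poisson expectation can be sketched as follows. This is a simplified illustration: it assumes each linkage group starts at 0 cM, estimates the Poisson mean from the observed counts, and reports only the maximum difference between the observed and Poisson cumulative distributions. A P-value, as in the Kolmogorov–Smirnov test reported above, would require a statistics package. Function names and toy positions are hypothetical.

```python
import math
from collections import Counter

def interval_counts(groups, width=10.0):
    """Number of markers in each consecutive interval of the given width, per group."""
    counts = []
    for positions in groups:
        n_bins = max(1, math.ceil(max(positions) / width))
        bins = [0] * n_bins
        for pos in positions:
            idx = min(int(pos // width), n_bins - 1)  # last marker joins the final bin
            bins[idx] += 1
        counts.extend(bins)
    return counts

def poisson_cdf(k, lam):
    """Probability of observing k or fewer events when the mean is lam."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

def distance_to_poisson(counts):
    """Mean count and maximum gap between observed and Poisson cumulative frequencies."""
    n = len(counts)
    lam = sum(counts) / n
    freq = Counter(counts)
    cumulative = 0
    max_gap = 0.0
    for k in range(max(counts) + 1):
        cumulative += freq.get(k, 0)
        max_gap = max(max_gap, abs(cumulative / n - poisson_cdf(k, lam)))
    return lam, max_gap

# Toy linkage groups (hypothetical marker positions, in cM).
toy_groups = [[0.0, 3.0, 12.0, 25.0], [0.0, 8.0, 9.5, 31.0, 44.0]]
counts = interval_counts(toy_groups)
print(counts)
print(distance_to_poisson(counts))
```

A small maximum gap indicates that markers are spread roughly at random, whereas clustering inflates the frequency of both empty and crowded intervals.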
Distribution of duplicated genes:
Figure 3 shows the distribution of pairs of gene copies detected by RFLP probes in the C. richardii genome based on the linkage map described above. The majority of gene copy pairs (95.1%) occur between different linkage groups, as shown by data points located off the diagonal. Although it is difficult to estimate the exact proportion of gene copies occurring within and between linkage groups because not all of the segregating markers for a given probe were mapped, the presence of gene copies on different linkage groups is a common occurrence. On a visual basis, duplicated gene copies seem more or less randomly distributed throughout the genome, indicated by no clear clustering of sets of markers shared by a pair of linkage groups (Figure 3). This distribution pattern contrasts sharply with that of tetraploid cotton (Reinisch et al. 1994; Rong et al. 2004) and Brassica (Lagercrantz and Lydiate 1996), where sets of homeologous chromosomes or chromosomal segments can be clearly identified. Statistical analyses of the distribution of gene copies, however, indicate that there is significant clustering (P ≈ 0.001 for method 1 and P = 0.011 for method 2 in materials and methods); that is, the copies of a set of markers in a given linkage group tend to co-occur on a different linkage group. This result suggests that on a genomewide scale, genes on chromosomes or chromosomal segments tend to be duplicated together in the genome, although the pattern is not strong enough to be visually apparent. Although the degree of the gene copy clustering can be overestimated by the fact that some gene copies that occur between linkage groups are also duplicated within linkage groups, we found no evidence that the clustering is due to intralinkage group duplications. 
Both an analysis of the contributions to the χ²-statistic from the diagonal for the original data set (due to gene duplicates residing on the same linkage group) and a model that allows for an increased proportion of duplicates along the diagonal suggest that the significant χ²-statistic for the original data is due to a more complicated pattern of clustering involving gene duplicates on distinct chromosomes. It is noteworthy that statistically significant but not visually apparent clustering of gene copies has previously been reported for putatively diploid cotton (Rong et al. 2004) and sorghum (Bowers et al. 2003) species.
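A simple permutation test shows how clustering of duplicates on particular pairs of linkage groups might be assessed. This is only an illustrative sketch and is not method 1 or method 2 from materials and methods, which are not reproduced here. The statistic (the sum of squared counts per linkage-group pair), the null model (random re-pairing of copies, preserving the number of copies on each linkage group), and the names `clustering_stat` and `permutation_p` are all assumptions made for illustration.

```python
import random
from collections import Counter


def clustering_stat(pairs):
    """Sum of squared counts of duplicate pairs per unordered linkage-group pair.

    The value grows when many duplicate pairs share the same two linkage groups.
    """
    counts = Counter(tuple(sorted(p)) for p in pairs)
    return sum(c * c for c in counts.values())


def permutation_p(pairs, n_perm=2000, seed=0):
    """One-sided permutation P value for clustering of duplicate pairs.

    pairs: list of (lg_a, lg_b) tuples, one per pair of mapped gene copies.
    Null model (an assumption): copies are re-paired at random, keeping the
    number of copies carried by each linkage group fixed.
    """
    rng = random.Random(seed)
    observed = clustering_stat(pairs)
    labels = [lg for pair in pairs for lg in pair]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = list(zip(labels[0::2], labels[1::2]))
        if clustering_stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): duplicates concentrated on three linkage-group pairs.
toy_pairs = [(1, 2)] * 10 + [(3, 4)] * 10 + [(5, 6)] * 10
print(permutation_p(toy_pairs))
```

A small P value here indicates only that duplicate copies co-occur on the same linkage-group pairs more often than random pairing would produce. It does not by itself distinguish segmental duplication from polyploidy.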
Homosporous ferns are remarkable for their high chromosome numbers, but the question of how such high numbers originated remains unanswered. This study showed that the C. richardii genome has one of the highest proportions of multiple-copy genes among plant species and that the duplicate copies occur mainly on different chromosomes. However, the number of copies was found to vary widely for different genes. In addition, gene duplicates seemed to be scattered more or less haphazardly across the genome rather than occurring in recognizable homeologous chromosomal segments. These results suggest that polyploidization has not played a significant role in the recent evolution of the C. richardii lineage.
Unfortunately, our data fail to resolve the issue of paleopolyploidy. While the absence of large syntenic regions consisting of duplicated gene copies implies that small-scale duplications may be the primary mode of gene duplication in C. richardii, evidence of past polyploidization(s) might be masked by extensive chromosomal rearrangements, as well as smaller-scale duplications and deletions following polyploidization(s). The high proportion (>25%) of dominant RFLP markers offers some support for this view in that it implies a high rate of gene deletion, duplication, or translocation. Likewise, statistically significant clustering of gene copies in linkage groups in the C. richardii genome, although not visually apparent, may reflect a weak but significant signature of past polyploidization(s). Thus, it may be that homosporous ferns are paleopolyploids, but this cannot be proven with the present data set. Plant genomes seem to be surprisingly dynamic, with chromosomal structural changes evident even in very early stages of polyploid generations (Lukens et al. 2004; Pires et al. 2004).
The phylogenetic distribution of chromosome numbers among vascular plants also hints at one or several polyploidization events early in the evolution of homosporous ferns (Figure 4). With the exception of heterosporous ferns, base chromosome numbers of most Moniliformopses (ferns and fern allies) are substantially higher (>27) than those of the seed plants (angiosperms and gymnosperms; <28, Klekowski and Baker 1966) or more basal land plants (Lycophytes and Bryophytes). These high base chromosome numbers are fairly stable within homosporous ferns, although neopolyploidization seems to be frequent within local lineages. Thus, solely on the basis of the distribution of chromosome numbers among vascular plants, it is reasonable to speculate that the origin of the high base chromosome numbers of homosporous ferns (perhaps through ancient polyploidy) dates back at least to the divergence of the heterosporous ferns and their homosporous sister clade that includes C. richardii (222.99 ± 9.12 MYA, Schneider et al. 2004).
Although this phylogenetic perspective supports polyploidy in ferns, it implies a different history of polyploidization than that generally promoted in the fern literature. The most widely accepted hypothesis holds that there have been repeated cycles of polyploidization during the evolution of homosporous ferns (Haufler 1987), similar to the model of chromosomal evolution in grasses (Stebbins 1971, 1985). If this hypothesis were correct, the most recent of these cycles should have been detected easily using the RFLP mapping methods employed here. In contrast, the chances of detecting evidence of an ancient polyploidization event going back 220 million years with this approach are extremely remote, particularly given the high rates of turnover of duplicate genes (Lynch and Conery 2000). Although the overall age distribution of gene copies is consistent with past polyploidization events in A. thaliana and rice, there are massive duplications of individual genes scattered throughout these genomes, indicating that gene duplication is a dynamic evolutionary force constantly reshaping genome structure (Arabidopsis Genome Initiative 2000; Yu et al. 2005). Our ability to detect syntenic chromosomal regions using linkage mapping is further limited by the low probability that all copies of a given gene will have scoreable polymorphisms. Hence, while the results from our study are consistent with past polyploidization(s), the event(s) may be too ancient and/or genome rearrangements too extensive to decisively infer their occurrence on the basis of the linkage mapping approach employed here.
An alternative approach to this question might be to analyze the age distribution of duplicated gene copies in the C. richardii genome. This has proven to be a sensitive method for obtaining evidence of paleopolyploidy in other organisms (Vision et al. 2000; Blanc and Wolfe 2004), and it represents the current focus of our research efforts.
A permanent resource for genetic studies of ferns:
Despite the enormous contribution of ferns to the world's biodiversity, we have virtually no knowledge of how their genomes are organized and evolve. Even less is known about the genetic basis of phenotypic differences between populations and species of ferns. This is regrettable because of the enormous biosystematic, morphological, and ecological resources that have been developed for ferns over the past century and also because homosporous ferns have a unique mating system that is potentially valuable for a wide range of genetic and phenotypic studies (see below). This article provides a first step toward understanding the basic structure of a homosporous fern genome and provides permanent genetic resources that can be efficiently exploited for genetic and phenotypic analyses of this poorly studied group.
Unlike those of seed plants, spores of homosporous ferns are capable of developing into bisexual gametophytes, which can self-fertilize (intragametophytic selfing) to result in sporophytes homozygous throughout the genome in a single generation (DHLs). This feature was found to be particularly useful for linkage mapping using segregating populations for the following reasons. First, RFLP genotyping is more reliable without heterozygotes that reduce signal intensity and complicate banding patterns. This is particularly important for organisms such as ferns with high gene copy numbers. Second, in a conventional crossing design (e.g., F2 populations), dominant markers are much less informative than codominant markers, because linkage relationships of dominant markers in repulsion phase cannot be determined, and separate maps for each parent often need to be constructed (Liu 1998). Linkage mapping with DHLs does not involve repulsion phase, and therefore dominant markers such as AFLPs provide the same power as codominant markers. Third, detecting quantitative trait loci (QTL) underlying phenotypic traits using DHLs is much more powerful than with F2 and backcross populations, because the expression of phenotypic differences is not confounded by dominance effects (Lynch and Walsh 1998). Finally, once DHLs are constructed, their genotypes are permanently preserved, and future QTL analyses can be done simply by phenotyping the DHLs without additional genotyping.
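The second point can be made concrete with a small sketch. Because every DHL is homozygous at every locus, each marker score, whether dominant or codominant, can be read directly as one of the two parental alleles. The recombination fraction between two markers is then the proportion of lines in which the parental origins differ. The code below is a minimal illustration and not the mapping software used in this study. The function names are hypothetical, and the Kosambi mapping function is an assumption, since the mapping function is not stated in this section.

```python
import math


def recombination_fraction(marker1, marker2):
    """Fraction of doubled haploid lines with different parental alleles
    at two markers. Each marker is a sequence of 'A', 'B', or None
    (missing score), one entry per line."""
    scored = [(a, b) for a, b in zip(marker1, marker2)
              if a is not None and b is not None]
    if not scored:
        raise ValueError("no lines scored for both markers")
    recombinants = sum(a != b for a, b in scored)
    return recombinants / len(scored)


def kosambi_cm(r):
    """Convert a recombination fraction to map distance (cM) with the
    Kosambi mapping function (an assumed choice of mapping function)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Toy genotypes (hypothetical) for ten lines; one line is recombinant.
m1 = list("AAAABBBBAB")
m2 = list("AAABBBBBAB")
r = recombination_fraction(m1, m2)
print(r, kosambi_cm(r))
```

No phase inference is needed here, which is why a presence/absence marker such as an AFLP carries the same information as a codominant marker in this design.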
We envision a whole range of genetic studies of ferns that can be undertaken with these resources. These range from studies of the genetic basis of phenotypic differences to the nature of hybrid sterility and inviability to the interactions of genotypes with the environment (G × E). Analysis of G × E is particularly straightforward with DHLs because one needs only to propagate replicate DHLs in different environments and record their phenotypes. The linkage map from this study, in combination with rapidly accumulating genetic tools and resources for C. richardii, should serve as a foundation for future genetic and genomic studies in ferns specifically and vascular plants more generally.
We are grateful to Jo Ann Banks and Jeffrey Hill for cDNA libraries; Leslie Hickok for parental spores; Johanna Johnson and Valena Fiscus for technical assistance; Ying-juan Su and Michael Barker for sequencing the sporophytic EST markers; and John Burke, Kevin Livingstone, John Colbourne, and Ellen Quardokus for helpful suggestions. This material is based upon work supported by the National Science Foundation under grant no. 0128926.
- Received January 11, 2006.
- Accepted April 18, 2006.
- Copyright © 2006 by the Genetics Society of America