Divergence of the hyperthermophilic Archaea, Pyrococcus furiosus and Pyrococcus horikoshii, was assessed by analysis of complete genomic sequences of both species. The average nucleotide identity between the genomic sequences is 70-75% within ORFs. The P. furiosus genome (1.908 Mbp) is 170 kbp larger than the P. horikoshii genome (1.738 Mbp) and the latter displays significant deletions in coding regions, including the trp, his, aro, leu-ile-val, arg, pro, cys, thr, and mal operons. P. horikoshii is auxotrophic for tryptophan and histidine and is unable to utilize maltose, unlike P. furiosus. In addition, the genomes differ considerably in gene order, displaying displacements and inversions. Six allelic intein sites are common to both Pyrococcus genomes, and two intein insertions occur in each species and not the other. The bacteria-like methylated chemotaxis proteins form a functional group in P. horikoshii, but are absent in P. furiosus. Two paralogous families of ferredoxin oxidoreductases provide evidence of gene duplication preceding the divergence of the Pyrococcus species.
The three-domain proposal of Woese et al. (1990), which divides all living organisms into three primary groups or domains named Archaea (or archaebacteria), Bacteria (or eubacteria), and Eucarya (or eukaryotes), has been widely accepted. However, with the recent determination of genomic sequences for four species of Archaea (Bult et al. 1996; Klenk et al. 1997; Smith et al. 1997; Kawarabayasi et al. 1998), this phylogenetic division has become the focus of renewed debate (Doolittle 1998; Mayr 1998). On the one hand, the Archaea are characterized by molecular features of both bacteria and eukaryotes, such as the possession of a eukaryotic transcription system, which supports the three-domain proposal. However, the genomic sequences provide significant evidence for lateral gene transfer, leading to the speculation that the “bacterial” and “eukaryal” contents of the average archaeal genome may be imported. This poses the difficult question of which genes have been inherited from the last common ancestor of the Archaea and which have been incorporated as a consequence of lateral gene transfer.
In this article, we examine two recently completed genomic sequences of related archaeal species within the order Thermococcales. These strains, Pyrococcus horikoshii (Kawarabayasi et al. 1998) and P. furiosus (R. Weiss, D. Dunn, R. Stokes, J. Cherry, M. Stump, R. Yeh, B. Duval, C. Hamil, R. Rose, J. Nelson-Peterson, A. von Niederhausern, A. Aoyagi, M. Mahmoud, E. Carter, J. Alder, M. Alder, J. R. Brown, W. Marshall, P. V. Warren, W. S. Hayes, S. S. Hannenhalli, A. N. Lupas, K. K. Koretke, J. DiRuggiero, D. L. Maeder and F. T. Robb, unpublished results) have very similar physiological and biochemical characteristics. They are both hyperthermophilic, with optimal growth temperatures of 100° for P. furiosus (Fiala and Stetter 1986) and 98° for P. horikoshii (Gonzalez et al. 1998). The availability of whole genome sequences has allowed us to approach the question of whether a significant level of genetic exchange has occurred within the Thermococcales and to examine in detail the gains and losses of genomic sequences during the segregation of these species.
MATERIALS AND METHODS
Archaeal strains: P. furiosus DSM 3638 (Fiala and Stetter 1986) and P. horikoshii (Gonzalez et al. 1998) are sulfur-reducing, hyperthermophilic marine isolates with optimal growth at 100° and 98°, respectively. Isolation was from distant and different volcanic sampling sites: P. furiosus was isolated from a shallow marine solfatara at Vulcano Island near southern Italy (Fiala and Stetter 1986), whereas P. horikoshii was isolated from a hydrothermal vent site near Okinawa in the North Pacific Ocean, at 1400 m depth (Gonzalez et al. 1998).
Sequencing: Genomic DNA was prepared from P. furiosus using previously described methods (Borges et al. 1996). Genomic sequencing was accomplished using a 2400-clone medium-insert plasmid library, a 400-clone cosmid library, and PCR templates. End sequences for each clone were generated using a combination of multiplex sequencing and BigDye terminator reactions. Gamma Delta transposons were inserted into two subsets of the plasmid library clones. Bidirectional sequencing from mapped sets of transposon inserts rendered complete sequence of the plasmid subsets. The primary subset contained 150 randomly chosen clones that generated contiguous sequence islands across the genome. The second subset was determined by overlaying the end sequence read-pairs on the contigs and identifying bridging clones. The remaining gaps were closed by primer walking on amplified products generated from genomic DNA from P. furiosus or from a cosmid clone (R. Weiss, D. Dunn, R. Stokes, J. Cherry, M. Stump, R. Yeh, B. Duval, C. Hamil, R. Rose, J. Nelson-Peterson, A. von Niederhausern, A. Aoyagi, M. Mahmoud, E. Carter, J. Alder, M. Alder, J. R. Brown, W. Marshall, P. V. Warren, W. S. Hayes, S. S. Hannenhalli, A. N. Lupas, K. K. Koretke, J. DiRuggiero, D. L. Maeder and F. T. Robb, unpublished results).
Genomic analysis: The presence and absence analysis was performed with software for whole genome comparison, including the program Cross. Cross is a Win32 program that parses blastN (Altschul et al. 1990) sequence comparison output and assembles a table of high-scoring pairs (HSPs) with their database sequence identities, positions, and hit quality parameters. The program does not incorporate any new sequence comparison algorithm, but it has the advantage of representing the blast output in a color-coded format that allows the assessment of hit quality and position relative to genomic coordinates. Output from the program can be viewed at the following web site: http://comb5-156.umbi.umd.edu/poster/cross.
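The kind of HSP table described above can be illustrated with a short Python sketch. This is not the Cross program itself: it assumes the comparison was saved in the 12-column tabular layout produced by current BLAST+ releases (`-outfmt 6`), which postdates the output Cross parsed, and the names `HSP` and `parse_tabular_blast` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HSP:
    """One high-scoring pair taken from a tabular BLAST line (assumed layout)."""
    query_id: str
    subject_id: str
    identity: float   # percent identity
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float

    @property
    def inverted(self) -> bool:
        # In tabular blastn output a minus-strand hit has s_start > s_end.
        return self.s_start > self.s_end


def parse_tabular_blast(text: str) -> List[HSP]:
    """Build a table of HSPs sorted by position in the query genome."""
    hsps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        if len(f) < 12:
            continue  # not a standard 12-column record
        hsps.append(HSP(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    hsps.sort(key=lambda h: h.q_start)
    return hsps
```

Sorting by query coordinate is a design choice made here so that later steps, such as plotting or gap analysis, can walk along one genome in order.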
Genomic comparison data are accessed through an interactive graphical user interface in which hits are displayed in a diagonal plot analogous to a dot plot, but with HSPs color coded by their scores scaled between a lower (blue) and upper (yellow) threshold, with white indicating higher scores.
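The color scaling can be sketched as a simple function. The text states only the endpoints (blue at the lower threshold, yellow at the upper threshold, white above it), so the linear RGB interpolation and the choice to hide scores below the lower threshold are assumptions of this sketch, and `hsp_color` is a hypothetical name.

```python
from typing import Optional, Tuple

BLUE = (0, 0, 255)
YELLOW = (255, 255, 0)
WHITE = (255, 255, 255)


def hsp_color(score: float, lower: float, upper: float) -> Optional[Tuple[int, int, int]]:
    """Map an HSP score to an RGB color between two thresholds.

    Assumptions: scores below `lower` are not drawn (None) and the
    blue-to-yellow ramp is linear in RGB space.
    """
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    if score < lower:
        return None
    if score > upper:
        return WHITE
    t = (score - lower) / (upper - lower)  # 0.0 at lower, 1.0 at upper
    return tuple(round(b + t * (y - b)) for b, y in zip(BLUE, YELLOW))
```

With thresholds of 500 and 1000, for example, a score of 500 maps to pure blue and a score of 1000 to pure yellow.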
Physiological measurements: To confirm the physiological properties inferred from genomic analysis, growth requirements were determined experimentally for both P. furiosus and P. horikoshii using the following growth medium (per liter): 24 g NaCl, 4 g Na2SO4, 0.7 g KCl, 0.2 g NaHCO3, 0.1 g KBr, 0.03 g H3BO3, 10.6 g MgCl2 · 6H2O, 1.5 g CaCl2 · 2H2O, 0.025 g SrCl2 · 6H2O, 5 g Tryptone, 1 g yeast extract, 1 ml Resazurin (0.2 g/liter solution), 5-10 g elemental sulfur. The medium was prepared under a N2 atmosphere and reduced by the addition of 3 ml Na2S · 9H2O (25% [w/v] solution, pH 7). The pH was adjusted at room temperature with correction for increased temperature of growth.
Candidate energy substrates were tested in the presence and absence of 0.01% (w/v) yeast extract, as well as in the presence and absence of a vitamin solution (Blach et al. 1979). In the culture media amended with Casamino acids and amino acid mixtures, yeast extract was omitted. Nonproteinaceous substrates and amino acids were supplemented at a final concentration of 0.2% (w/v). Because Casamino acids (Difco, Detroit) lack tryptophan, asparagine, and glutamine due to acid treatment during production (Moore and Stein 1963), these were added or omitted to study the requirements for these three amino acids.
RESULTS
The genomes of P. furiosus and P. horikoshii were compared using diagonal plots reflecting the dispersion of gene order between the species. A representative section of the genome comparison is shown in Figure 1. The comparisons show a considerable degree of collinearity, indicated by the number of continuous line segments and the fact that many occupy the 45° diagonal line delineating the coincidence between genomes. The hit quality of many of the HSPs in these sections is high, as shown by the yellow color. The P. furiosus/P. horikoshii comparison also reveals that the former has a larger genome by 169,425 bp, and a significant part of the missing sequence is located in a 73-kbp block.
Table 1 reveals that several amino acid biosynthetic operons that are present in P. furiosus are absent in P. horikoshii. We have established that P. horikoshii is auxotrophic for tryptophan (Gonzalez et al. 1998) and histidine (J. M. Gonzalez, this study), which is consistent with the absence of the trp and his operons. The position of the indels in the genome of P. horikoshii is interesting. In many of the cases where an entire operon is missing in P. horikoshii, the insertion or deletion event is correlated with the inversion of a large section of the genome containing the indel, relative to the other Pyrococcus genome. For example, Figure 1 shows the 125-kb inversion associated with the loss of the mal, trp, aro, and his regions. The missing regions have sharply defined boundaries in an otherwise contiguous span of collinear, homologous sequence shared by both genomes.
Table 1 shows the operons found in P. furiosus and missing from the P. horikoshii genome sequence. With the exception of the chemotaxis-related genes, referred to in Table 2, all of the divergent operons are missing from the P. horikoshii genome.
Analysis of all gaps of 15,000 bp or smaller in the nucleotide alignment of P. furiosus and P. horikoshii, bounded by HSPs with scores greater than 500, produced 56 gaps between direct matches, 51 between inverted matches, and just 6 between mismatched diagonals. Mean deviation from collinearity was -632 bp for direct matches and 8327 bp for inverted matches, with standard deviations of 2616 and 4521 bp, respectively. Clearly, gaps between successive direct matches tend to be of similar size in the two genomes, while gaps in runs of inverted matches typically represent insertions of 4-13 kbp in P. furiosus.
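A gap analysis of this kind might be computed as in the following Python sketch. The text does not define "deviation from collinearity" formally; the sketch assumes it is the gap length in genome A (P. furiosus) minus the gap length in genome B (P. horikoshii) between two successive matches, so that a positive value indicates extra sequence in A. The names `Match`, `gap_deviations`, and `summarize` are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Match(NamedTuple):
    a_start: int   # coordinates in genome A, always increasing
    a_end: int
    b_start: int   # coordinates in genome B; b_start > b_end if inverted
    b_end: int
    score: float

    @property
    def inverted(self) -> bool:
        return self.b_start > self.b_end


def gap_deviations(matches: Iterable[Match], min_score: float = 500,
                   max_gap: int = 15000) -> Dict[str, List[int]]:
    """Classify gaps between successive matches and record their deviations."""
    kept = sorted((m for m in matches if m.score > min_score),
                  key=lambda m: m.a_start)
    out: Dict[str, List[int]] = {"direct": [], "inverted": [], "mixed": []}
    for prev, nxt in zip(kept, kept[1:]):
        gap_a = nxt.a_start - prev.a_end
        if prev.inverted != nxt.inverted:
            # Neighbours on mismatched diagonals: record the gap in A only.
            if abs(gap_a) <= max_gap:
                out["mixed"].append(gap_a)
            continue
        # In an inverted run, B coordinates decrease as A increases.
        gap_b = (prev.b_end - nxt.b_start) if prev.inverted else (nxt.b_start - prev.b_end)
        if max(abs(gap_a), abs(gap_b)) <= max_gap:
            key = "inverted" if prev.inverted else "direct"
            out[key].append(gap_a - gap_b)
    return out


def summarize(values: List[int]) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (count, mean, sample standard deviation); SD needs two values."""
    n = len(values)
    return n, (mean(values) if n else None), (stdev(values) if n > 1 else None)
```

Under this definition, a mean near zero for direct matches and a large positive mean for inverted matches would correspond to the pattern reported above.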
Table 2 lists a linked cluster of bacterial-type chemotaxis genes that occur in P. horikoshii. None of these genes could be identified during annotation of the P. furiosus genome sequence.
Figure 2 shows the presence and absence of inteins in three archaeal genomes, from P. horikoshii, P. furiosus, and Methanococcus jannaschii. Four intein alleles are shared between all three organisms, possibly indicating a common origin of this set of allelic genes in an ancestral euryarchaeote. Unique inteins occur in the Pyrococcus genomes as follows: six in P. horikoshii and two in P. furiosus.
Figure 3 indicates the occurrence of paralogous 2-ketoacid ferredoxin oxidoreductase genes in both Pyrococcus genomes. It is clear from Figure 3a that the pairing of these operons represents an operon duplication event as there is an uninterrupted duplication of this region at the nucleotide (nt) level, which appears as a direct repeat covering some 4200 nt (the por region spans 2444 nt and vor, 1755 nt).
A second such motif (Figure 3b) corresponds to an unspecified 2-ketoacid ferredoxin oxidoreductase (2kor). Although there is no observable similarity between these operon pairs at the nucleotide level using blastN, this second pair of operons is a homologous, ordered set of genes as determined by blastX analysis. The similarity within paired members of contiguous operons is high, as this blastX analysis shows.
The functional oxidoreductase operon products consist of four chains α, β, γ, δ (A, B, G, D), that form heterotetrameric complexes (Kletzin and Adams 1996a). Surprisingly, the por/vor operon contains only one γ gene, located at the 5′ end, which encodes a subunit shared by the two complexes. Similarly, the δ chain gene of the 2kor operon occurs in only one copy at the 5′ end and likely encodes a shared subunit.
Detailed analysis of TblastN and blastX matching of the ORFs corresponding to the genes of these operons against the P. furiosus genome sequence (Table 2) indicated that homologous genes within contiguous operon pairs are closely related, sharing about 50% amino acid sequence identity, while homologs in disjoint operons share ∼28% identity.
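For readers unfamiliar with how an identity figure such as 50% or 28% is obtained, the following Python sketch computes percent identity from a pairwise alignment. Definitions vary in the choice of denominator; this sketch assumes identity is counted over aligned columns in which neither sequence has a gap, which may differ from the figure reported by BLAST. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy example: five ungapped columns, four of them identical.
print(percent_identity("MKV-LA", "MRVGLA"))  # 80.0
```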
For the operon pairs the gene order is GDABDAB for vor/por and DABGABG for 2kor1/2kor2. Clearly the por/vor pair displays a duplication of DAB and the retention of one copy of G while for the 2kor pair ABG has been duplicated with the retention of one copy of D (Figure 4).
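The reading of these gene orders as a tandem duplication plus one retained single-copy gene can be checked mechanically. The Python sketch below searches a gene-order string for the longest block that is immediately repeated; it is an illustration of the reasoning, not a method used in this study, and `find_tandem_duplication` is a hypothetical name.

```python
from typing import Optional, Tuple


def find_tandem_duplication(order: str) -> Optional[Tuple[str, int, str]]:
    """Find the longest block X such that XX occurs in the gene order.

    Returns (block, start index of the first copy, remaining single-copy
    genes), or None if no block is immediately repeated.
    """
    n = len(order)
    for size in range(n // 2, 0, -1):          # try the longest blocks first
        for start in range(0, n - 2 * size + 1):
            block = order[start:start + size]
            if order[start + size:start + 2 * size] == block:
                single = order[:start] + order[start + 2 * size:]
                return block, start, single
    return None


print(find_tandem_duplication("GDABDAB"))  # ('DAB', 1, 'G')
print(find_tandem_duplication("DABGABG"))  # ('ABG', 1, 'D')
```

For vor/por the repeated block is DAB with G left in single copy, and for 2kor1/2kor2 the repeated block is ABG with D left in single copy, matching the description above.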
DISCUSSION
At present, there are no practical means of genetic analysis for the hyperthermophilic Archaea, with the possible exception of Sulfolobus species (Aravalli and Garrett 1997; Cannio et al. 1998; She et al. 1998). The availability of whole genome sequences, therefore, allows the formal cataloging of the gene complements of these organisms in a way that was not previously possible. The significant level of bacterial-like genes and operons in the Archaea implies that lateral gene transfer may be common between Archaea and Bacteria, whose physiological capabilities allow cohabitation (Aravind et al. 1998). Our studies support the conclusion that significant importation of new operons or functional groups has occurred over the relatively short time since the two Pyrococcus spp. diverged. Their metabolic functions show rapid adaptive displacement, with possible ongoing gene loss and replacement. This is certainly the case in the haloarchaea, many of which are characterized by frequent and radical genomic rearrangements (Ng et al. 1991, 1998; Charlebois and St. Jean 1995). These results belie the notion that organisms such as Pyrococcus are somehow “frozen” evolutionarily in a “primitive” state.
We propose that the incidence of large-scale rearrangements of genomic sequence by inversion or insertion may be due to the capacity of hyperthermophiles for accelerated breakage and repair. We, and others, have shown DNA break repair to be a continuous process in hyperthermophiles as they maintain viability near or above 100° (Peak et al. 1995). P. furiosus has been shown to possess an exceptional capability for DNA double-strand break repair. Following complete fragmentation of the chromosome into fragments of less than 500 kbp as a result of 2.5 kGy of gamma irradiation at 4°, the cells were able to reassemble intact chromosomes 4 hr after restoration to 95° with 25% loss of viability (DiRuggiero et al. 1997). This capability may result in a higher frequency of radical DNA alterations, compared with small-scale sequence divergence events that are normal for microorganisms growing at lower temperatures. Small-scale reordering or nucleotide sequence divergence may be the result of spontaneous mutation. Mutation rates of hyperthermophilic Archaea are not known, except in the case of Sulfolobus acidocaldarius (Jacobs and Grogan 1997, 1998), where the rate of spontaneous mutation was close to the rates reported for protein-encoding genes in repair-competent E. coli strains. Alternatively, more conventional mechanisms of genome rearrangements may be operating in the Archaea. There are cases of virus growth and lysogeny in Sulfolobus spp. (Zillig et al. 1998). In addition, a recent report confirms that conjugation, and conjugal transfer of chromosomal markers, is found in S. acidocaldarius (Ghané and Grogan 1998). The high incidence of large-scale inversions and the association between inversions and the deletion or insertion of operons indicate that insertion or deletion events may be associated with the inversion event.
The adjustments of transcription and replication organization that follow an inversion of a large section of the genome might accelerate the process of insertion or deletion. The boundary positions of the large inversions and deletions are not marked by unusual sequence motifs, except for the mal region, which is exceptional in that it is bounded by putative transposons.
At present we do not know whether the differences in gene complement between the two species represent gains or losses. Most of the loci represent biosynthetic or degradative operons similar in gene order to those in the Bacteria (Table 1) and it is tempting to speculate that P. furiosus and P. horikoshii have both engaged in large-scale trafficking of nonessential genetic material by lateral gene transfer. The concerted gain or loss of whole operons or functional groups of genes would appear to support this suggestion. The chemotaxis gene cluster is one such example. Both strains are motile and possess prominent tufts of polar flagella (Fiala and Stetter 1986; Gonzalez et al. 1998). Two flagellin genes related to type IV bacterial pili occur in each Pyrococcus genome, a situation very similar to that described for methanogens (Bayley et al. 1998). However, chemotactic responses have not been studied in hyperthermophilic Archaea. The haloarchaea have well-defined phototactic and aerotactic responses, mediated by a set of methyl-accepting chemotaxis proteins (for review see Hoff et al. 1997). It is tempting to speculate that P. horikoshii, because it is unable to synthesize many amino acids and other metabolic intermediates (see below), may compensate for this by a more effective chemotactic response. This assumption then raises the question: Why is P. furiosus motile if it is lacking the genetic capability for chemotaxis? It is possible that another system of taxis exists, unrelated to the bacterial type found so far in the Euryarchaeota and encoded by some of the conserved hypothetical genes that constitute 40% of the ORFs in the P. furiosus genome. It is possible that thermosensing and thermotaxis exist to locate these organisms within thermal gradients.
In the case of P. furiosus it is possible to specify whether several of the predicted coding regions are, in fact, functional. Hoaki et al. (1994) showed that P. furiosus is auxotrophic for thr and ile. It is therefore possible that the thr and ile gene clusters that we have identified are nonfunctional. Interestingly, very poor growth was recorded in the absence of thr, leu, and met, implying that these pathways may be marginally functional. Further analysis is required to determine whether the biosynthetic genes are expressed at lower levels or are producing poorly functional or nonfunctional proteins as reported for several genes in the genome of Rickettsia prowazekii (Andersson et al. 1998). The acquisition of marginally functional genes may be followed by progressive gains of biosynthetic function by selection for higher expression, more efficient catalysis, or improved thermostability. Alternatively, these genes may be mutating to become cryptic as a result of weak selection.
The P. furiosus enzymes pyruvate ferredoxin oxidoreductase (por) and 2-ketoisovalerate ferredoxin oxidoreductase (vor) have been characterized previously by Kletzin and Adams (1996a,b). It is clear from their high level of expression that the por and vor operons encode crucial functions in the energy metabolism of the Thermococcales. The divergence of operon structure and the overall conservation of these genes in both genomes suggest that the ancestral Pyrococcus diverged prior to the operon duplication event and two independent operon duplication events followed before the divergence of P. furiosus and P. horikoshii. Polypeptide D has recently been shown to be a ferredoxin (Menon et al. 1998), dedicated in function and tightly bound to the oxidoreductase complexes. The same G-subunit is shared in por and vor complexes, while the D-subunit must serve both 2kor complexes. The double operon must be coexpressed for the downstream gene products to become functional as there is only one copy of either polypeptide D or G. In addition, the regulation of the operons may be unconventional, because the D and G chains, which are specified by single genes, are found in stoichiometric amounts in the complexes of both types of oxidoreductases (M. Adams, personal communication). The rates of divergence of the neighboring genes within the individual por and vor operons are equal, as seen in Table 3. There is constrained divergence within operon pairs, relative to the homologs in other operons. It is probable that the structural requirements imposed on subunits in order for complexes to maintain effective contacts have constrained the divergence of the duplicated genes within paired operons.
The divergence of inteins in the genomes of the pyrococci and in more distant taxa, such as M. jannaschii, poses significant challenges in interpretation. Intein alleles that are shared between the methanogenic Archaea and the pyrococci occur in conserved target sites in homologous genes. However, the degree of divergence between the intein sequences is significantly higher than that in the surrounding extein sequences and approaches the low level of identity where homology is dubious. It becomes impossible to specify whether the intein elements are conserved or whether the insertion sites are hot spots for intein insertion and were conserved during the divergence of the pyrococci. If the latter proves to be correct, a survey of these genes from additional Pyrococcus and Thermococcus isolates could provide benchmarks of divergence for the group as a whole.
The significant degree of genomic divergence of two hyperthermophilic species corresponds to a relatively short period of genetic and geographical separation. From our findings, it is clear that these archaeal genomes are subject to dynamic genomic rearrangements and operon gain and loss.
The authors gratefully acknowledge support from the Microbial Genome Program of the U.S. Department of Energy, the National Science Foundation, the U.S. Department of Commerce, and the Wallenberg Foundation. This is contribution number 513 from the Center of Marine Biotechnology.
Communicating editor: P. Blum
- Received April 1, 1999.
- Accepted May 18, 1999.
- Copyright © 1999 by the Genetics Society of America