Genetics, Vol. 151, 1239-1244, April 1999, Copyright © 1999

Genetic and Physical Maps of the Bacillus subtilis Chromosome

Carlo Rivolta^a and Marco Pagni^a
^a Institut de Génétique et de Biologie Microbiennes, Université de Lausanne, CH-1005 Lausanne, Switzerland

Corresponding author: Marco Pagni, Institut de Génétique et de Biologie Microbiennes, Université de Lausanne, Rue César-Roux 19, CH-1005 Lausanne, Switzerland. E-mail: marco.pagni@igbm.unil.ch

Communicating editor: R. MAURER


*  ABSTRACT

Sequencing of the complete Bacillus subtilis chromosome revealed the presence of ~4100 genes, 1000 of which were previously identified and mapped by classical genetic crosses. Comparison of these experimentally determined positions to those derived from the nucleotide sequence showed discrepancies reaching up to 24° (~280 kb). The size of these discrepancies as a function of their position along the chromosome is not random but, apparently, reveals some periodicity. Our analyses demonstrate that the discrepancies can be accounted for by inaccurate positioning of the early reference markers with respect to which all subsequently identified loci were mapped by transduction and transformation. We conclude (i) that specific DNA sequences, such as recombination hotspots or presence of heterologous DNA, had no detectable effect on the results obtained by classical mapping, and (ii) that PBS1 transduction appears to be an accurate and unbiased mapping method in B. subtilis.


The gram-positive reference bacterium Bacillus subtilis is a nonpathogenic and widespread soil microorganism. The discovery of genetic transformation (SPIZIZEN 1958) and, subsequently, phage PBS1-mediated transduction (TAKAHASHI 1961) in this organism allowed the construction of a genetic map comprising >1000 loci. The identification of the origin of chromosome replication (SUEOKA and YOSHIKAWA 1965), the establishment of a limited number of linkage groups (DUBNAU et al. 1967), the demonstration of the circularity of the B. subtilis genome (LEPESANT-KEJZLAROVA et al. 1975), and the construction of a kit of nine strains with easily selectable markers relatively uniformly distributed over the chromosome (DEDONDER et al. 1977) were some of the milestones in the elaboration of the genetic map. However, a fundamental contribution was that of HENNER and HOCH (1980), who introduced new concepts used in all subsequent work. Their map listed gene locations in terms of distances, expressed in degrees, from the chromosome replication origin. Distances between genes were rendered additive by the application of the equation of KEMPER (1974) to the phage PBS1/B. subtilis system. This equation, relating the absolute distance between two genes to their cotransduction frequency, was first derived from P22-mediated transduction in Salmonella typhimurium. Henner and Hoch made use of the Kemper equation to position more than 350 genes. Among the genes, they highlighted a set of 24 markers called "genetic landmarks," which are evenly distributed on the B. subtilis chromosome and play a major role in most mapping experiments. Subsequently, hundreds of contributions reporting chromosomal locations of new genes allowed the compilation of more and more elaborate versions of the B. subtilis genetic map (HENNER and HOCH 1982; PIGGOT and HOCH 1985; PIGGOT 1989; PIGGOT et al. 1990; ANAGNOSTOPOULOS et al. 1993; BIAUDET et al. 1996).

Classical mapping methods remain valuable tools for establishing the location and the order of a set of loci. In particular, transduction allows for the comparison of the genetic organization of chromosomal regions in related species (e.g., TORO et al. 1998) and remains essential for the positioning of genes that have no identified counterpart in the biological databanks. Therefore, integrating genetic and physical maps appears to be a pressing new problem (JAIN and MYERS 1997).

In the present article we compare the genetic map of B. subtilis to its recently obtained physical counterpart, i.e., the nucleotide sequence of the whole genome (KUNST et al. 1997).


*  MATERIALS AND METHODS

Computer analysis of the data:
Computational and graphical work was performed with the program IGOR Pro 3.12 of WaveMetrics, Inc. All source files are available upon request.


*  RESULTS

Preliminary analyses:
By courtesy of the Pasteur Institute we received a flat file containing all open reading frame (ORF) positions deduced from the genome sequence, as well as the genetic positions of loci that were previously experimentally mapped. These data correspond to the information contained in the SubtiList database from July 6, 1997 (MOSZER et al. 1995; http://web.pasteur.fr/Bio/SubtiList.html/). Among the 4138 genes described in this file, 1020 have been located by genetic crosses and correspond to the loci reported by BIAUDET et al. (1996), to which a few additions were subsequently made. However, the later releases of SubtiList provide positions of several genes that were not obtained by classical mapping, but derived from the sequence and were positioned, in degrees, at the integer value closest to their physical position. The latter loci were not taken into account in the present analysis.

The ORF density along the chromosome, expressed as number of ORFs per 10° segment, fluctuates from 56 to 167. This fluctuation can be accounted for by the average ORF length in each segment, because the proportion of coding DNA is nearly constant along the chromosome, representing ~87% of the genome (KUNST et al. 1997). The density of genetically mapped loci ranges from 5 to 75 per 10° segment, representing between 4 and 58% of the ORFs found in a segment (Figure 1B).
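The per-segment densities quoted above amount to a histogram over 36 bins of 10° each. A minimal sketch of such a count is given here; the function name and the toy positions are illustrative assumptions and are not taken from the original IGOR Pro source files.

```python
def density_per_segment(positions_deg, width=10.0):
    """Count loci falling in consecutive segments of `width` degrees.

    positions_deg: iterable of map positions in degrees; values are wrapped
    onto the circle [0, 360) before binning.
    """
    n_bins = int(round(360.0 / width))
    counts = [0] * n_bins
    for p in positions_deg:
        # Wrap onto the circular map, then locate the segment index.
        index = int((p % 360.0) // width) % n_bins
        counts[index] += 1
    return counts


# Toy positions (hypothetical, for illustration only).
toy_counts = density_per_segment([0.0, 9.9, 10.0, 359.9, 360.0])
```

The ratio of the mapped-locus count to the ORF count in each bin then gives the percentage of genetically mapped ORFs per segment.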



 
Figure 1. (A) The genetic position of the 1020 genetically mapped genes of the Bacillus subtilis 168 chromosome plotted vs. the corresponding physical position deduced from the molecular sequence. Genes tenA and tenI (open arrowhead) were obviously mismapped and removed from further analysis. Seven genes were genetically mapped on the wrong replication arm (solid arrowhead). (B) The frequency per 10° segment of sequenced genes (lightly shaded bars) and genetically mapped genes (heavily shaded bars).

Let $P^{gen}_i$ be the genetic position, expressed in degrees, of gene $i$

(1)
and $P^{phy}_i$ the physical position of the middle of this gene

$$P^{phy}_i = \frac{start_i + stop_i}{2} \cdot \frac{360}{4{,}214{,}807} \qquad (2)$$
where $start_i$ and $stop_i$ are the distances, in base pairs, from the origin of chromosome replication to the first and last nucleotide of gene $i$, respectively, and where 4,214,807 corresponds to the number of base pairs of the circular B. subtilis chromosome (KUNST et al. 1997).
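The conversion from base pairs to degrees described above can be sketched as follows. The function names are illustrative assumptions; only the genome length of 4,214,807 bp and the use of the gene midpoint come from the text.

```python
GENOME_BP = 4_214_807  # length of the circular chromosome (KUNST et al. 1997)


def physical_position_deg(start_bp, stop_bp):
    """Physical position of the middle of a gene, in degrees from the origin."""
    midpoint_bp = (start_bp + stop_bp) / 2
    return midpoint_bp * 360.0 / GENOME_BP


def degrees_to_kb(deg):
    """Convert a distance in degrees to kilobases on this chromosome."""
    return deg * GENOME_BP / 360.0 / 1000.0
```

With this scale one degree corresponds to roughly 11.7 kb, so that 24° is about 281 kb and 45° about 527 kb, in line with the figures quoted in the text.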

Globally, for the 1020 loci for which $P^{gen}_i$ is available, the genetic positions correlate well with the physical positions (Figure 1A). There are, nevertheless, two striking exceptions. First, genes tenA and tenI (PANG et al. 1991), whose physical and genetic positions differ by ~45° (527 kb), were considered to be mismapped and therefore removed from the data set. The data were not further preprocessed so as to maintain as much original experimental information as possible. Second, genes tetB, tetL, exoA, rpsR, ssb, rpsF, and rpmH give the impression of being completely mismapped. Actually, they are situated close to the origin of replication and were mapped on the wrong replication arm.

For any given gene $i$, we defined $W_i$ as the discrepancy between its genetic and its physical position

$$W_i = P^{gen}_i - P^{phy}_i \qquad (3)$$

Inspection of these discrepancies plotted as a function of $P^{phy}_i$ (Figure 2A) reveals a periodic behavior, with peaks and valleys on both sides of the $W_i = 0$ axis. It would appear that the error $W_i$, relative to the mapping of any given gene $i$, is influenced by the specific chromosomal region to which gene $i$ maps. This, in turn, suggests that results of classical mapping experiments may be influenced by some regional factors such as the physicochemical properties of the DNA or by yet uncharacterized artifacts inherent to the PBS1-mediated transduction and possibly transformation.
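The discrepancy between genetic and physical position can be sketched as a simple difference. On a circular map, a gene placed just on the other side of the origin (for example at 359° instead of 2°) yields a raw difference close to 360°; the optional wrapping into the interval [-180°, 180°) shown below is an assumption added for illustration and is not stated in the text.

```python
def discrepancy_deg(p_gen, p_phy, wrap=True):
    """Genetic minus physical position, in degrees.

    With wrap=True the result is folded into [-180, 180), so that two
    positions on either side of the origin are treated as close neighbors.
    The wrapping step is an illustrative assumption.
    """
    w = p_gen - p_phy
    if wrap:
        w = (w + 180.0) % 360.0 - 180.0
    return w
```

Without wrapping, the seven genes mapped on the wrong replication arm near the origin appear as large outliers, which matches their appearance in Figure 1A.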



 
Figure 2. (A) The discrepancies ($W_i$) between the genetic and the physical locations ($P^{phy}_i$) of 1020 genetically mapped B. subtilis genes are represented by smaller circles. The bigger circles joined by the dashed line correspond to the discrepancies of the 22 genetic landmarks with known physical positions (Table 1). (B) The same as in A, after subtraction of the landmark-associated errors $E_i$ from the discrepancies $W_i$, giving $\tilde{W}_i$. (C and D) Frequency distribution of the discrepancies shown in A and B, respectively. The black lines indicate the Gaussian curves that fit these distributions. (E and F) The autocorrelation function computed with the set of points shown in A and B, respectively. Heavier lines result from the smoothing of the raw autocorrelation values (lighter lines) using a sliding average.

The landmarks hypothesis:
Historically, any newly identified marker was mapped by transduction or transformation with respect to relatively close previously located markers. This recursive process necessarily led to the propagation of errors committed during the positioning of the earliest markers to be recognized. We investigated the extent to which the discrepancies between the genetic and physical maps might have been due to experimental inaccuracies associated with the positioning of the early reference markers. For our study we considered as landmark markers the 24 loci retained by HENNER and HOCH (1980) as reference markers for mapping. However, 2 of the loci could not be taken into consideration because the corresponding ORFs have not yet been identified (Table 1).


 

 
Table 1. The genetic markers used as landmarks for mapping of the chromosome of Bacillus subtilis by HENNER and HOCH (1980)

A detailed historical reconstruction of the B. subtilis chromosome mapping would be a cumbersome and unfeasible task. Therefore, to account for the propagation of the early errors during the recursive mapping process we have introduced, for any given gene $i$, the landmark-associated discrepancy $E_i$ defined by

$$E_i = \frac{W_L \, D^{gen}_{i,R} + W_R \, D^{gen}_{L,i}}{D^{gen}_{L,R}} \qquad (4)$$
where $W_L$ and $W_R$ are the discrepancies of the landmark markers flanking gene $i$ on the left and right, respectively, and where $D^{gen}_{A,B}$ is the genetic distance between genes (or markers) A and B. $E_i$ is the mean of the landmark discrepancies weighted by the inverse of their relative genetic distance. It actually corresponds to a linear interpolation between the landmark points ($P^{gen}_i$; $W_i$). From Equation 4 it follows that, for a landmark gene or for any gene having the same genetic position as a landmark gene, $E_i$ is equal to $W_i$, while for a marker genetically equidistant from its two flanking landmarks, $E_i$ corresponds to the average of the discrepancies of the latter landmarks.
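The linear interpolation between flanking landmarks described above can be sketched as follows. The function name, the data layout (a sorted list of position and discrepancy pairs), and the toy landmark values are illustrative assumptions; the wrap-around across 0°/360° follows from the circularity of the map.

```python
from bisect import bisect_right


def landmark_discrepancy(p_gen, landmarks):
    """Interpolate landmark discrepancies at genetic position p_gen (degrees).

    landmarks: list of (genetic_position_deg, discrepancy_deg) tuples, sorted
    by position, with pairwise distinct positions (at least two entries).
    """
    positions = [pos for pos, _ in landmarks]
    k = bisect_right(positions, p_gen)
    pos_l, w_l = landmarks[k - 1]               # k == 0 wraps to the last landmark
    pos_r, w_r = landmarks[k % len(landmarks)]  # past the end wraps to the first
    d_li = (p_gen - pos_l) % 360.0              # genetic distance left landmark -> gene
    d_ir = (pos_r - p_gen) % 360.0              # genetic distance gene -> right landmark
    if d_li + d_ir == 0.0:
        return w_l
    # Each landmark is weighted by the distance to the opposite landmark.
    return (w_l * d_ir + w_r * d_li) / (d_li + d_ir)


# Hypothetical landmarks, for illustration only.
toy_landmarks = [(10.0, 2.0), (30.0, -2.0), (350.0, 4.0)]
```

A gene sitting on a landmark receives that landmark's discrepancy, and a gene halfway between two landmarks receives their average, as stated in the text.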

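Assuming that the landmark-dependent discrepancy $E_i$ contributes to the actual discrepancy $W_i$ of any gene $i$, we define the corrected genetic position $\tilde{P}^{gen}_i$ and discrepancy $\tilde{W}_i$, respectively,

$$\tilde{P}^{gen}_i = P^{gen}_i - E_i \qquad (5)$$

$$\tilde{W}_i = \tilde{P}^{gen}_i - P^{phy}_i = W_i - E_i \qquad (6)$$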

According to our hypothesis these variables should significantly correct the errors due to the mispositioning of reference markers.

Indeed, on average the corrected genetic positions correlate better with the physical map, and for most genes the discrepancies $\tilde{W}_i$ are reduced when compared to $W_i$ (Figure 2, A and B). More precisely, distributions of $W_i$ and $\tilde{W}_i$ (Figure 2, C and D) have average values of -1.52 and 0.31°, and standard deviations of 4.44 and 3.00°, respectively. Both distributions are Gaussian, as confirmed by a χ² test (not shown).

A single measure allowing an evaluation of the global correction derived from our hypothesis is provided by the sum of the squares of discrepancies (SSD). It clearly appears that

$$R = \sum_i W_i^2 \qquad (7)$$
corresponding to the uncorrected discrepancies $W_i$, is considerably higher than

$$\tilde{R} = \sum_i \tilde{W}_i^2 \qquad (8)$$
corresponding to the corrected values $\tilde{W}_i$.
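As a sketch of how the two sums of squares compare, the snippet below subtracts a landmark-associated term from each discrepancy and computes the SSD before and after. The numbers are invented purely to illustrate the arithmetic and are not data from the study.

```python
def corrected_discrepancies(w, e):
    """Subtract the landmark-associated discrepancy from each raw discrepancy."""
    return [wi - ei for wi, ei in zip(w, e)]


def ssd(values):
    """Sum of the squares of a list of discrepancies (degrees squared)."""
    return sum(v * v for v in values)


# Hypothetical values in degrees, for illustration only.
w_raw = [3.0, -4.0, 5.0]     # uncorrected discrepancies
e_land = [2.0, -3.0, 5.0]    # landmark-associated discrepancies
w_corr = corrected_discrepancies(w_raw, e_land)
ssd_raw, ssd_corr = ssd(w_raw), ssd(w_corr)
```

A correction is useful when the second sum is markedly smaller than the first, which is the comparison made in the text.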

The correction also affects the apparently periodic behavior of the ($P^{phy}_i$; $W_i$) points (Figure 2A), which is hardly detectable in the set of the ($P^{phy}_i$; $\tilde{W}_i$) points (Figure 2B). This effect can be measured by introducing the autocorrelation functions $K(\delta)$ and $\tilde{K}(\delta)$, which test the global correlation between all discrepancies separated by $\delta$ degrees

and

(9)
where

and

Due to the circularity of the chromosome, these functions are periodic, with a period of 360°, and symmetrical with respect to the $\delta$ = 180° axis. As shown in Figure 2E, autocorrelation is most pronounced for gene distances $\delta$ from 0 to ~40° and from 90 to 110° in the uncorrected data set. The particularly high autocorrelation for distances up to 30° is certainly due to the size of the PBS1 transducing DNA, which amounts to 27.6°. When computed with the corrected discrepancies $\tilde{W}_i$, the autocorrelation drastically decreases for any given $\delta$ (Figure 2F).
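The periodicity and symmetry properties mentioned above hold for any autocorrelation computed on a circle. The sketch below uses a generic normalized circular autocorrelation on values binned at 1° intervals; this is an assumed textbook definition chosen for illustration and is not necessarily the formula used in this study. The test signal (a sine wave with a 90° period) is synthetic.

```python
import math


def circular_autocorrelation(values):
    """Normalized circular autocorrelation of equally spaced values.

    values: one value per angular bin (e.g., 360 bins of 1 degree).
    Returns a list k where k[lag] is the correlation at that lag, k[0] == 1.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = sum(c * c for c in centered)
    return [
        sum(centered[p] * centered[(p + lag) % n] for p in range(n)) / norm
        for lag in range(n)
    ]


# Synthetic signal with a 90-degree period, for illustration only.
signal = [math.sin(2 * math.pi * 4 * p / 360) for p in range(360)]
k = circular_autocorrelation(signal)
```

For this synthetic signal the autocorrelation peaks at lags of 0° and 90° and is most negative at 45°, and k[lag] equals k[360 - lag], which is the symmetry about the 180° axis noted in the text.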

A single measure allowing evaluation of the effect of the landmarks hypothesis on the autocorrelation can be obtained by computing the sum of the squares of the autocorrelation values (SSA) of $K(\delta)$ and $\tilde{K}(\delta)$. Again,

$$A = \sum_{\delta} K(\delta)^2 \qquad (10)$$
computed from the uncorrected set of values, is higher than

$$\tilde{A} = \sum_{\delta} \tilde{K}(\delta)^2 \qquad (11)$$
computed from the corrected set of values.

Validation of the landmarks hypothesis:
It is clear that corrections based on our hypothesis reduce the global imprecision of the genetic mapping and nearly eliminate its periodic character. However, the particular choice of the genetic landmarks of HENNER and HOCH (1980) for performing the correction was arbitrary, despite its "historical" justification. A priori, it cannot be ruled out that a different set of "landmark" markers might provide an even better correction. To test this possibility, we performed the following computer simulation. Five thousand sets of 22 genes were randomly sampled from the 1020 genetically mapped loci, with the sole condition that each sampled gene must possess a unique genetic position within the set. These 22 randomly sampled genes were assumed to constitute a set of reference landmarks, from which corrected discrepancies were calculated (Equations 4, 5, and 6). The correction was evaluated by computing the sums of squares of these corrected discrepancies and their autocorrelation values (Figure 3). Most simulations provided values lower than the corresponding uncorrected $R$ and $A$ values, but only 4.48% of the simulations yielded values lower than $\tilde{R}$ and $\tilde{A}$, obtained with the Henner and Hoch set of landmarks. In the latter cases the improvement in the correction was only marginal, in particular with respect to . In conclusion, the correction made with 22 out of 24 of the Henner and Hoch set of landmarks, falling within the 5% best randomly generated corrections, is statistically significant. In other words, the landmarks hypothesis could be given a 95.52% probability of being true.
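The resampling procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: it scores each random landmark set by the sum of squared corrected discrepancies only (the autocorrelation criterion is omitted), the helper names are hypothetical, and the gene list must be supplied by the caller as (genetic position, discrepancy) pairs.

```python
import random
from bisect import bisect_right


def interpolate(p_gen, landmarks):
    """Linear interpolation of landmark discrepancies on a circular map."""
    positions = [pos for pos, _ in landmarks]
    k = bisect_right(positions, p_gen)
    pos_l, w_l = landmarks[k - 1]               # wraps to the last landmark
    pos_r, w_r = landmarks[k % len(landmarks)]  # wraps to the first landmark
    d_li = (p_gen - pos_l) % 360.0
    d_ir = (pos_r - p_gen) % 360.0
    if d_li + d_ir == 0.0:
        return w_l
    return (w_l * d_ir + w_r * d_li) / (d_li + d_ir)


def corrected_ssd(genes, landmarks):
    """Sum of squared discrepancies after subtracting the interpolated term."""
    return sum((w - interpolate(p, landmarks)) ** 2 for p, w in genes)


def sample_landmarks(genes, k, rng):
    """Draw k genes with pairwise distinct genetic positions, sorted by position."""
    chosen = {}
    for p, w in rng.sample(genes, len(genes)):  # random order, no repeats
        if p not in chosen:
            chosen[p] = w
        if len(chosen) == k:
            break
    return sorted(chosen.items())


def fraction_better(genes, reference, k=22, trials=5000, seed=0):
    """Fraction of random landmark sets giving a lower SSD than `reference`."""
    rng = random.Random(seed)
    ref_score = corrected_ssd(genes, reference)
    better = sum(
        corrected_ssd(genes, sample_landmarks(genes, k, rng)) < ref_score
        for _ in range(trials)
    )
    return better / trials
```

A small returned fraction indicates that few random landmark sets outperform the reference set, which is the logic behind the 4.48% figure reported in the text; reproducing that figure would additionally require the real data and the autocorrelation criterion.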



 
Figure 3. Validation of the landmarks hypothesis. The sum of the squares of the autocorrelation values (SSA) and of the discrepancies (SSD) of the 1020 genetically mapped genes are plotted one vs. the other. The dots indicate the parameters computed from 5000 genetic maps amended for the influence of 22 randomly generated reference landmarks (see RESULTS). The circular symbol, having coordinates ($\tilde{A}$; $\tilde{R}$), refers to the map corrected from the errors of 22 out of 24 Henner and Hoch's landmarks. The squared symbol, having coordinates ($A$; $R$), corresponds to the uncorrected map. The values in percentages indicate the portions of the dots lying in the quadrants delimited by the lines SSA = $\tilde{A}$ and SSD = $\tilde{R}$.

The landmarks hypothesis and PBS1 transduction:
To assess our model we placed five landmark markers on the physical map and determined the position of five loci linked to them in PBS1 transduction (LEPESANT-KEJZLAROVA et al. 1975). The genetic position of each locus was expressed as distance from the physical position of the relevant landmark markers, obtained from measured cotransduction frequencies $C_{A,B}$, using Kemper's equation

$$C_{A,B} = 1 - x + x \ln x \qquad (12)$$
and

$$D^{gen}_{A,B} = 27.6^{\circ} \cdot x \qquad (13)$$
where $x$, the implicit solution of (12), is the distance between genes A and B expressed as a fraction of the size of the transducing fragment, equivalent to 27.6°, i.e., to 7.66% of the B. subtilis chromosome (HENNER and HOCH 1980). It appears that the genetic positions of these loci fit rather well with their physical positions (Table 2). The discrepancies corresponding to thus recalculated genetic positions are indeed strongly reduced when compared to those present in the literature.
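Because the distance enters Kemper's equation implicitly, it has to be solved numerically. The sketch below assumes the commonly quoted form of the equation, C = 1 - x + x ln x, with x the distance expressed as a fraction of the transducing fragment; that form, the function names, and the bisection approach are assumptions made for illustration. Only the fragment size of 27.6° is taken from the text.

```python
import math

FRAGMENT_DEG = 27.6  # size of the PBS1 transducing fragment, in degrees


def kemper_cotransduction(x):
    """Cotransduction frequency for a fractional distance x (assumed form)."""
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    return 1.0 - x + x * math.log(x)


def kemper_distance_deg(c, tol=1e-12):
    """Genetic distance in degrees for a cotransduction frequency c in [0, 1].

    The assumed function decreases monotonically from 1 (x = 0) to 0 (x = 1),
    so the implicit solution is found by bisection.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("cotransduction frequency must lie in [0, 1]")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kemper_cotransduction(mid) > c:
            lo = mid   # frequency still too high: the genes are farther apart
        else:
            hi = mid
    return FRAGMENT_DEG * (lo + hi) / 2
```

Under this assumed form, fully linked markers (C = 1) lie at a distance of 0°, and markers that are never cotransduced (C = 0) lie at least one fragment length, 27.6°, apart.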


 

 
Table 2. PBS1 mapping and comparison of gene positions present in the literature to those obtained with respect to the physical position of relevant landmarks


*  DISCUSSION

Our analyses provide strong evidence that major discrepancies between the genetic and the physical maps of the B. subtilis genome, which seem to be specific to given chromosomal regions, can be accounted for by factors that are not linked to the intrinsic nature of the genome sequence or to DNA metabolism. Indeed, it appears that the mislocation on the genetic map of a few loci, used as reference markers in mapping by PBS1-mediated transduction, was responsible for the observed phenomenon. As inferred from the literature and confirmed by our analysis, the landmark markers of HENNER and HOCH (1980) or a subgroup of them correspond to loci from which early errors were propagated. Other biological phenomena such as recombination hotspots or the presence of heterologous DNA inadvertently introduced from nonisogenic strains (e.g., FARMER and ROTHMAN 1965; DEDONDER et al. 1977) have apparently not determined any significant bias in results from classical mapping experiments. Should these phenomena really exist, their influence would be below the background noise revealed by the $\tilde{W}_i$ distribution, having a calculated 3° standard deviation (Figure 2D).

In addition to providing for the first time a physical and thus absolute map, the sequencing of the B. subtilis chromosome (KUNST et al. 1997) has made it possible to assess the reliability, by comparison, of classical mapping. This comparison has clearly revealed that PBS1 transduction was appropriate for accurate and unbiased mapping, possibly more reliable than expected in the past. This conclusion could help in placing on the physical map all of the genetically identified loci to which no ORF has been associated so far.

The analysis presented in this article provides an illustration of problems inherent to the compilation of experimental data by different laboratories and at different times, a compilation that did not necessarily rely on the same methodology or perform with the same accuracy. A rigorous bibliographical backtracking of the experimental data and of the propagation of the initial errors, which might have provided a more precise answer to the question raised, is a tremendous task. Even present-day information storage capacities, as well as public biological databases, theoretically providing the means to keep track of the information built up for such a collective work, would not necessarily help to obtain a quick and accurate response. Taking into account that most public biological databases are actually in a perpetual "moving" state, we believe that global approximation methods like those developed here will have to be devised when analyzing complex and "heterogeneous" observations.


*  ACKNOWLEDGMENTS

We are grateful to Dr. Ivan Moszer from Institut Pasteur for providing us with the data files and his "on-line" help. We are indebted to Prof. Dimitri Karamata for constructive discussions and support throughout this research. This work was supported by grant 96.0245 from the Office Fédéral de l'Education et de la Science (Switzerland).

Manuscript received September 15, 1998; Accepted for publication December 28, 1998.


*  LITERATURE CITED

ANAGNOSTOPOULOS, C., P. J. PIGGOT and J. A. HOCH, 1993 The genetic map of Bacillus subtilis, pp. 425–461 in Bacillus subtilis and Other Gram-Positive Bacteria: Biochemistry, Physiology and Molecular Genetics, edited by A. L. SONENSHEIN, J. A. HOCH and R. LOSICK. American Society for Microbiology, Washington, DC.

BIAUDET, V., F. SAMSON, C. ANAGNOSTOPOULOS, S. D. EHRLICH and P. BESSIERES, 1996 Computerized genetic map of Bacillus subtilis. Microbiology 142:2669-2729.

DEDONDER, R. A., J. A. LEPESANT, J. LEPESANT-KEJZLAROVA, A. BILLAULT and M. STEINMETZ et al., 1977 Construction of a kit of reference strains for rapid genetic mapping in Bacillus subtilis 168. Appl. Environ. Microbiol. 33:989-993.

DUBNAU, D., C. GOLDTHWAITE, I. SMITH and J. MARMUR, 1967 Genetic mapping in Bacillus subtilis. J. Mol. Biol. 27:163-185.

FARMER, J. L. and F. ROTHMAN, 1965 Transformable thymine-requiring mutant of Bacillus subtilis. J. Bacteriol. 89:262-263.

HENNER, D. J. and J. A. HOCH, 1980 The Bacillus subtilis chromosome. Microbiol. Rev. 44:57-82.

HENNER, D. J. and J. A. HOCH, 1982 The genetic map of Bacillus subtilis, pp. 1–33 in The Molecular Biology of the Bacilli, edited by D. DUBNAU. Academic Press, New York.

JAIN, M. and E. W. MYERS, 1997 Algorithms for computing and integrating physical maps using unique probes. J. Comput. Biol. 4:449-466.

KEMPER, J., 1974 Gene order and co-transduction in the leu-ara-fol-pyrA region of the Salmonella typhimurium linkage map. J. Bacteriol. 117:94-99.

KUNST, F., N. OGASAWARA, I. MOSZER, A. M. ALBERTINI and G. ALLONI et al., 1997 The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390:249-256.

LEPESANT-KEJZLAROVA, J., J. A. LEPESANT, J. WALLE, A. BILLAULT and R. DEDONDER, 1975 Revision of the linkage map of Bacillus subtilis 168: indications for circularity of the chromosome. J. Bacteriol. 121:823-834.

MOSZER, I., P. GLASER and A. DANCHIN, 1995 SubtiList: a relational database for the Bacillus subtilis genome. Microbiology 141:261-268.

PANG, A. S., S. NATHOO and S. L. WONG, 1991 Cloning and characterization of a pair of novel genes that regulate production of extracellular enzymes in Bacillus subtilis. J. Bacteriol. 173:46-54.

PIGGOT, P. J., 1989 Revised genetic map of Bacillus subtilis 168, pp. 1–41 in Regulation of Prokaryotic Development, edited by I. SMITH, R. A. SLEPECKY and P. SETLOW. American Society for Microbiology, Washington, DC.

PIGGOT, P. J. and J. A. HOCH, 1985 Revised genetic linkage map of Bacillus subtilis. Microbiol. Rev. 49:158-179.

PIGGOT, P. J., M. AMJAD, J.-J. WU, H. SANDOVAL and J. CASTRO, 1990 Genetic and physical map of Bacillus subtilis 168, pp. 494–532 in Molecular Biology Methods for Bacillus, edited by C. R. HARWOOD and S. M. CUTTING. Wiley, London.

SPIZIZEN, J., 1958 Transformation of biochemically deficient strains of B. subtilis by deoxyribonucleate. Proc. Natl. Acad. Sci. USA 44:1072-1078.

SUEOKA, N. and H. YOSHIKAWA, 1965 The chromosome of Bacillus subtilis: I. Theory of marker frequency analysis. Genetics 52:747-757.

TAKAHASHI, I., 1961 Genetic transduction in B. subtilis. Biochem. Biophys. Res. Commun. 5:171-175.

TORO, C. S., G. C. MORA and N. FIGUEROA-BOSSI, 1998 Gene transfer between related bacteria by electrotransformation: mapping Salmonella typhi genes in Salmonella typhimurium. J. Bacteriol. 180:4750-4752.







Copyright © 1999 by the Genetics Society of America.