Gene of the Ciliate Stylonychia lemnae (Alveolata; class Spirotrichea)
Corresponding author: David H. Ardell, Box 596, Biomedical Center, Uppsala University, SE-751 24 Uppsala, Sweden. E-mail: dave.ardell{at}icm.uu.se
Communicating editor: M. FELDMAN
| ABSTRACT |
|---|
DNA polymerase α
is the most highly scrambled gene known in stichotrichous ciliates. In its hereditary micronuclear form, it is broken into >40 pieces on two loci at least 3 kb apart. Scrambled genes must be reassembled through developmental DNA rearrangements to yield functioning macronuclear genes, but the mechanism and accuracy of this process are unknown. We describe the first analysis of DNA polymorphism in the macronuclear version of any scrambled gene. Six functional haplotypes obtained from five Eurasian strains of Stylonychia lemnae were highly polymorphic compared to Drosophila genes. Another incompletely unscrambled haplotype was interrupted by frameshift and nonsense mutations but contained more silent mutations than expected by allelic inactivation. In our sample, nucleotide diversity and recombination signals were unexpectedly high within a region encompassing the boundary of the two micronuclear loci. From this and other evidence we infer that both members of a long repeat at the ends of the loci provide alternative substrates for unscrambling in this region. Incongruent genealogies and recombination patterns were also consistent with separation of the two loci by a large genetic distance. Our results suggest that ciliate developmental DNA rearrangements may be more probabilistic and error prone than previously appreciated and constitute a potential source of macronuclear variation. From this perspective we introduce the nonsense-suppression hypothesis for the evolution of ciliate altered genetic codes. We also introduce methods and software to calculate the likelihood of hemizygosity in ciliate haplotype samples and to correct for multiple comparisons in sliding-window analyses of Tajima's D.
CILIATES are characterized by nuclear duality: they possess micronuclei serving primarily germ-line functions and macronuclei serving primarily somatic functions. Micronuclei divide meiotically before sexual exchange and are the developmental precursors of macronuclei, but are transcriptionally silent during vegetative growth. Macronuclei are the source of templates for transcription during vegetative growth but are destroyed during sexual reproduction and regenerated afterward. The genomic organization of the two types of nuclei is usually quite different. Macronuclear genomes in the class Spirotrichea that includes Stylonychia lemnae are partitioned into tens of thousands of different types of gene-sized pieces, so called because they are short,
2 kb in average length, and contain only one or a small number of genes each. Gene-sized pieces are highly polyploid: tens of thousands of each type populate a macronucleus on average. Although they are capped on both ends by short telomeres, they lack centromeres and neither pair nor segregate in mitosis. Therefore, strictly speaking, gene-sized pieces are not "chromosomes." During vegetative growth the macronucleus divides, and gene-sized pieces are randomly partitioned to daughter cells in a process sometimes called amitosis.
Micronuclear chromosomes are far larger and fewer in number than macronuclear gene-sized pieces. Micronuclear chromosomes are generally diploid. During sexual conjugation, haploid micronuclei are exchanged and preexisting macronuclei are destroyed. After conjugation, new macronuclei develop from a single micronucleus through a sequence of events that include extensive DNA rearrangements. During this process, called macronuclear development, ciliates eliminate large micronuclear chromosomal fractions (up to 95–98% of the S. lemnae micronuclear genome), excise the remainder into short fragments at chromosomal breakage sites, and amplify the remainder to macronuclear ploidy.
Internal eliminated sequences (IESs) may be classified by their mechanism of excision. The types of IESs studied here are AT-rich stretches of DNA bounded by short repeats (2–18 bp).
Macronuclear destined segments (MDSs) are the micronuclear segments that are spliced together and retained during macronuclear development. We also refer to these segments as MDSs after they have been assembled in macronuclear sequence. The MDSs of the majority of genes that have been studied are collinear in micronucleus and macronucleus.
The gene encoding the α subunit of DNA polymerase (DNA pol α) is the largest and most complex scrambled gene known.
A schematic of the micro- and macronuclear organization of DNA pol α in S. lemnae, along with the terminology and notational devices introduced in this article, appears in Fig 1. The micronuclear version of MDS 6, only 12 bases long, has not been found. IES X-29 and MDS 30 share a homologous region of at least 199 bp that differs by seven mismatches, or 3.5%.
We report here our study of macronuclear DNA polymorphism of DNA pol α from five Eurasian strains of S. lemnae. The primary aim of our study was to look for recombination between the major and minor loci to gauge their genetic distance. As this was the first analysis of polymorphism in a ciliate scrambled gene, we wanted to examine the effect of gene scrambling on molecular evolution and compare the level of polymorphism to that of genes in other species. We hoped we might also learn about the molecular mechanism of unscrambling, including how precise and regular the unscrambling process is, what kinds of errors occur, and at what rates. We looked for evidence of alternative unscrambling, that is, whether a given micronuclear gene can unscramble in more than one way.
Two examples of alternative micronuclear processing have been studied in hypotrichs. In O. fallax, S. histomuscorum (formerly O. trifallax), and reportedly other stichotrichs, the 81-MAC locus occurs in three macronuclear versions, each known to come from the same micronuclear locus.
We have found direct and indirect evidence of alternative unscrambling within paralogously duplicated MDSs, at least in MDSs 29 and 30 in the superpointer region. Thus, IES X-29 is in fact a functional duplicate of MDS 30. The major and minor loci have demonstrably different genealogies in our data, suggesting genetic recombination between them. Although this recombination could be developmental or meiotic in origin, because we have observed only two haplotypes per individual, we favor the latter interpretation. We report a reproducible example of unscrambling error. Finally we demonstrate that unscrambling directly contributes to the relatively large nucleotide diversity in ciliate macronuclear scrambled genes.
| MATERIALS AND METHODS |
|---|
Strains:
Macronuclear DNA from five strains of S. lemnae was generously provided by the laboratory of Dr. Hans Lipps (Witten University, Witten, Germany). One strain ("RUS") was collected from St. Petersburg, Russia; one was from North Germany (Dornen, annotated as "NGR"); two were from South Germany (Entringen, denoted "SGR1" and "SGR2"); and the last was from Lake Federsee, in South Germany (denoted "FED"). The strains were derived from lab crosses of strains collected by the laboratory of Dieter Ammermann (University of Tübingen, Tübingen, Germany).
The number of generations since conjugation was unknown for the clones we analyzed. Two of the clones were closely related to the one from which we previously sequenced micronuclear and macronuclear versions of DNA pol α.
PCR sequencing of haplotypes:
A 4.8-kb region was amplified from macronuclear DNA of each strain, using the smallest number of cycles (usually 27–30) required to produce a band on an ethidium bromide-stained agarose gel. This approach reduced the chance of generating artifactual recombinants during PCR. The primers used were 5'-AGRAATAARAATTGAATAATRATACCGCG (forward) and 5'-GTCTTTGAGCAACTGCCACATG (reverse). A hot-start PCR was used, followed by 94° denaturation (25 sec), 52° annealing (30 sec), and 72° extension (7 min). Final elongation was at 72° for 10 min.
PCR products were cloned using the TOPO-XL cloning kit (Invitrogen, San Diego). Clones were verified with colony PCR using the original primers. A small region from 2847 to 3870 (coordinates in GenBank sequence accession AF194338) from approximately 10 clones for each of the five strains was sequenced to type the number of haplotypes per strain. Two isolates of each haplotype were completely sequenced.
Bioinformatics:
Sequence and alignment manipulations:
Unless otherwise noted, the software used for this project consisted of generalizable Perl scripts custom written by D. H. Ardell using BioPerl (v. 1.0).
GenBank-feature-based subsetting and manipulation of alignments were done with SUBALIGN. SUBALIGN requires an input GenBank file and an alignment of the polymorphic data with the sequence in the GenBank file, in this case the previously published macronuclear sequence AF194338.
Prediction of introns and initiation site:
The true reading frame and initiation site of the DNA polymerase α gene have been the subject of some controversy.
gene in S. lemnae. These results will be published separately.
Annotation of conserved regions:
To relate the present data to functionally important domains of the polymerase α subunit protein, we used a previously published annotation of protein-conserved regions.
Analysis of mosaic structure in DNA polymorphism:
We used two methods to detect mosaic structure in the Stylonychia macronuclear polymorphism data. The first was TOPAL version 2.01b.
TOPAL was run with a window size of 500 and an increment of 5 nucleotides. Because of the AT bias in the data (66% on average including all data), we used the "ML model" of DNA substitution, which is the "F84" model.
The second was compatibility analysis with RETICULATE.
Gene genealogies and splits graphs:
We calculated genealogies of the major and minor loci separately and after removing MDSs 29 and 30 and pointers. Neighbor-joining and parsimony trees were calculated in PAUP* (v.4.0b8).
Polymorphism statistics and neutrality tests:
Polymorphism statistics and sliding-window analyses were calculated with a general-purpose command-line utility called PI (D. H. Ardell). Statistics were checked and examined for significance with DNASP (v.3.0).
| RESULTS |
|---|
Stylonychia DNA pol α is highly polymorphic:
Approximately 10 clones from each of the five strains were partially sequenced to screen the number of macronuclear haplotypes per strain. We never found more than two haplotypes per strain. Since one haplotype was shared between two apparently homozygous individuals, we obtained a total of seven macronuclear haplotypes. One of these seven contained nonsense and frameshift mutations and is believed to be inactive (see below). A total of 4734 nucleotides of overlapping sequence were obtained from the others, spanning from position 75 (in the 5' leader) to position 4808 (before the end of the coding region) of the previously published macronuclear sequence AF194338.
Haplotypes NGR-a1 and SGR2-a1 were highly similar to AF194338. Excluding ambiguities, the per-site pairwise differences of these haplotypes from AF194338 were π = 2.37 × 10⁻³ and π = 9.91 × 10⁻³, respectively, while the average of other haplotypes (π = 1.58 × 10⁻²) with AF194338 was comparable to the estimated nucleotide diversity of the sample (π = 1.74 × 10⁻²). The haplotypes that contributed to the AF194338 sequence were not present in our sample.
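The per-site pairwise differences quoted above exclude ambiguous positions. A minimal sketch of that kind of calculation follows, assuming aligned haplotypes of equal length and treating any character other than A, C, G, or T (ambiguity codes, gaps) as a position to skip. The function names and toy sequences are illustrative only and are not taken from the PI program.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")

def per_site_difference(seq_a, seq_b):
    """Proportion of differing sites among positions unambiguous in both sequences."""
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no unambiguous positions to compare")
    return differences / compared

def nucleotide_diversity(haplotypes):
    """Average per-site pairwise difference over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    return sum(per_site_difference(a, b) for a, b in pairs) / len(pairs)

# Toy alignment (hypothetical data); the N in the third sequence is skipped.
toy = ["ACGTACGT", "ACGTACGA", "ACNTACGA"]
print(nucleotide_diversity(toy))
```

Skipping ambiguous positions pair by pair, rather than deleting whole alignment columns, keeps as many sites as possible in each comparison; either convention could be substituted.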
There was only one segregating insertion/deletion (indel) in the data, a singleton. A haplotype from the Federsee strain, FED-a2, had a single-base deletion of an A at position 210 relative to the others and to AF194338. This deletion lies in the first inferred intron (our unpublished data) and as such we deem it to be a true segregating indel.
Two strains, RUS and SGR1, from St. Petersburg, Russia and Entringen, South Germany, respectively, were apparently homozygous. The single haplotypes each presented were identical up to a small proportion of sequencing ambiguities (3/4734 = 0.06% in RUS and 5/4734 = 0.1% in SGR1). The positions of these ambiguities were completely nonoverlapping.
This apparent homozygosity and haplotype sharing may actually be hemizygosity and/or due to experimental error. Macronuclear hemizygosity may arise naturally through the stochastic partitioning of macronuclear molecules during growth or through fluctuations in the number of micronuclear chromosomes that develop in the macronucleus.
To examine the relative likelihood of hemizygosity we used previously published results. The likelihood L(m | θ_w, n, k, {a_i}) of observing an allele m times, m ≥ 1, in a sample of size n + m containing k alleles, given Watterson's estimator θ_w, is

$$L(m \mid \theta_w, n, k, \{a_i\}) = \frac{(n+m)!}{\prod_{j=0}^{n+m-1} (\theta_w + j)} \prod_{i=1}^{n+m} \frac{\theta_w^{a_i}}{i^{a_i}\, a_i!}$$

where a_i is the number of alleles present exactly i times in the sample. We use Watterson's estimator because it is invariant to allelic copy number and has relatively low variance. We implemented this calculation in a program called KM.
In our data, for the entire region we sequenced, the value of Watterson's estimator was θ_w ≈ 83. With this value, the likelihood that we actually sampled SGR1-RUS only once in a sample of size six was >4 times that of sampling it twice (in a sample of size seven), >71 times that of sampling it three times, and approximately 1000 times that of sampling it four times (assuming all other alleles are sampled once). Furthermore, when all other alleles are singletons (k = n + 1), we can show that θ_w must be less than (k² − k)/2 for the likelihood of sampling an allele twice (homozygosity) to exceed that of sampling it once (hemizygosity). Also, under these assumptions, for any value of θ_w, the maximum-likelihood copy number cannot exceed two.
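The likelihood ratios quoted above can be reproduced with the Ewens sampling formula. The sketch below assumes that this formula is the likelihood in question; that is an inference from the reported numbers rather than something confirmed here, and the function names are illustrative, not those of the KM program.

```python
from collections import Counter
from math import factorial, prod

def ewens_likelihood(theta, copy_numbers):
    """Ewens sampling probability of an allele configuration.

    copy_numbers lists how many times each distinct allele was sampled,
    e.g. [2, 1, 1, 1, 1, 1] for one allele seen twice and five singletons.
    """
    n = sum(copy_numbers)                       # total sample size
    k = len(copy_numbers)                       # number of distinct alleles
    rising = prod(theta + j for j in range(n))  # theta (theta+1) ... (theta+n-1)
    config = Counter(copy_numbers)              # a_i: alleles present exactly i times
    denom = prod(i ** a_i * factorial(a_i) for i, a_i in config.items())
    return factorial(n) * theta ** k / (rising * denom)

def hemizygosity_ratio(theta, m, singletons=5):
    """Likelihood of sampling the focal allele once relative to sampling it m times."""
    once = ewens_likelihood(theta, [1] * (singletons + 1))
    m_times = ewens_likelihood(theta, [m] + [1] * singletons)
    return once / m_times

for m in (2, 3, 4):
    print(m, hemizygosity_ratio(83, m))
```

With θ_w = 83 and five other singleton alleles, the ratio for m = 2 reduces to (θ_w + 6)/21, which exceeds 1 exactly when θ_w > (k² − k)/2 = 15 for k = 6, in agreement with the threshold stated in the text.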
We conclude that either one of the two strains was hemizygous or biased amplification of SGR1-RUS occurred, and that the apparent haplotype sharing may have been due to experimental error. We assumed that we sampled only six functional alleles, which we refer to as haplotypes because we have not strictly demonstrated allelism. Except for the calculation of polymorphism statistics and neutrality tests, most of our results are indifferent to the true sample size (i.e., including an extra copy of SGR1-RUS). For results comparing nucleotide diversity to the number of segregating sites, because we are emphasizing an excess nucleotide diversity, this treatment of the data is conservative.
The overall level of polymorphism measured in nucleotide diversity (π) was 0.0174 ± 0.0088, with the standard error calculated assuming no recombination.
ortholog are six sequences from a protein-coding partial segment of dnaE from the bacterium Vibrio cholerae.
In comparison with the rough average nucleotide diversity of Drosophila and human protein-coding genes, the nucleotide diversity of DNA pol α in both Vibrio and Stylonychia is 2 and 10 times greater, respectively.
in Stylonychia, this is considered hypervariable.
0.2), but the extracellularity, low amino acid complexity, and high, probably neutral, nonsynonymous variation of its encoded protein make this case probably exceptional.
A naturally misunscrambled haplotype:
The exceptional haplotype SGR2-a2 was reproducibly obtained from one of the South German (Entringen) strains, SGR2. This haplotype had at least three indels that could be explained by incomplete unscrambling: (1) MDS 6, the 12-base MDS of unknown micronuclear location, had not been spliced in; (2) the first 5' 12 bases of MDS 30 were missing; and (3) MDS 32 had not been spliced in, and instead IES 31-33 had been retained (Fig 2).
The failure in SGR2-a2 of MDS 32 to be correctly spliced and of IES 31–33 to be excised may have been caused by two mutations at sites 3355 (G → T) and 3361 (C → A) in pointer 31–32. SGR2-a2 also carried a unique base difference at site 3381 from the other haplotypes in pointer 32–33. As stated above and reinforced by other results below, strain SGR2, from which this haplotype was taken, is closely related to the strain whose micronuclear sequence was previously published.
Although the 12-base deletions of MDS 6 and at the 5' end of MDS 30 did not introduce frameshifts in the gene, many other small frameshifting deletions and insertions and a large number of unique nonsynonymous or nonsense-inducing differences that were unrelated to known locations of splicing were present in SGR2-a2. In relation to the allelic consensus, the SGR2-a2 coding region presented two frameshifting (1 base) insertions, four (1–2 base) deletions, 2 nonsense mutations, 25 nonsynonymous mutations, and 36 synonymous mutations. Noncoding regions contained three point mutations and one single-base deletion. These counts treat each mutation as an independent occurrence; that is, synonymous and nonsynonymous substitutions were tallied in the reading frame of the other haplotypes rather than in the actual reading frame of SGR2-a2. We did not observe a tendency for compensatory frameshift mutations to restore frame. Rather, an abundance of stop codons appeared throughout its length. It was therefore surprising that among unique point mutations, in the original reading frame, nearly 1.5 times as many point mutations were synonymous as nonsynonymous. The relative base composition of the data was A = 0.378, C = 0.154, T = 0.280, and G = 0.188. The expected ratio of synonymous to nonsynonymous mutations in a neutrally evolving sequence of this composition is 0.251, assuming infinite codons as calculated with the program NONSYN (D. H. Ardell), whereas the value would be 0.323 if all nucleotides were equally frequent. The observed ratio (1.44) is significantly different from its expectation (continuity-corrected χ² = 55.3, P < 0.001).
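The reported test statistic can be checked from the counts given above. The sketch below assumes a one-degree-of-freedom goodness-of-fit test with Yates' continuity correction on the 36 synonymous and 25 nonsynonymous point mutations, with expected proportions derived from the synonymous:nonsynonymous ratio of 0.251. The function name is illustrative.

```python
def yates_chi_square(observed, expected):
    """Continuity-corrected chi-square goodness-of-fit statistic."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

synonymous, nonsynonymous = 36, 25
expected_ratio = 0.251                      # synonymous : nonsynonymous under neutrality

total = synonymous + nonsynonymous
p_syn = expected_ratio / (1 + expected_ratio)   # expected synonymous fraction
expected = [total * p_syn, total * (1 - p_syn)]

chi2 = yates_chi_square([synonymous, nonsynonymous], expected)
print(round(chi2, 1))   # 55.3

# The 0.001 critical value of chi-square with 1 d.f. is about 10.83.
print(chi2 > 10.83)     # True
```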
In summary, some of the sequence disruptions in SGR2-a2 corresponded to MDS-IES boundaries, suggesting partial or erroneous unscrambling. Many more did not correspond to MDS-IES boundaries. These suggested that SGR2-a2 is inactivated. We therefore excluded this haplotype from our subsequent analyses of recombination and polymorphism. However, the excess of synonymous mutations in this haplotype demands further explanation, and we turn our attention to it again below.
Direct evidence of a paralogous MDS:
We have scrutinized the sequence differences between IES X-29 from the previously published major locus (GenBank accession AF194337) and MDS 30 from the minor locus (GenBank accession AF194336) and show in Fig 3 evidence that IES X-29 is in fact an actively transcribed and translated paralogous copy of MDS 30.
If the major and minor loci were spliced together anywhere within IES X-29 as we know it rather than at its beginning, no disruption to the protein sequence would occur other than a deletion of the first five amino acids encoded by MDS 30 and a single conservative replacement of lysine with arginine caused by a transition at site 3022. All other differences between MDS 30 and IES X-29 are synonymous. Suggestively, AF194338 carries the IES X-29 state at site 3022 rather than the MDS 30 state, but the bulk of evidence suggests that AF194338 MDS 30 indeed derives from the minor locus MDS 30 in that strain.
The polymorphism data in the region are consistent with alternative unscrambling of MDS 30 and IES X-29, assuming that the corresponding micronuclear data from ![]()
Some aspects of the polymorphic data could be explained as alternative insertion of MDS 30 and IES X-29 into the macronucleus. Two haplotypes involving three synonymous segregating sites (2888, 2948, and 2952) could be explained as belonging to either the MDS 30 or the IES X-29 states as defined by AF194336 or AF194337, respectively, with the majority of the data by this interpretation coming not from MDS 30 but rather from IES X-29. Site 2888 is at the beginning of pointer P29–30, where the base differences had previously been observed in micronuclear data.
The previously published sequence data provide evidence that not only MDS 30 but also MDS 29 exists in at least two alternatively unscrambled copies. A large number of discrepancies exist between the previously published micro- and macronuclear versions of MDS 29, and these differences are also reflected in a telltale way in our polymorphism data (supplementary Fig 1; http://www.genetics.org/supplemental/). Fig 3 shows one of these differences, at site 2876. There are 12 other such unambiguous mismatches between macronuclear and micronuclear MDS 29 from the same strain (AF194338 and AF194337). We cannot easily explain this discrepancy through allelic micronuclear differences because the experimental procedure that was used generally detects these as ambiguities (although PCR bias cannot be excluded). However, in 9 of the 12 mismatching sites AF194338 and all polymorphic data except the misunscrambled haplotype SGR2-a2 are fixed in a state other than the observed micronuclear MDS 29 from AF194337 (supplementary Fig 1, http://www.genetics.org/supplemental/). The data therefore suggest that MDS 29 also exists in a paralogous micronuclear form, and this as yet unobserved paralogous MDS 29 is the source of most of the macronuclear data that have been observed to date.
Thus, the misunscrambled haplotype SGR2-a2 complements this evidence for paralogous alternatively unscrambled MDSs 29 and 30. The association of variability with a haplotype already identified as misunscrambled strongly implicates the unscrambling process in generating macronuclear variability generally. SGR2-a2 is the only haplotype that matches micronuclear MDS 29 in AF194337 perfectly, and furthermore, SGR2-a2 matches IES X-29 at all three of the mismatching sites in Fig 3 that are not otherwise reflected in polymorphic data (sites 2975, 3011, and 3053). Together with the 12-base deletion just after pointer 29–30, shared by micronuclear AF194337 and SGR2-a2, these results strongly suggest that both MDSs 29 and 30 in SGR2-a2 derive mainly from a major locus sequence very similar to AF194337.
The previously published sequence AF194338, contrastingly, appears to derive almost entirely from the minor locus at MDS 30 and probably also at MDS 29. The minor locus is the most likely location for the hypothetical paralogous MDS 29, since homology in this region for MDS 30 is already known, and the sequence upstream of MDS 30 in the minor locus is unknown. Under this hypothesis MDSs 29 and 30 exist in two extensive and directly linked paralogous copies in both the major and minor loci, and one or more splicing events can occur at variable locations within these paralogous copies without disrupting the final gene. That is, splicing events between the major and minor loci within MDSs 29 and 30 may be multiple and they may vary in location from individual to individual. Note that if and when the major locus IES X-29 is the source for macronuclear MDS 30, 12 bases at the 5' end of MDS 30 must be spliced in from somewhere else, but the precedent of a short splice from another part of the genome already exists in MDS 6.
To conclude, all the differences between IES X-29 and MDS 30 are synonymous and are reflected in polymorphism data, the misunscrambled haplotype, or both, suggesting that IES X-29 is an additional, paralogous and alternatively unscrambled MDS 30. Also, the majority of the data at MDS 29 seem to reject a major locus origin for this part of the gene in all but the misunscrambled haplotype SGR2-a2.
Intragenic mosaic structure:
We can confirm the hypothesis of an alternative, dominantly minor locus origin of MDS 29 by analyzing the mosaic structure of the polymorphism data. Originally, our motivation was to check for genetic linkage of the major and minor loci.
Analysis with an augmented version of TOPAL 2.01b (Fig 4C, see MATERIALS AND METHODS) showed significant mosaic structural signals between MDSs 29 and 30, between MDSs 35 and 36, and in the middle of MDS 30. But by far the largest peak in DSS came near the beginning of MDS 29. The shape and statistical significance of the DSS peak at the 5' end of MDS 29 were the same when a simpler Jukes-Cantor model was used for calculating DSS (data not shown). The absence of other recombination spikes in the distal region (where major and minor locus MDSs alternate) is probably due to the shortness of these MDSs relative to the window size used to calculate the DSS statistic [500 nucleotides (nt)].
The simplest model of meiotic recombination is not consistent with the concentrated multiple significant peaks in Fig 4C. Similarly, intermolecular unscrambling across homologous chromosomes during macronuclear development would be expected to create mosaic patterns of DNA polymorphism, but not concentrated in a particular region.
A more fitting explanation is that MDS 29 exists in a duplicated copy linked to the minor locus, as postulated in the previous section, and that variability in the location of splicing, and possible multiplicity in splice location within particular alleles, leads to a concentration of DSS spikes within this extensively duplicated region at the border of the major and minor loci. The data are also consistent with partial gene conversion events within a paralogously duplicated region encompassing MDSs 29 and 30.
Variability in splice location between two extensive paralogs can lead to mosaic macronuclear data even when there is no variation segregating within the paralogs, so long as there are differences between them. For this reason we cannot conclude from the peaks in the MDS 29–MDS 30 region that the major and minor loci are unlinked. However, the significant peak between major locus MDS 35 and minor locus MDS 36, a region with no evidence of duplication, suggests that the major and minor loci may have independent genealogies.
We note in passing that the peaks in recombination signals seem to occur between the conserved protein domains, suggesting that either recombination or splicing during unscrambling or both is suppressed there (Fig 4C). In the micronucleus, this could be due to genetic hitchhiking of linked neutral sites in regions of purifying selection.
While DSS reveals aggregate local variation in windows of genealogical signal indicative of a recombination-like event, a different level of resolution on the mosaic structure in the data is achieved with compatibility analysis, as shown in Fig 5. In compatibility analysis all segregating sites are individually and simultaneously compared in pairwise fashion. Extra spatial resolution is gained at the cost of loss in power to detect genealogical differences of sequence regions, because all four haplotypes of two alleles at two sites must be present to detect incompatibility, pairwise incompatible sites may be mutually compatible with a third site, and genealogical information from sites with more than two segregating nucleotides may not be fully used.
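The pairwise criterion described above (all four haplotypes of two alleles at two sites must be present) is the four-gamete test. A minimal sketch follows, assuming aligned haplotypes of equal length and considering only sites with exactly two segregating states, as the paragraph notes. The function names and toy data are illustrative and are not taken from RETICULATE.

```python
from itertools import combinations

def segregating_columns(haplotypes):
    """Map site index -> alignment column for sites with exactly two states."""
    return {i: col for i, col in enumerate(zip(*haplotypes)) if len(set(col)) == 2}

def four_gamete_incompatible(col_a, col_b):
    """Two biallelic sites are incompatible when all four two-site haplotypes occur."""
    return len(set(zip(col_a, col_b))) == 4

def incompatible_pairs(haplotypes):
    """List every pair of biallelic sites that fails the four-gamete test."""
    cols = segregating_columns(haplotypes)
    return [(i, j) for i, j in combinations(sorted(cols), 2)
            if four_gamete_incompatible(cols[i], cols[j])]

# Toy alignment (hypothetical): sites 0 and 1 show all four combinations.
print(incompatible_pairs(["AAT", "ACT", "GAT", "GCT"]))   # [(0, 1)]
```

A pair of incompatible sites cannot be explained by a single tree without recurrent mutation, which is why clusters of incompatibility are read as recombination-like signal.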
Fig 5 shows results of compatibility analysis on a rescrambled version of the macronuclear data, projecting it into our best estimate of its hereditary micronuclear structure using the program RESCRAMBLE. In this reordering, major locus MDSs on the Crick strand are reordered in reverse complement, the minor locus is appended on the 3' end of the major locus, and MDS 6, the location of which is unknown, is appended to the 5' end. Departing from the previous model (Fig 1), but consistent with the DSS scan (Fig 4) and polymorphism data, MDS 29 is appended to the 5' end of the minor locus. Supplementary Fig 2 (http://www.genetics.org/supplemental/) shows our results with the raw data, along with the coordinates of sites analyzed in reference to the polymorphic alignment. The results are identical up to permutation, although the data pattern is easier to see in the form shown in Fig 5.
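The reordering rules just described can be sketched as follows. This is not the RESCRAMBLE program: the `Segment` record, its fields, the locus labels, and the toy coordinates are all hypothetical, chosen only to illustrate placing the segment of unknown location first, then the major locus, then the minor locus, with Crick-strand segments reverse complemented.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

@dataclass
class Segment:
    name: str
    start: int          # 0-based start in the macronuclear sequence
    end: int            # exclusive end
    locus: str          # "unknown", "major", or "minor"
    mic_order: int      # rank of the segment within its micronuclear locus
    strand: str = "+"   # "-" marks a segment on the Crick strand

# Unknown-location segments go 5', then the major locus, then the minor locus.
LOCUS_RANK = {"unknown": 0, "major": 1, "minor": 2}

def rescramble(mac_seq, segments):
    """Project a macronuclear sequence into an assumed micronuclear arrangement."""
    ordered = sorted(segments, key=lambda s: (LOCUS_RANK[s.locus], s.mic_order))
    pieces = []
    for seg in ordered:
        piece = mac_seq[seg.start:seg.end]
        pieces.append(reverse_complement(piece) if seg.strand == "-" else piece)
    return "".join(pieces)

# Toy example with four 3-base segments (hypothetical layout).
toy_segments = [
    Segment("M1", 0, 3, "major", 1),
    Segment("M2", 3, 6, "minor", 0),
    Segment("M3", 6, 9, "major", 0, "-"),
    Segment("M4", 9, 12, "unknown", 0),
]
print(rescramble("AAGCCTGGATTC", toy_segments))   # TTCTCCAAGCCT
```

Applying the same permutation to every row of an alignment preserves the columns, which is why the compatibility results are identical up to permutation.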
Several results stand out from compatibility analysis of the rescrambled macronuclear data:
The small-scale incompatibility within the major locus is very curious; its physical distance is much too small to display such extensive evidence of recombination. Although this signal could arise from meiotic recombination, it is more likely due to some combination of alternative unscrambling with variability in splice location, possibly among additional unobserved paralogous major locus MDSs, and intragenic recombination during macronuclear development, as has been observed in Tetrahymena.
The interpretation of additional paralogous major locus MDSs might also explain the excess of synonymous mutations in the misunscrambled haplotype SGR2-a2. In that haplotype all of the synonymous mutations were in major locus MDSs, while nonsynonymous mutations were present in both major and minor locus MDSs (data not shown).
Gene genealogies and splits graphs of the major and minor loci:
We then examined more explicitly the genealogical patterns in the data. In so doing we asked whether the data reflect the geographic origin of the haplotypes, whether there is any aggregate support for the major and minor loci sharing genealogical history, and, if not, what the minimum number of changes that separate the genealogies is. These last questions bear on the open issue of whether the major and minor loci are meiotically linked.
We used split decomposition to make genealogical networks of the major and minor loci because of the incompatibilities within the major locus. We excluded regions that we knew or suspected were duplicated in both the major and minor loci: MDSs 29 and 30 and pointer sequences with copies in both loci. Fig 6 shows that the minor locus split system was tree-like with a perfect fit of the split decomposition to the original distance matrix used, while the major locus genealogy was much less tree-like and the fit of the decomposition was worse. Even when broken up into Watson and Crick strands, with the Watson strand into distal and proximal regions, the genealogies were not tree-like and the split decomposition had poor fit, indicating some unknown level of recombinative structure in the data. This is consistent with the results of the compatibility approach in Fig 5. However, in none of these splits graphs of parts of the major locus did we observe any support for the minor locus tree. Both FED-a1 and NGR-a2 have different phylogenetic affinity at the major and minor loci, with very strong bootstrap support by all methods. Also, we note in passing that the minor locus genealogy appeared to reflect the geographic origin of the strains better than the major locus genealogy.
These results, particularly the absence of the minor locus genealogy in the major locus split system, suggest that the major and minor loci are meiotically recombining, that somatic intragenic recombination occurs after unscrambling preferentially between the major and minor loci, or that intermolecular unscrambling (across homologous chromosomes) occurs preferentially between the major and minor loci. Yet the latter two processes might generally be expected to produce up to four haplotypes per individual, where we have never observed more than two (see DISCUSSION). We assume in this argument that unscrambling, which is believed to occur at the polytene stage (D. M. PRESCOTT, personal communication), is independent across different chromosomes or haploid sets of chromosomes. With this caveat, we tentatively suggest that our observations are more consistent with meiotic recombination.
Excess nucleotide diversity at MDS 29:
A statistical summary of the polymorphism data broken down by region appears in Table 1. Statistical significance was calculated through parametric bootstrapping with coalescent simulations. Table 1 shows a borderline positive Tajima's D in MDS 29. Positive Tajima's D is consistent with a variety of demographic and evolutionary forces including subdivided population structure and the maintenance of polymorphism due to balancing selection as in the case of neutral sites linked to the segregating electromorph in Drosophila Adh.
The data as a whole did not indicate that the sample was derived from a subdivided population. For instance, the average nucleotide diversity within individuals was approximately equal to, if not greater than, the diversity between individuals in a sliding-window analysis (considering distinct haplotypes, data not shown). Also, other features of the data had negative Tajima's D. Tajima's D was significantly negative for the 5' leader sequence, consistent with purifying selection, perhaps to maintain cis-acting regulatory elements. Of 1500 amino acid residues, 10 were segregating, and none of these were in conserved protein regions. Watterson's estimator per amino acid residue for the entire protein sequence was θw = 0.0029 ± 0.0016, Tajima's estimator per amino acid residue was π = 0.0025 ± 0.0015, and Tajima's D was D = -0.89. Standard errors here assume no recombination.
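As a rough illustration of the summary statistics reported above, the following Python sketch computes Watterson's estimator, Tajima's estimator (the mean number of pairwise differences), and Tajima's D from a set of aligned haplotypes. It is not the software used in this study. The function name `diversity_stats` and the toy alignments are hypothetical, the formulas are the standard ones for Tajima's D, and gaps or ambiguous characters are not handled.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (S, theta_w, pi, D) for equal-length aligned sequences.

    theta_w and pi are per-locus values; divide by the alignment length
    to express them per site (or per residue for protein alignments).
    """
    n = len(seqs)
    # With fewer than four sequences the variance of D is zero or undefined.
    if n < 4 or len(set(map(len, seqs))) != 1:
        raise ValueError("need at least four aligned sequences of equal length")

    # S: number of segregating (polymorphic) columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # pi: average number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(seqs, 2)) / n_pairs

    # Watterson's estimator: S scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1

    if S == 0:
        return S, theta_w, pi, None  # D is undefined without polymorphism

    # Variance constants for Tajima's D.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    D = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return S, theta_w, pi, D


# Toy alignments (hypothetical data): two intermediate-frequency sites
# versus a single singleton site.
balanced = ["AAAA", "AAAA", "TTAA", "TTAA"]
singleton = ["TAAA", "AAAA", "AAAA", "AAAA"]
print(diversity_stats(balanced))
print(diversity_stats(singleton))
```

In the first toy alignment the variable sites sit at intermediate frequency, so π exceeds θw and D is positive; in the second the only variable site is a singleton and D is negative.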
A sliding-window analysis of the data using PI, shown in Fig 4, A and B, shows a starred positive-value peak in Tajima's D (Fig 4A) corresponding to two windows in MDS 29 that were significantly positive using the β-distribution. Coalescent simulations with θw = 83 for the region as a whole (no recombination, population growth, or migration) gave a 95th percentile of 2.1534 (one-sided test) among maximal window values of D. This does not detract from our result that D may be positive in MDS 29, but it does show that the results of sliding-window analyses of neutrality tests should be treated with caution if multiple comparisons are not taken into account.
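The idea of correcting a sliding-window test for multiple comparisons can be sketched as follows: simulate neutral samples, record the maximum window value of D in each simulated sample, and compare the observed peak to an upper quantile of those maxima. The Python code below is a minimal sketch of that idea and is not the software introduced in this article. The function names, the window settings, and the parameter values in the usage line are hypothetical; the simulation assumes a standard neutral coalescent without recombination, growth, or migration, and infinite-sites mutation.

```python
import math
import random


def tajimas_d(n, S, pi):
    """Tajima's D from sample size n (>= 4), segregating sites S (>= 1), and pi."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3 * (n - 1)) - 1 / a1
    c2 = 2 * (n * n + n + 3) / (9 * n * (n - 1)) - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sites(n, theta, rng):
    """Simulate one neutral sample; return (position in [0, 1), carrier count) pairs."""
    lineages = [1] * n  # each entry: number of sampled sequences below that lineage
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)  # waiting time to the next coalescence
        for size in lineages:
            # Mutations fall on each branch at rate theta / 2 per unit time.
            for _ in range(poisson(theta / 2 * t, rng)):
                sites.append((rng.random(), size))
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages.pop(j) + lineages.pop(i)  # pop the larger index first
        lineages.append(merged)
    return sites


def max_window_d(sites, n, width, step):
    """Largest Tajima's D over sliding windows; windows without sites are skipped."""
    best, start = None, 0.0
    while start + width <= 1.0 + 1e-12:
        counts = [c for pos, c in sites if start <= pos < start + width]
        if counts:
            pi = sum(2 * c * (n - c) / (n * (n - 1)) for c in counts)
            d = tajimas_d(n, len(counts), pi)
            best = d if best is None else max(best, d)
        start += step
    return best


def critical_value(n, theta, width, step, reps=1000, q=0.95, seed=1):
    """q-th quantile of the maximum window D under the simulated null."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(reps):
        m = max_window_d(simulate_sites(n, theta, rng), n, width, step)
        if m is not None:
            maxima.append(m)
    maxima.sort()
    return maxima[min(len(maxima) - 1, math.ceil(q * len(maxima)) - 1)]


# Hypothetical settings: 7 haplotypes, theta = 20 for the region, windows of
# 10% of the region moved in 5% steps. None of these values come from the study.
print(critical_value(n=7, theta=20.0, width=0.1, step=0.05, reps=500))
```

An observed peak would be judged against this simulated threshold rather than against the single-test critical value, since the maximum over many windows is expected to be larger than any one window's value by chance alone.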
That the excess nucleotide diversity occurs near the boundary between the major and minor loci suggests that the pattern may have more to do with the process of unscrambling than with protein evolution. A possible explanation is as follows: subdivided population structure violates the random-mating assumption of the Wright-Fisher model and can lead to a positive Tajima's D.
Taking pointer sequences as an example, only one of the two repeats will be spliced into the macronucleus, but exactly where and how the cutting and splicing occur is unknown. A macronuclear pointer could derive entirely from the headpointer, entirely from the tailpointer, or chimerically from both, depending on the splice location. During micronuclear meiosis, however, headpointers and tailpointers are distinct. They segregate and evolve independently. A headpointer haplotype could never fix in a population of tailpointers without some kind of conversion event, because they are not at the same locus. This can be directly compared to a structured population where gene conversion acts as a weak migration force. Thus, if in forming the macronucleus, splicing occurs at variable locations within any region, a macronuclear sample would look like a sample from a subdivided population in just this region. Heterozygosity will be in apparent excess, for instance, because different allelic states in each subpopulation may be fixed but will be mixed at the macronuclear level. This excess heterozygosity leads to a positive Tajima's D.
Thus, the positive trend in Tajima's D in MDS 29 is consistent with alternative locations of splicing across haplotypes, which was also suggested by our other analyses. It is tempting to also ascribe the positive value of Tajima's D within pointer sequences to this effect. They share with MDS 29 a low absolute level of polymorphism. However, the region sampled includes only five segregating sites. More data are necessary to settle the tantalizing issue of whether variability in splice location within pointers increases the nucleotide diversity there or in adjacent regions.
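The subdivision argument above can be made concrete with a small calculation. The Python sketch below assumes a deliberately simplified, hypothetical scenario that is not taken from the data: two paralogous micronuclear copies that differ at d fixed sites and are otherwise monomorphic, with k of n sampled macronuclear haplotypes deriving from one copy and n - k from the other. The function name `mixed_sample_diversity` is illustrative only.

```python
def mixed_sample_diversity(n, k, d):
    """Summary statistics when n macronuclear haplotypes mix two paralogs.

    k haplotypes derive from one micronuclear copy and n - k from the other;
    the copies differ at d fixed sites and are otherwise monomorphic.
    Returns (pi, theta_w); Tajima's D has the sign of pi - theta_w.
    """
    if not 0 < k < n:
        raise ValueError("both copies must be represented in the sample")
    a1 = sum(1 / i for i in range(1, n))
    # Each fixed difference separates k * (n - k) of the n * (n - 1) / 2 pairs.
    pi = d * 2 * k * (n - k) / (n * (n - 1))
    theta_w = d / a1
    return pi, theta_w


# Hypothetical example: six haplotypes and four fixed differences.
for k in range(1, 6):
    pi, theta_w = mixed_sample_diversity(n=6, k=k, d=4)
    sign = "D > 0" if pi > theta_w else "D <= 0"
    print(k, round(pi, 3), round(theta_w, 3), sign)
```

In this toy setting D is positive only when both copies contribute at intermediate frequency to the sample; when one copy is represented by a single haplotype, the fixed differences behave like singletons and D is negative.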
| DISCUSSION |
|---|
IES excision, MDS splicing, and gene unscrambling have generally been regarded as deterministic processes.
Our results suggest that gene unscrambling may be fundamentally variable and perhaps best regarded as probabilistic in nature. Our data suggest at least three modes of variability in unscrambling: (1) unscrambling appears to facultatively incorporate one among a set of paralogous MDSs and pointers, (2) splicing location appears to vary within paralogous MDSs, and (3) entire MDSs can fail to be inserted and entire IESs can fail to excise.
Our observation of unscrambling variability raises issues in light of the hypothesis that macronuclear sequences may guide the unscrambling of micronuclear MDSs during development.
Suppose now that an error occurs in unscrambling, causing a misunscrambled macronuclear haplotype, such as SGR2-a2. Would this haplotype feed forward and increase the probability of error in future generations? Would it increase the chances of misunscrambling other alleles with which it is coinherited? If so, misunscrambled haplotypes could decrease long-term reproductive success, as exconjugant offspring of an individual carrying misunscrambled macronuclear haplotypes would be subsequently induced to misunscramble. Individuals that carry misunscrambled haplotypes arising even through environmental means, or lineages that carry mutations that increase the chance of unscrambling error (acting either in cis or in trans), would suffer a greater than expected long-term loss in reproductive success if there were no transfer of information from the macronucleus to the micronucleus in exconjugants. Further breeding and macronuclear injection experiments with the SGR2 strain could allow exploration of this "error catastrophe" issue.
Our data are consistent with a large genetic distance between the major and minor loci. This could possibly reflect a high meiotic recombination rate (short map distance) or a large physical distance between the loci. The data are also consistent with developmental recombination either in the unscrambling stage (that is, unscrambling across homologous chromosomes) or afterward during amplification, as has been observed over very short distances (<1 kb) in Tetrahymena.
Yet, unlike in the case of Tetrahymena studied by Deak and Doerder, we did not find evidence for more than two macronuclear haplotypes per Stylonychia individual either in this study or in a separate analysis of >30 clones made from exconjugants of a laboratory cross of genetically distinct individuals (A. GOODMAN, G. ZILIOLI, D. H. ARDELL and L. F. LANDWEBER, unpublished data). This objection assumes that unscrambling occurs after polytene duplication and is in some measure independent across these duplicates with respect to any variability in macronuclear development. For instance, one way in which development may not be independent is through a possible common dependency on macronuclear inheritance. A further uncertainty in the intermolecular unscrambling interpretation is whether or not homologous chromosomes are paired at the polytene stage in Stylonychia.
It is also possible that the chance of intermolecular unscrambling in at least this region increases with physical distance on the chromosome, in which case we are still left with the interpretation of a relatively large physical distance between the major and minor loci. This is provocative because if unscrambling can indeed take place over large physical distances or between separate chromosomes, the entire genome could potentially become available for DNA-splicing interactions during macronuclear development.
In the case of DNA pol α, the variability of unscrambling that we inferred in the MDS 29–30 region was essentially silent at the protein level. There is, however, no reason why the development of other genes could not be much more dramatically affected by these same phenomena. In principle, alternative unscrambling could generate combinatoric diversity in macronuclear genes, analogous to that generated by alternative splicing or alternative DNA processing in the development of the vertebrate immune system. However, this adaptive potential comes at the cost of increased vulnerability to errors in the unscrambling process, for instance, through the increased chance of unscrambling at undesirable locations after random mutation creates additional matching pointer sequences.
Biochemical data suggest that the DNA elimination during IES removal occurs only within vesicles that subdivide and isolate polytenic chromosomes from one another.
Nonscrambled ciliate genes might also be affected by variability in macronuclear development, particularly IES excision.
From a population genetic perspective, alternative unscrambling of paralogous MDSs will have mixed consequences on the molecular evolution of ciliate genes. This is because more than one micronuclear locus can contribute to the same macronuclear locus. In principle, perhaps an arbitrary number of paralogous MDSs could segregate independently yet contribute to the same macronuclear locus. Even if there were no genetic variation within each of these independently segregating paralogous loci, their alternative incorporation or variability in the location of splicing between them would disproportionately increase macronuclear nucleotide diversity as we have observed in MDS 29. One may speculate that the alternative incorporation of paralogous MDSs decreases the efficacy of selection at individual loci, for instance, by decreasing the effective population size of individual paralogs. On the other hand, the efficacy of weak directional selection could be enhanced by genetic distance between MDSs, paralogous or otherwise, through the reduction of Hill-Robertson interference.
The significant excess of unique synonymous differences in SGR2-a2 cannot be the consequence of allelic inactivation. It may be that this haplotype is more distantly related to the rest of the sample. Or perhaps this haplotype contains other paralogous MDSs that are alternatively unscrambled, but not otherwise present in the data. If so, perhaps their presence in SGR2-a2 came from a disruption in the unscrambling process. It is also tempting to attribute the small-scale incompatibility in the major locus (Fig 5) to the presence of other paralogous major locus MDSs, but no other evidence supports this.
At least in some cases, paralogous MDSs and other features of scrambled genes could help protect against errors in the unscrambling process and other more general deleterious developmental and evolutionary processes. For instance, the extensive paralogy of MDS 30 and inferred paralogy of MDS 29 occur precisely at a point of major disruption in gene structure, right before a conserved protein region (Fig 4). Paralogy provides redundancy that, so long as multiple developmental pathways are expressed within individuals, might increase the chances of creating a functional macronuclear gene. The partly unscrambled haplotype that we observed, SGR2-a2, directly demonstrates that unscrambling can fail and also that other features of scrambled genes can protect against these failures. For instance, MDS 6 and the 5'-most 12 bases of MDS 30 failed to splice into the otherwise correctly unscrambled haplotype. Perhaps the short length of these segments, and the fact that their lengths are multiples of three, protects against misunscrambling in other haplotypes that are actively transcribed and translated. There is almost certainly more variability in the unscrambling process than we directly observed, which was presumably limited to that which was neutral or nearly neutral in the laboratory-cultivated natural strains that we studied.
We suggest that ciliates endure rather frequent introductions of IESs or other noncoding sequence into their coding regions. This is proposed to occur both evolutionarily, through cis-acting mutations in pointer sequences, MDSs, and IESs, and developmentally, through chance errors in unscrambling and other DNA-splicing events during macronuclear development. Perhaps trans-acting mutations in the biochemical machinery of developmental DNA rearrangement also contribute to deleterious variability in the unscrambling process. However these errors arise, both the redundancy of paralogous MDSs and MDS-IES gene structure could play roles in protecting from loss of function as stated above. In this context we introduce the nonsense-suppression hypothesis for the origin of altered genetic codes in ciliates. All altered genetic codes in ciliates reduce the number of stop codons relative to the standard code, and such codes appear to have evolved multiple times in the evolution of ciliates.
Many of our interpretations and their uncertainties rest on inferring the micronuclear sequences from a related but distinct strain. Future studies that include micronuclear sequencing, searching for the paralogous MDS 29 we have inferred, and careful breeding experiments (for instance, with strains that present misunscrambled haplotypes) will vastly improve and refine our understanding of the genomic underpinning of gene unscrambling, its process, and its short- and long-term evolutionary consequences in ciliates and their genes.
The exceptional nature of ciliate genetics provides a potentially fertile ground for extending and testing molecular evolutionary theory and tools. For instance, gene scrambling and nuclear duality provide a unique opportunity to explore the interaction of recombination and selection on the evolution of protein-coding genes.
| FOOTNOTES |
|---|
Sequence data from this article have been deposited with the EMBL/GenBank/DDBJ Data Libraries under accession nos. AY243489, AY243490, AY243491, AY243492, AY243493, AY243494, AY243495, AY243496.
2 Present address: Institute for Cell and Molecular Biology, Department of Microbiology, Biomedical Center Box 596, Uppsala University, SE-751 24 Uppsala, Sweden.
3 Present address: MCDB Department, University of Colorado, Boulder, CO 80309.
| ACKNOWLEDGMENTS |
|---|
The authors thank Tom Doak, Matthew Webster, Alex Mira, Andrew G. Clark, Montgomery Slatkin, and two anonymous reviewers for helpful criticism of the manuscript. We also thank Michael Cummings and the Workshop on Molecular Evolution at the Marine Biological Laboratory at Woods Hole for hospitality and discussions at the beginning of this project. This material is based upon work supported by the National Science Foundation under a Postdoctoral Fellowship in Bioinformatics awarded to D.H.A. in 2000. L.F.L. acknowledges support from National Science Foundation grant 0121422 and National Institute of General Medical Sciences award GM59708.
Manuscript received February 26, 2003; Accepted for publication August 11, 2003.
| LITERATURE CITED |
|---|
AMMERMANN, D., 1965 Cytologische und genetische Untersuchungen an dem ciliaten Stylonychia mytilus Ehrenberg. Arch. Protistenkd. 108:109-152.
AMMERMANN, D., 1971 Morphology and development of the macronuclei of the ciliates Stylonychia mytilus and Euplotes aediculatus. Chromosoma 33:209-238.
AMMERMANN, D., 1987 Giant chromosomes in ciliates. Results Probl. Cell. Differ. 14:59-67.
BERNHARD, D., 1999 Several highly divergent histone H3 genes are present in the hypotrichous ciliate Stylonychia lemnae. FEMS Microbiol. Lett. 175:45-50.
BYUN, R., L. D. ELBOURNE, R. LAN, and P. R. REEVES, 1999 Evolutionary relationships of pathogenic clones of Vibrio cholerae by sequence analysis of four housekeeping genes. Infect. Immun. 67:1116-1124.
CARTINHOUR, S. W. and G. A. HERRICK, 1984 Three different macronuclear DNAs in Oxytricha fallax share a common sequence block. Mol. Cell. Biol. 4:931-938.
CHALKER, D. L. and M. C. YAO, 1996 Non-Mendelian, heritable blocks to DNA rearrangement are induced by loading the somatic nucleus of