Genetics, Vol. 172, 127-143, January 2006, Copyright © 2006
doi:10.1534/genetics.104.040030
Department of Genome Sciences, University of Washington, Seattle, Washington 98195
1 Address for correspondence: Department of Genome Sciences, Box 357730, University of Washington, Seattle, WA 98195.
E-mail: jht{at}u.washington.edu
| ABSTRACT |
|---|
15% of genes are members of cotranscribed operons (BLUMENTHAL et al. 2002). The two to eight genes in each of the ~1050 operons are subject to similar transcriptional regulation (LERCHER et al. 2003), and genes within an operon are often involved in related biological processes (BLUMENTHAL et al. 2002). Such operon clusters are not generally composed of homologous genes, but instead seem to group distinct gene sequences for transcriptional coregulation (BLUMENTHAL et al. 2002). In addition to such functional clustering, specific examples of clusters of homologous genes have also been reported in C. elegans (e.g., GOTOH 1998; ROBERTSON 1998, 2000; SLUDER et al. 1999) and in a wide variety of other metazoans (e.g., FRITSCH et al. 1980; AKAM 1989; HOFKER et al. 1989; DEL PUNTA et al. 2000; GLUSMAN et al. 2001). A cursory global analysis of homologous gene clusters was reported in the C. elegans genome sequence report (C. ELEGANS SEQUENCING CONSORTIUM 1998). To investigate homologous gene clusters systematically, I developed an algorithm for scanning the genome for locally abundant gene families. This method identified 1391 cases of local clusters of two or more homologous genes, 216 of which had five or more members. The larger families tend to share a variety of interesting properties, including striking clustering on autosomal arms, an abundance of nematode-specific gene families, and probable involvement in environmental and pathogen interactions.
| MATERIALS AND METHODS |
|---|
0.001 and with a blast alignment that extended over at least 80% of the query protein. For each such pair, the blast bit score was divided by the query length to generate a bit-score-per-residue value. All such pair values for the window were summed and divided by the number of genes in the window. A histogram bar was plotted at the position of the window centroid (the mean position of the coding start for each gene in the window). This plot is shown in Figure 1, after scaling and annotation of some of the major clusters. Note that this approach produces a histogram bar that reflects the total local gene clustering, regardless of whether this results from one or more gene families. It also weights high-scoring blast matches more heavily than low-scoring ones. The window was moved one gene at a time, so a specific local cluster will contribute values to several histogram bars in its region. Search window sizes ranging from 10 to 50 and various blast match criteria were tested with broadly similar results to those reported here.
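The windowed score described above can be sketched roughly as follows. This is an illustration only, not the code used in the study: the `Hit` record, the `window_scores` function, and the input layout are hypothetical, blast hits are assumed to be precomputed, the default window of 20 genes is taken from the "Merged clusters" paragraph, and the E-value cutoff is assumed to mean "at most 0.001" because the start of the sentence is missing.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    evalue: float        # blast E-value for query vs. subject
    bit_score: float     # blast bit score
    aligned_frac: float  # fraction of the query covered by the alignment

def window_scores(genes, hits, window=20, max_evalue=0.001, min_cov=0.8):
    """Return one (centroid, score) pair per window position.

    genes: list of (name, coding_start, protein_length) in chromosome order.
    hits:  dict mapping (query_name, subject_name) -> Hit (assumed layout).
    """
    bars = []
    # The window moves one gene at a time, as in the text.
    for i in range(len(genes) - window + 1):
        win = genes[i:i + window]
        total = 0.0
        for q_name, _, q_len in win:
            for s_name, _, _ in win:
                if q_name == s_name:
                    continue
                hit = hits.get((q_name, s_name))
                if hit is None:
                    continue
                if hit.evalue <= max_evalue and hit.aligned_frac >= min_cov:
                    # Bit score per residue of the query protein.
                    total += hit.bit_score / q_len
        # Window centroid: mean coding-start position of the genes in the window.
        centroid = sum(start for _, start, _ in win) / len(win)
        # Sum of pair values divided by the number of genes in the window.
        bars.append((centroid, total / len(win)))
    return bars
```

Because every qualifying pair contributes regardless of family, a bar produced this way reflects total local clustering, which matches the caveat in the text.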
Cluster number and size statistics:
To test whether the distribution of clusters was nonrandom, I used a position-randomizing approach. For each chromosome to test, gene order was randomized and an identical clustering algorithm was run with the new gene order. This was repeated multiple times to acquire a statistical sampling. The number of clusters in randomized tests fit a normal distribution and a one-sample t-test was performed to determine whether the real cluster number deviated from this distribution. Significance of the size of clusters was determined by a nonparametric test because the distribution of real cluster sizes deviated sharply from normal. A list of real cluster sizes was compared to a concatenated list of cluster sizes from multiple randomized tests and the lists were compared by the Mann-Whitney U-test. For both cluster number and cluster size, the P-value was two-tailed and was determined using InStat 3.05 (GraphPad Software).
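A rough sketch of the randomization procedure follows. It is not the analysis actually run (P-values in the study came from InStat): `count_adjacent_pairs` is a hypothetical stand-in for the real clustering step, the number of trials and the seed are arbitrary, and only the test statistics are computed here, not the P-values.

```python
import random
import statistics
from math import sqrt

def count_adjacent_pairs(families):
    """Hypothetical stand-in for the clustering step: counts neighboring
    genes that carry the same family label."""
    return sum(1 for a, b in zip(families, families[1:]) if a == b)

def randomized_counts(families, count_clusters, n_trials=100, seed=0):
    """Shuffle gene order n_trials times and rerun the cluster count each time."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        shuffled = list(families)
        rng.shuffle(shuffled)
        counts.append(count_clusters(shuffled))
    return counts

def one_sample_t(sample, observed):
    """t statistic comparing the mean of the randomized sample with the
    observed value (requires a sample with nonzero variance)."""
    n = len(sample)
    return (statistics.mean(sample) - observed) / (statistics.stdev(sample) / sqrt(n))

def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for xs relative to ys; ties count one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

In this sketch, the real cluster count would be passed as `observed` to `one_sample_t`, and the real and pooled randomized cluster sizes would be passed to `mann_whitney_u`.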
Merged clusters:
The sliding-window approach arbitrarily limited clusters to the local 20 gene window, which is useful for plotting genomic distributions. For subsequent analysis, these local clusters were merged by joining clusters whenever they shared at least one gene. The result is a set of merged clusters, each of which represents a regional sequence family. More than one cluster of a particular sequence family will be assigned on the same chromosome only when their nearest genomic neighbors are separated by at least 20 unrelated genes. In principle, this might result in undesirable merging of extended groups of genes that are scattered sparsely across long regions. In practice, no such cases were found and clusters defined in this manner were remarkably tight, in the sense that most genes in each cluster region belonged to the cluster family, interspersed with a modest number of unrelated genes. Figure 6 shows an example, albeit an unusually dense one.
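The merging rule (join local clusters whenever they share at least one gene) amounts to finding connected components. A minimal sketch using a union-find structure is shown below; the function name and the representation of clusters as sets of gene names are assumptions made for illustration.

```python
def merge_clusters(clusters):
    """Merge clusters that share at least one gene.

    clusters: iterable of sets of gene names (assumed representation).
    Returns a list of merged sets, sorted for reproducible output.
    """
    parent = {}

    def find(gene):
        # Follow parent links to the root, halving the path as we go.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for cluster in clusters:
        members = list(cluster)
        for gene in members:
            parent.setdefault(gene, gene)
        # Link every member of this cluster to its first member.
        for gene in members[1:]:
            parent[find(gene)] = find(members[0])

    merged = {}
    for gene in parent:
        merged.setdefault(find(gene), set()).add(gene)
    return sorted(merged.values(), key=lambda s: sorted(s))
```

For example, local clusters `{"a", "b"}` and `{"b", "c"}` share gene `b` and would be reported as a single merged cluster.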
Data records:
An HTML table, which lists all gene clusters and documents the identity and size of each cluster with five or more genes (supplemental data 4 at http://www.genetics.org/supplemental/), was created for each chromosome. These tables include entries for each gene in the cluster and links to stable UCSC Family Browser pages for one member of each cluster. The linked UCSC page is the family browser output keyed to the protein encoded by the link gene and sorted by protein similarity. To provide a stable available data set, all of the UCSC family tables were saved and the links are to these saved files. In addition, two tables that merged clusters from all chromosomes were made. One was sorted by genome position and has selected annotations (supplemental data 3A at http://www.genetics.org/supplemental/); the second was sorted by cluster size and includes more extensive annotations, including annotations for every cluster of five or more genes (supplemental data 3B at http://www.genetics.org/supplemental/). Members of a gene family were defined by gene products on the UCSC Family Browser (UCSC Gene Sorter, May 2003 C. elegans data set; http://genome.ucsc.edu/) that had blastp E-values <10⁻⁶ and at least 20.0% blastp identity with reference members of the family. In a few cases, additional information that was more recent than the May 2003 data release was incorporated, notably for the seven-pass receptor (SR) families and the insulin-like gene family. All of these files are available in supplemental data 2 and 3 at http://www.genetics.org/supplemental/.
Conservation in other phyla:
All-by-all blastp searches were conducted with the most current predicted protein sets from Drosophila melanogaster, Saccharomyces cerevisiae, and Homo sapiens, using the WS120 version of C. elegans WormPep as query. Quality scores as a function of genome position were computed by averaging the blastp score for a sliding window of 20 genes. The score was computed by dividing the blastp bit score by the length of the query protein. The general feature of higher conservation on autosomal arms was first reported in C. ELEGANS SEQUENCING CONSORTIUM (1998). My results showed a smaller difference between autosomal arms and centers, probably because the protein query set used previously was from an earlier genome curation, which may have tended to exclude genes on chromosomal arms.
Cluster annotations:
Most annotations derive from reports on the Pfam and InterPro websites, which are based on conserved domains noted on the UCSC family browser [Pfam release 16 (http://www.sanger.ac.uk/Software/Pfam/); InterPro release 8.1 (http://www.ebi.ac.uk/interpro/); UCSC Gene Sorter, May 2003 C. elegans data set (http://genome.ucsc.edu/)]. In a few cases, blastp or PSI-blast searches were conducted (NCBI_Blast 2004 at http://www.ncbi.nlm.nih.gov/BLAST/) and manual inspection of hits was used to further confirm or reinterpret these annotations. The UCSC data set was from May 2003 (the most recent available at the time). Updated annotations of the SR and insulin gene superfamilies were abstracted from WormBase WS120 (March 2004) because improved annotations had become available in the interim.
Signal sequence and transmembrane domain analysis:
All 19,874 proteins analyzed for clustering were submitted to the SignalP 3.0 and TMHMM 2.0 servers (NIELSEN et al. 1997; REMM and SONNHAMMER 2000; BENDTSEN et al. 2004). A protein was assigned as secreted according to the SignalP HMM method. A protein was assigned as transmembrane (TM) if the TMHMM short report predicted one or more TM domains and the protein did not have a predicted signal sequence (these are often mistermed "N-terminal transmembrane domains"). Lists of the one-line summary SignalP and TMHMM outputs for the entire protein set are available in supplemental materials (supplemental data 7 and 8 at http://www.genetics.org/supplemental/).
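The assignment rule can be written as a small decision function. This is a sketch under stated assumptions: the function name and the `"other"` label are hypothetical, the inputs are assumed to have been parsed already from the SignalP HMM and TMHMM short-report outputs, and a protein with both a predicted signal sequence and predicted TM domains is assumed to be counted as secreted.

```python
def classify_protein(has_signal_peptide, n_tm_domains):
    """Assign a localization class from predictor outputs.

    has_signal_peptide: True if the SignalP HMM method predicts a signal sequence.
    n_tm_domains: number of TM domains in the TMHMM short report.
    """
    if has_signal_peptide:
        # A signal sequence takes priority, so a hydrophobic signal peptide
        # that TMHMM reports as a TM domain is not counted as transmembrane.
        return "secreted"
    if n_tm_domains >= 1:
        return "transmembrane"
    return "other"
```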
Gene counts and nematode specificity:
Gene counts in C. elegans and C. briggsae were assessed mostly on the basis of blastp searches on WormPep and BriggPep with WormBase data set WS123 (Release WS123; http://wormbase.org/). The gene numbers are presented as rough estimates based on a relatively arbitrary E-value cutoff of 10⁻⁴ and a consensus hit count from several different queries from the family. No attempt was made to determine how many members are unpredicted or how many of the predicted members are likely to be pseudogenes. Representation outside of nematodes was assessed from a combination of Pfam and InterPro annotations and a PSI-blast search in June 2004 on the NCBI nr data set with a persistent threshold E-value of 10⁻⁶ (http://www.ncbi.nlm.nih.gov/BLAST/). This threshold appeared to serve well in preventing convergence on short domain matches that do not represent near-full-length homologs. For small proteins, these criteria were relaxed somewhat because of their lower information content. Prior to choosing PSI-blast search query proteins, family members were recursively aligned and culled in an attempt to discard gene-prediction artifacts. PSI-blast searches were initiated with proteins that appeared typical for the family as a whole, without any large insertions, deletions, or terminal extensions (these are common gene-finding artifacts in C. elegans).
Protein alignment, phylogenetics, and hydropathy plots:
Protein alignments for specifically investigated families were computed with ClustalX using BLOSUM matrices and otherwise default settings (THOMPSON et al. 1994, 1997). Phylogenetic trees were generated by Bonsai 1.1.4 (J. H. Thomas, March 2004 at http://calliope.gs.washington.edu/software/index.html) using the neighbor-joining distance method (SAITOU and NEI 1987) and by PHYLIP proml using the maximum-likelihood method (FELSENSTEIN 1993). Composite hydropathy plots were generated from ClustalX multiple alignments using Bonsai 1.1.4 and a window of nine amino acids. This method determines average hydropathy in aligned columns and is otherwise the same as Kyte-Doolittle hydropathy on single proteins (KYTE and DOOLITTLE 1982).
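The composite hydropathy calculation can be sketched as follows. This is not the Bonsai implementation: the function name is hypothetical, and the handling of gaps (gap characters are skipped when averaging a column, and an all-gap column scores 0) is an assumption. The scale values are the standard Kyte-Doolittle hydropathy indices.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composite_hydropathy(alignment, window=9):
    """Average hydropathy per aligned column, then smooth with a sliding window.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Returns one value per window position.
    """
    n_cols = len(alignment[0])
    col_means = []
    for c in range(n_cols):
        # Assumption: gaps and nonstandard residues are ignored in the column mean.
        vals = [KD[seq[c]] for seq in alignment if seq[c] in KD]
        col_means.append(sum(vals) / len(vals) if vals else 0.0)
    return [
        sum(col_means[i:i + window]) / window
        for i in range(n_cols - window + 1)
    ]
```

With a single sequence and no gaps, this reduces to an ordinary Kyte-Doolittle plot on one protein.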
Codon analysis for positive selection:
Codon analysis was performed only with members that appeared typical for the family as a whole, with no large insertions, deletions, or terminal extensions. Sets of 5-10 closely related proteins were selected and aligned using ClustalX with BLOSUM matrices and otherwise default settings. This protein alignment was used to construct a maximum-likelihood phylogenetic tree with proml and to make a corresponding codon alignment. These were provided to the codeml program in the PAML package (YANG 1997). Models 7 and 8 were run with at least three starting dN/dS values to avoid local optima during maximum-likelihood analysis (YANG 1997). Statistical significance was assessed using a chi-square test of twice the difference in likelihood scores for the two models, with 2 d.f. (YANG 1997).
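The likelihood-ratio test in the last step can be worked through in a few lines. The sketch below is illustrative: the function name and the example log-likelihood values are hypothetical, and it relies on the fact that the upper-tail probability of a chi-square distribution with 2 d.f. is exactly exp(-x/2), so no statistics library is needed.

```python
import math

def lrt_pvalue_2df(lnl_null, lnl_alt):
    """Likelihood-ratio test with 2 degrees of freedom.

    lnl_null: log-likelihood of the null model (model 7 in the text).
    lnl_alt:  log-likelihood of the alternative model (model 8 in the text).
    Returns (test statistic, P-value).
    """
    # Twice the difference in log-likelihood; clamp small negative values
    # that can arise from incomplete optimization.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for 2 d.f. is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)

# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_pvalue_2df(-1000.0, -995.0)
```

A statistic of about 5.99 corresponds to P = 0.05 under this distribution, so larger improvements in likelihood for model 8 are taken as evidence for positively selected sites.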
Clusters in D. melanogaster:
Analysis of clusters in D. melanogaster was the same as for C. elegans except that a gene window of 30, blastp cutoffs of 0.0001, and an alignment length of ≥70% were used. Statistical tests were similar to those for C. elegans.
| RESULTS |
|---|
Relationship of phylogenetic conservation to gene clusters:
The clustering method has the potential to find families with a wide range of conservation properties. Although the number of clusters makes a full description of their nature difficult, investigation of specific families made it clear that most or all meet a reasonable standard for constituting a gene family. A sampling of alignments among members of four families is shown in Figure 4. Apart from choosing families of sufficient size to provide abundant alignment material, these four families were arbitrarily chosen and appear typical. In all four families, a significant fraction of predicted proteins aligned dubiously with other family members, with large insertions, deletions, or extensions on one or both ends. Some of these are likely to be nonfunctional genes, but preliminary investigation suggests that many are due to errors in ab initio gene finding, since improved gene models were readily identified by manual curation (data not shown). The alignments in Figure 4 were made with proteins that appear typical for their family and that appear to have satisfactory gene models (no large deletions or insertions). In addition to good alignments within cluster families, extensive blast searches and annotation with the UCSC Gene Sorter (May 2003 C. elegans data set; http://genome.ucsc.edu/) showed that the sequences of cluster families are well separated from each other and from unclustered genes. A good test case was the SR families; for all previously identified families and one new family (see below), the clustering algorithm correctly grouped specific families in local clusters, even though members of different SR gene families are often close to each other and sometimes interspersed in the genome.
As a preliminary assessment of the relationship of cluster size to divergence, I made a systematic analysis using blastp to compare proteins within each cluster size class. Clusters showed a weak correlation between cluster size and mean protein divergence within the cluster, with a slight trend toward greater divergence in large clusters. Specifically, genes had a mean length-normalized blastp score within their cluster as follows: cluster-2 (1.06), cluster-3 (0.96), cluster-4 (0.99), cluster-5 (0.95), and cluster-6 and greater (0.73). As expected, there was substantial variation among clusters, but on average small local clusters appear to result from duplications that are nearly as old as those in larger clusters. Complete lists of mean blastp values for each cluster are available in supplemental data 6 at http://www.genetics.org/supplemental/. Due to difficulties in gene prediction and the use of blast scores as a surrogate for proper distance measures, this analysis should be regarded as strictly provisional.
Finally, I analyzed the relationship of cluster proteins to proteins in other phyla. As previously noted (C. ELEGANS SEQUENCING CONSORTIUM 1998), genes on autosomal arms in C. elegans tend to be less conserved in other phyla than genes in autosomal centers. This tendency is apparent in Figure 5, which graphs the best blastp match to D. melanogaster as a function of genome position for predicted proteins on three C. elegans chromosomes. Very similar patterns were observed for matches to S. cerevisiae and H. sapiens (data not shown). A substantial part of this trend appears to result from lower phylogenetic conservation of proteins in large gene clusters, which are concentrated on autosomal arms (Figure 5). The graphical correlation is striking and it stands up to quantitative scrutiny: cluster-5 proteins had a mean length-normalized best blastp score to D. melanogaster of 0.07, whereas cluster-2 proteins had a mean score of 0.146, and all other proteins had a mean score of 0.27 (see MATERIALS AND METHODS). This trend is not due to cluster genes being part of gene families in C. elegans: when analyzed without regard to genome position, the half of C. elegans proteins with best self-blast hits are slightly more conserved in D. melanogaster than the lower half of proteins (not shown). This presumably results from the fact that unclustered gene families in C. elegans include many that are particularly well conserved phylogenetically (see DISCUSSION).
Molecular identity of gene cluster proteins:
All gene clusters were documented as described in MATERIALS AND METHODS and these data are available as supplemental data 3 and 4 at http://www.genetics.org/supplemental/. Briefly, gene clusters with five or more members were annotated using the UCSC family browser, WormBase, Pfam, various blast resources, signal sequence and transmembrane domain predictors, and other resources. The records include overall family size, potential functional identity, links to specific genes in each cluster, links to additional data, and other notes. Brief summaries of the 24 largest gene clusters are shown in Table 3, and additional summaries of all clusters-5 with functionally obscure gene products are found in Table 4. The molecular identities of gene products encoded by clustered gene families are unusual in a variety of ways. I summarize these features first and then discuss each in more detail. First, most of the families are nematode specific, suggesting that they evolve more rapidly than the typical gene. Second, the families are enriched for predicted secreted and transmembrane proteins. Finally, cluster genes are enriched for genes implicated in environmental interactions, specifically those involved in chemosensation, xenobiotic detoxification, and antimicrobial response.
Table 5 documents the enrichment for secretion signals and transmembrane domains in cluster-2 and cluster-5 genes when compared to noncluster genes. The difference is not as dramatic as for nematode specificity, but this results in part from the presence of a few large cluster families with putative cytoplasmic or nuclear localization (the F-box domain, MATH domain, and nuclear hormone receptor proteins). When analyzed at the level of families, the trend is clearer: 34 of the 50 largest cluster families are predicted to be secreted or transmembrane.
MATH-domain and F-box domain families:
The two largest novel cluster families are the MATH-domain family and the F-box-FTH domain family, with ~100 and ~200 members, respectively. Both are predicted to encode cytoplasmic proteins and neither has yet been implicated in environmental interactions. However, both families appear to be subject to positive selection, a property often associated with changing selective pressure from the environment (KAMEI et al. 2000; CHOI and LAHN 2003; THOMAS et al. 2005). Few of the MATH-domain and F-box-FTH domain genes have identified cDNAs and preliminary inspection of protein alignments and genomic sequences suggests that a substantial fraction of them are nonfunctional genes, perhaps as many as one-third. Although some are likely to be pseudogenes, there is no doubt that many of the genes are functional since there are large families of similar proteins in C. briggsae and dN/dS analysis shows that most of the protein sequence in both families is under strong purifying selection (data not shown). I carried out a preliminary evolutionary analysis of these two protein families based on the subset of predictions that align well with other members in the same family. Lists of proteins analyzed, schematics of protein structure, alignment, and dN/dS results are available in supplemental data 10 at http://www.genetics.org/supplemental/.
The MATH domain is ~100 amino acids in length and is named for founding domain-containing members meprins and TRAF-C (UREN and VAUX 1996). The domain probably functions in protein-protein interactions (SUNNERHAGEN et al. 2002). C. elegans MATH-domain cluster-5 genes are predominantly of two sorts. In one type, nearly the entire protein is occupied by two or more repeats of the MATH domain. In the second type, there is a single N-terminal MATH domain followed by a BTB/POZ domain (ZOLLMAN et al. 1994). Like the MATH domain, the BTB domain is implicated in protein-protein interactions (BARDWELL and TREISMAN 1994). Recent evidence indicates that some MATH-BTB proteins function as adapters to target other proteins to the ubiquitin-mediated proteolysis pathway (FURUKAWA et al. 2003; PINTARD et al. 2003; XU et al. 2003; FIGUEROA et al. 2005). An alignment of the first MATH domain from a sampling of the two-domain proteins is shown at the top of Figure 4. A minority of MATH-domain gene predictions consist largely of a single MATH domain; this fact, coupled with the criteria for clustering (see MATERIALS AND METHODS), presumably explains why most MATH-domain genes were identified as members of the same merged clusters. Because of the paucity of confirmed gene structures, it is unclear whether there is real variability in the number of tandem MATH domains or whether the variability is an artifact of mispredicted genes or pseudogenes. Although neither the MATH nor the BTB domains are nematode specific, C. elegans has a hugely expanded number of MATH domains compared to other sequenced genomes (Pfam release 16 at http://www.sanger.ac.uk/Software/Pfam/). Analysis of codon alignments among closely related MATH-domain genes indicates that there is significant positive selection acting on specific sites. High dN/dS sites are concentrated largely in the MATH domain (supplemental data 10 at http://www.genetics.org/supplemental/) and alignment with a solved MATH-domain protein structure suggests that the sites under positive selection are concentrated on one face of the domain in a region that interacts with one of its binding partners, CD40 (MCWHIRTER et al. 1999; data not shown).
The F-box domain is ~40 amino acids long and in some cases is known to act as an adapter to target other proteins to the ubiquitin-mediated proteolysis pathway (e.g., BAI et al. 1996; SCHULMAN et al. 2000). In the C. elegans F-box-FTH family, the F-box domain occupies the N terminus followed by ~250 amino acids called the FTH domain (CLIFFORD et al. 2000; NAYAK et al. 2005), which has no sequence relatives outside of nematodes. The entire protein aligns well among most members of the family in C. elegans, with the exceptions most likely being nonfunctional genes and gene prediction errors (supplemental data 10 at http://www.genetics.org/supplemental/; data not shown). As with MATH-domain genes, analysis of codon alignments among closely related F-box genes shows clear indications of positive selection at specific sites in these proteins (supplemental data 10 at http://www.genetics.org/supplemental/). These sites are not in the F-box region and may cluster in specific regions in the rest of the protein. I speculate that F-box proteins in C. elegans function to target foreign proteins for proteolysis via binding sites in the regions under positive selection.
Chemosensory receptor families:
Members of multiple putative chemosensory receptor (SR) gene families are prominent contributors to gene clustering: 219 of the 1391 clusters contain genes in annotated SR families. These clusters range in size from 2 to 24 genes and contain a total of 1065 genes, including members of all previously described SR families. Extensive clustering of odorant, gustatory, and vomeronasal receptors is also found in vertebrates (e.g., DEL PUNTA et al. 2000; MATSUNAMI et al. 2000; GLUSMAN et al. 2001) and, to a lesser extent, in Drosophila (ROBERTSON et al. 2003), suggesting that local gene duplication and diversification is a phylogenetically conserved feature of chemoreceptor gene families. The specificity of the clustering algorithm in C. elegans is supported by the fact that each of many analyzed SR clusters contains genes from one specific SR family, despite the fact that most SR families have similar genome distributions and are often interspersed locally. One of the cluster families, with ~75 predicted members, defines a new family in the SR superfamily. The new family is distantly related to the previously recognized srg, sru, srv, srh, and str SR families in C. elegans. PSI-Blast searches started from two proteins from the new family (persistent E-value cutoff 10⁻⁴) also suggest a very distant relationship to melatonin receptors and opsins. A composite hydropathy plot and protein tree of 29 putative full-length predictions from the new family are shown in Figure 7. As with other SR families, the new family is concentrated on chromosome V (57 of 74 genes). Of the 57 genes on chromosome V, 45 are located on the left arm (Figure 7), including four clusters of 5 or more genes, all of which were identified by the clustering algorithm. The new family has been assigned the C. elegans gene designation srt (J. HODGKIN, personal communication). A full annotation of the srt family was completed and submitted to WormBase; a list of all known SR proteins, including the new srt family, is in supplemental data 9 at http://www.genetics.org/supplemental/.
γ-thionin defensins (PF00304, FlyBase 2004 at http://flybase.bio.indiana.edu; UCSC Gene Sorter, May 2003 C. elegans data set; http://genome.ucsc.edu/), lysozymes (ROXSTROM-LINDQUIST et al. 2004), and serpin protease inhibitors (LEVASHINA et al. 1999).
| DISCUSSION |
|---|
Gene clusters contain unusual genes:
By various measures, the gene families identified by the clustering algorithm are unusual. Perhaps the most marked of these is the preponderance of nematode-specific gene products. Unclustered homologous families in C. elegans are dramatically different from clustered families. Using the same family-finding algorithm without regard to genome position, the largest gene families (after removal of cluster-5 families) encoded protein kinases, ligand-gated ion channels, ras/rab family G proteins, two types of transposases, transmembrane tyrosine kinases, protein phosphatases, and phosphoesterases. All of these families are well known, none are nematode specific, and all are the subject of thousands of research articles. To the extent that I investigated, their genome distribution lacks the autosomal arm bias that characterizes nearly every cluster-5 family. Other distinctive features of cluster-5 genes, when compared to the rest of C. elegans genes, are listed in Table 7. These include a dramatically reduced frequency of assigned phenotypes in RNA interference tests of gene function, shorter introns, reduced expression levels by two measures, and increased divergence from their closest predicted C. briggsae relative. All of these features, except shorter introns, are readily rationalized on the basis of functional redundancy and higher rates of evolution for cluster genes. Shorter introns may be an indirect result of the fact that genes with very low expression levels have a smaller average intron size (data not shown).
Nuclear hormone receptors:
The genome distribution of nuclear hormone receptor (nhr) genes is particularly telling because it includes both phylogenetically conserved genes and a large expanded family of nematode-specific relatives (SLUDER et al. 1999). C. elegans possesses ~30 nhr genes that belong to families with broad phylogenetic representation, including members of five of the six recognized chordate nhr families (SLUDER and MAINA 2001). The phylogenetically conserved subset of the nhr genes is distributed widely in the genome, with no obvious chromosome or arm bias. There is also a large expansion of nhr genes in C. elegans, including >200 that define expanded nematode-specific families. Presumably, these genes duplicated and diversified during nematode evolution. The phylogenetic tree for these genes indicates that these duplications occurred over a long period with no obvious indication of temporal clustering. The expanded nhr genes are distributed very nonuniformly in the genome, with 149 of 197 tested genes residing on chromosome V and with a strong bias toward clusters on autosomal arms. I speculate that this segment of the nhr family is specialized for transcriptional response to environmental challenges and that the genes duplicate and diversify on chromosomal arms in concert with this selection. One member of the nhr family, nhr-8, is experimentally implicated in xenobiotic response (LINDBLOM et al. 2001); however, this gene is not in a homology cluster and appears to be a member of the phylogenetically conserved class of C. elegans nhr genes. I speculate that some of these genes also participate in environmental responses.
Operons and homology clusters:
Genes that are found in large homology clusters tend not to be found in operons and vice versa. Why is this true? I speculate that operons, in their role as transcriptional regulatory units, tend to group genes that are unrelated in sequence but that function together in shared processes (as suggested by BLUMENTHAL et al. 2002). In contrast, homology clusters exist as a consequence of evolutionary patterns of duplication and divergence rather than shared transcriptional regulation. Evolutionary theory indicates that duplicate genes that persist over time must acquire at least partially distinct functions to permit natural selection to act in retaining both gene copies (OHNO et al. 1968). It is likely that some of the functional distinctness acquired by such duplicate genes occurs at the transcriptional level, for example, when each of the duplicates is expressed in a subset of the tissues that expressed their ancestor. This mechanism of divergence is unlikely to be consistent with the duplicates residing in the same operon. In addition, the duplications that give rise to gene clusters might disrupt operon structure, favoring persistence of genes with their own promoters. The genomic distribution of operons is also different from homologous gene clusters; operons are relatively evenly distributed by chromosome (although reduced on the X chromosome) and they are less common on autosomal arms where most homology clusters reside (BLUMENTHAL et al. 2002; data not shown).
| ACKNOWLEDGEMENTS |
|---|
| LITERATURE CITED |
|---|
ADAMS, M. D., S. E. CELNIKER, R. A. HOLT, C. A. EVANS, J. D. GOCAYNE et al., 2000 The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.
AKAM, M., 1989 Hox and HOM: homologous gene clusters in insects and vertebrates. Cell 57: 347–349.
BAI, C., P. SEN, K. HOFMANN, L. MA, M. GOEBL et al., 1996 SKP1 connects cell cycle regulators to the ubiquitin proteolysis machinery through a novel motif, the F-box. Cell 86: 263–274.
BARDWELL, V. J., and R. TREISMAN, 1994 The POZ domain: a conserved protein-protein interaction motif. Genes Dev. 8: 1664–1677.
BARNES, T. M., Y. KOHARA, A. COULSON and S. HEKIMI, 1995 Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141: 159–179.
BENDTSEN, J., H. NIELSEN, G. VON HEIJNE and S. BRUNAK, 2004 Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340: 783–795.
BLUMENTHAL, T., D. EVANS, C. D. LINK, A. GUFFANTI, D. LAWSON et al., 2002 A global analysis of Caenorhabditis elegans operons. Nature 417: 851–854.
BOMAN, H. G., 2003 Antibacterial peptides: basic facts and emerging concepts. J. Intern. Med. 254: 197–215.
CAMPBELL, A. M., P. H. TEESDALE-SPITTLE, J. BARRETT, E. LIEBAU, J. R. JEFFERIES et al., 2001 A common class of nematode glutathione S-transferase (GST) revealed by the theoretical proteome of the model organism Caenorhabditis elegans. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 128: 701–708.
C. ELEGANS SEQUENCING CONSORTIUM, 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012–2018.
CHEN, N., S. PAI, Z. ZHAO, A. MAH, R. NEWBURY et al., 2005 Identification of a nematode chemosensory gene family. Proc. Natl. Acad. Sci. USA 102: 146–151.
CHOI, S. S., and B. T. LAHN, 2003 Adaptive evolution of MRG, a neuron-specific gene family implicated in nociception. Genome Res. 13: 2252–2259.
CLIFFORD, R., M. H. LEE, S. NAYAK, M. OHMACHI, F. GIORGINI et al., 2000 FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the C. elegans hermaphrodite germline. Development 127: 5265–5276.
COUILLAULT, C., N. PUJOL, J. REBOUL, L. SABATIER, J. F. GUICHOU et al., 2004 TLR-independent control of innate immunity in Caenorhabditis elegans by the TIR domain adaptor protein TIR-1, an ortholog of human SARM. Nat. Immunol. 5: 488–494.
DEL PUNTA, K., A. ROTHMAN, I. RODRIGUEZ and P. MOMBAERTS, 2000 Sequence diversity and genomic organization of vomeronasal receptor genes in the mouse. Genome Res. 10: 1958–1967.
DRICKAMER, K., and R. B. DODD, 1999 C-type lectin-like domains in Caenorhabditis elegans: predictions from the complete genome sequence. Glycobiology 9: 1357–1369.
FELSENSTEIN, J., 1993 PHYLIP (Phylogeny Inference Package), Version 3.6a2. Department of Genome Sciences, University of Washington, Seattle.
FIGUEROA, P., G. GUSMAROLI, G. SERINO, J. HABASHI, L. MA et al., 2005 Arabidopsis has two redundant Cullin3 proteins that are essential for embryo development and that interact with RBX1 and BTB proteins to form multisubunit E3 ubiquitin ligase complexes in vivo. Plant Cell 17: 1180–1195.
FRASER, A. G., R. S. KAMATH, P. ZIPPERLEN, M. MARTINEZ-CAMPOS, M. SOHRMANN et al., 2000 Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408: 325–330.
FRITSCH, E. F., R. M. LAWN and T. MANIATIS, 1980 Molecular cloning and characterization of the human beta-like globin gene cluster. Cell 19: 959–972.
FURUKAWA, M., Y. J. HE, C. BORCHERS and Y. XIONG, 2003 Targeting of protein ubiquitination by BTB-Cullin 3-Roc1 ubiquitin ligases. Nat. Cell Biol. 5: 1001–1007.
GLUSMAN, G., I. YANAI, I. RUBIN and D. LANCET, 2001 The complete human olfactory subgenome. Genome Res. 11: 685–702.
GOTOH, O., 1998 Divergent structures of Caenorhabditis elegans cytochrome P450 genes suggest the frequent loss and gain of introns during the evolution of nematodes. Mol. Biol. Evol. 15: 1447–1459.
HILL, A. A., C. P. HUNTER, B. T. TSUNG, G. TUCKER-KELLOGG and E. L. BROWN, 2000 Genomic analysis of gene expression in C. elegans. Science 290: 809–812.
HOFKER, M. H., M. A. WALTER and D. W. COX, 1989 Complete physical map of the human immunoglobulin heavy chain constant region gene complex. Proc. Natl. Acad. Sci. USA 86: 5567–5571.
KALLBERG, Y., U. OPPERMANN, H. JORNVALL and B. PERSSON, 2002 Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes. Protein Sci. 11: 636–641.
KAMEI, N., W. J. SWANSON and C. G. GLABE, 2000 A rapidly diverging EGF protein regulates species-specific signal transduction in early sea urchin development. Dev. Biol. 225: 267–276.
KATJU, V., and M. LYNCH, 2003 The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics 165: 1793–1803.
KATO, Y., T. AIZAWA, H. HOSHINO, K. KAWANO, K. NITTA et al., 2002 abf-1 and abf-2, ASABF-type antimicrobial peptide genes in Caenorhabditis elegans. Biochem. J. 361: 221–230.
KYTE, J., and R. F. DOOLITTLE, 1982 A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132.
LEAH, R., H. TOMMERUP, I. SVENDSEN and J. MUNDY, 1991 Biochemical and molecular characterization of three barley seed proteins with antifungal properties. J. Biol. Chem. 266: 1564–1573.
LEIERS, B., A. KAMPKOTTER, C. G. GREVELDING, C. D. LINK, T. E. JOHNSON et al., 2003 A stress-responsive glutathione S-transferase confers resistance to oxidative stress in Caenorhabditis elegans. Free Radic. Biol. Med. 34: 1405–1415.
LERCHER, M. J., T. BLUMENTHAL and L. D. HURST, 2003 Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 13: 238–243.
LEVASHINA, E. A., E. LANGLEY, C. GREEN, D. GUBB, M. ASHBURNER et al., 1999 Constitutive activation of toll-mediated antifungal defense in serpin-deficient Drosophila. Science 285: 1917–1919.
LIGOXYGAKIS, P., N. PELTE, J. A. HOFFMANN and J. M. REICHHART, 2002 Activation of Drosophila Toll during fungal infection by a blood serine protease. Science 297: 114–116.
LINDBLOM, T. H., G. J. PIERCE and A. E. SLUDER, 2001 A C. elegans orphan nuclear receptor contributes to xenobiotic resistance. Curr. Biol. 11: 864–868.
MAGLICH, J. M., A. SLUDER, X. GUAN, Y. SHI, D. D. MCKEE et al., 2001 Comparison of complete nuclear receptor sets from the human, Caenorhabditis elegans and Drosophila genomes. Genome Biol. 2: RESEARCH0029.
MALLO, G. V., C. L. KURZ, C. COUILLAULT, N. PUJOL, S. GRANJEAUD et al., 2002 Inducible antibacterial defense system in C. elegans. Curr. Biol. 12: 1209–1214.