Genetics, Vol. 158, 1311-1320, July 2001, Copyright © 2001

Functional Divergence in the Caspase Gene Family and Altered Functional Constraints: Statistical Analysis and Prediction

Yufeng Wang and Xun Gu
Department of Zoology and Genetics, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa 50011

Corresponding author: Xun Gu, Department of Zoology and Genetics, Center for Bioinformatics and Biological Statistics, 332 Science II Hall, Iowa State University, Ames, IA 50011. E-mail: xgu@iastate.edu

Communicating editor: C.-I WU


*  ABSTRACT

In this article, we explore the pattern of type I functional divergence (i.e., altered functional constraints or site-specific rate difference) in the caspase gene family that is important for apoptosis (programmed cell death) and cytokine maturation. By taking advantage of substantial experimental data from caspases, the functional/structural basis of our posterior predictions from sequence analysis was extensively studied. Our results are as follows: (1) Phylogenetic analysis shows that the evolution of major caspase-mediated pathways has been facilitated by gene duplications, (2) type I functional divergence (altered functional constraints) is statistically significant between two major subfamilies, CED-3 and ICE, (3) 4 of 21 predicted amino acid residues (for site-specific rate difference between CED-3 and ICE) have been verified by experimental evidence, and (4) we found that some CED-3 caspases may inherit more ancestral functions, whereas other members may employ some recently derived functions. Our approach can be cost effective in functional genomics to make statistically sound predictions from amino acid sequences.


GENE family proliferation provides the raw material for functional innovation in higher eukaryotes. After gene duplication, the classical model (OHNO 1970) suggests that one gene copy maintains the original function, while the other copy is free to accumulate amino acid changes toward functional divergence. Since then, many specific models have been proposed (e.g., LI 1983; CLARK 1994; FORCE et al. 1999). However, the details of functional divergence between duplicate genes remain largely unexplored. GU (1999) developed a method to detect amino acid residues that contribute to functional divergence after gene duplication; these residues can be considered candidates for further experimentation. Its effectiveness for functional genomics still needs to be verified by using gene families with substantial biological/structural information.

Apoptosis, or programmed cell death, is an ordered process in which cells commit suicide when they are not needed or are potentially harmful. The key component in the apoptotic machinery is a cascade of cysteine aspartyl proteases (caspases). All caspases, which are initially inactive proenzymes, share the same processing scheme to achieve mature forms after cleavage(s) at specific Asp sites (KUMAR 1995; THORNBERRY and LAZEBNIK 1998). To date, at least 14 members of the caspase gene family have been identified in mammals, which can be further classified into two major subfamilies, CED-3 (including caspase-2, -3, -6, -7, -8, -9, -10, and -14) and ICE (including caspase-1, -4, -5, -11, -12, and -13; NICHOLSON and THORNBERRY 1997). Substantial evidence has shown that the CED-3-type caspases are essential for most apoptotic pathways (YUAN et al. 1993; KUIDA et al. 1996). In contrast, the major function of the ICE-type caspases is to mediate immune response, although some members may play a role in cell death in some circumstances (YUAN and HORVITZ 1990; WANG et al. 1998). X-ray crystallography has also shown a significant structural difference between these two types of caspases (e.g., WILSON et al. 1994; ROTONDA et al. 1996).

In this article, we take advantage of experimental evidence of caspases to study the functional-structural basis of statistical predictions from GU's (1999) method. We statistically evaluate the functional divergence between CED-3 and ICE subfamilies and then show that our predictions are consistent with the observations from structural or functional assay. Our analysis shows the potential of evolutionary analysis for functional genomics.


*  METHODS

The data set:
We conducted an exhaustive search (e.g., with gapped BLAST and PSI-BLAST) in several major databases to find all available sequences that are homologous to the Caenorhabditis elegans CED-3 gene. After synthetic peptides, expressed sequence tags, partial sequences, and redundant sequences were removed, the final data set included 42 CED-3 homologous sequences, whose accession numbers are listed in the Fig 2 legend.



Figure 1. (A) Classification of amino acid configurations for two duplicate gene clusters. Type 0 sites are universally conserved through the whole gene family. Type I sites are very conserved in one cluster but highly variable in the other. Type II sites are very conserved in both clusters but with very different biochemical properties. Type U sites are unclassifiable. (B) A diagram showing the stochastic nature of molecular evolution. Each site (represented as a box) has a nonzero probability for any type of amino acid configuration. At site 1 or 2, no altered functional constraint occurs in either cluster, a status defined as S0 = (F0, F0). At site 3, 4, or 5, altered functional constraint occurs in at least one cluster, a status defined as S1 = (F1, F0) or (F0, F1) or (F1, F1) (see METHODS for details). (C) A flow chart to illustrate GU's (1999) method.



Figure 2. The phylogenetic tree of the caspase gene family, inferred by the neighbor-joining method on the basis of the amino acid sequence with Poisson correction. Bootstrap values >50% are presented. Initiator caspases (I-casps) are involved in upstream regulatory events, and effector caspases (E-casps) directly lead to cell disassembly. The accession numbers for protein sequences are (1) casp-3, U13737 (human 3-α), U13738 (human 3-β), U49930 (rat 3-α), U58656 (rat 3-β), Y13086 (mouse), U27463 (hamster), AF083029 (chicken), D89784 (frog); (2) casp-7, U37448 (human), Y13088 (mouse), AF072124 (rat), U47332 (hamster); (3) casp-6, U20536 (human), AF025670 (rat), Y13087 (mouse), AF082329 (chicken); (4) casp-8, AF102146 (human), AF067841 (mouse); (5) casp-10, U60519 (human 10a), U86214 (human 10/b), AF111345 (human 10/d); (6) casp-9, U60521 (human); (7) casp-2, U13021 (human), U77933 (rat), Y13085 (mouse), U64963 (chicken); (8) casp-14, AF097874 (human), AJ007750 (mouse); (9) casp-1, X65019 (human), AF090119 (horse), L28095 (mouse), U14647 (rat), D89783 (frog ICE-A), D89785 (frog ICE-B); (10) casp-4, Z48810 S78281 (human); (11) casp-5, X94993 (human); (12) casp-13, AF078533 (human); (13) casp-11, Y13089 (mouse); (14) casp-12, Y13090 (mouse); (15) invertebrate caspase, P42573 (C. elegans CED-3), Y12261 (Drosophila melanogaster), U81510 (armyworm, Spodoptera frugiperda).



Figure 3. A schematic of the evolution of caspase-mediated pathways. Note that the ancestral function of caspases (as well as the origin of ICE-type caspases) is uncertain. A–C correspond to ancestral nodes in Fig 2. Bcl-2/Apaf, BCR, death receptors (DRs), TNFR1, and CD95 are death signals for specific apoptotic pathways. Caspase-3/-6/-7 are effector caspases (E-casps), which are the real killer proteins in programmed cell death.

Multiple alignment and phylogenetic analysis:
The multiple alignment of 42 caspase amino acid sequences was obtained by the program CLUSTALX (THOMPSON et al. 1997), followed by manual editing according to the structure information (NICHOLSON and THORNBERRY 1997). A phylogenetic tree was inferred by the neighbor-joining method (SAITOU and NEI 1987) using MEGA2.0 (http://www.megasoftware.net/). PAUP4.0 and PHYLIP were used to examine whether the inferred phylogeny is sensitive to the tree-making method. To evaluate the intensity of functional constraints in each caspase, we calculated the ratio of nonsynonymous to synonymous rates between human/mouse orthologs using the methods of LI (1993) and modified Nei and Gojobori (as implemented in MEGA2.0).
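The tree in Fig 2 is described as based on amino acid distances with Poisson correction. As an illustration only, the following Python sketch computes such a distance for one pair of aligned sequences, using the standard formula d = -ln(1 - p), where p is the proportion of differing sites. The function names, the gap symbol, and the choice to skip gapped sites are assumptions of this sketch; it is not the MEGA2.0 implementation.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over sites where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped sites are ignored for this pair
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every compared site differs")
    return -math.log(1.0 - p)
```

A pairwise matrix of such distances would then be the input to a neighbor-joining program.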

Type I functional divergence (altered functional constraint) analysis:
Types of amino acid configurations: Consider a multiple alignment of a gene family with two homologous genes A and B (Fig 1A). Although different classifications have been put forward (e.g., LIVINGSTONE and BARTON 1996), we adopt the following scheme: (i) Type 0 represents amino acid configurations that are universally conserved through the whole gene family, implying that these residues may be important for the common function shared by all member genes, (ii) type I represents amino acid configurations that are very conserved in gene A but highly variable in gene B, or vice versa, implying that these residues may have experienced altered functional constraints resulting in site-specific rate difference, (iii) type II represents amino acid configurations that are very conserved in both genes but whose biochemical properties are very different, e.g., charge positive vs. negative, implying that these residues may be responsible for functional specification in the different subfamilies, and (iv) amino acid configurations at many residues are not so clear-cut, so they have to be regarded as unclassified (type U).
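To make the four configuration types concrete, the following Python sketch applies a deliberately crude rule to a single alignment column. It is a toy illustration, not the method used in this article: the function name and the variable_min threshold are hypothetical, and the rule ignores both the phylogeny and the biochemical properties that a real type II call would require.

```python
def classify_site(column_a, column_b, variable_min=3):
    """Assign a crude configuration type ("0", "I", "II", or "U") to one column.

    column_a, column_b: residues observed at this site in clusters A and B.
    variable_min: hypothetical threshold; a cluster counts as "highly variable"
    when it shows at least this many distinct residues.
    """
    residues_a = set(column_a)
    residues_b = set(column_b)
    if len(residues_a) == 1 and len(residues_b) == 1:
        # Both clusters invariant: the same residue gives type 0, different
        # residues give type II (biochemical properties are not checked here).
        return "0" if residues_a == residues_b else "II"
    if len(residues_a) == 1 and len(residues_b) >= variable_min:
        return "I"
    if len(residues_b) == 1 and len(residues_a) >= variable_min:
        return "I"
    return "U"
```

For example, an invariant tryptophan in one cluster against four different residues in the other is called type I, whereas a column with a single substitution in one cluster falls into type U. The outcome depends entirely on the arbitrary threshold.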

Several algorithms have been proposed to define these types of amino acid configurations automatically (e.g., CASARI et al. 1995; LICHTARGE et al. 1996; LANDGRAF et al. 1999). However, these methods are subject to various problems, e.g., neglect of the phylogenetic tree, an unclear statistical basis, or an arbitrary cutoff for classification. To deal with these problems, a statistical model is needed.

Functional divergence and altered functional constraint: After gene duplication, two duplicates can undergo substantial functional divergence. It seems that only a small portion of residues are involved in functional divergence (GOLDING and DEAN 1998). The trajectories of differentiation can affect the evolutionary pattern of the gene family divergence in several ways. According to GU (1999), type I functional divergence refers to the evolutionary process that results in altered functional constraints (or site-specific rate difference) between two duplicate genes, regardless of the underlying evolutionary mechanisms. Intuitively, a type I amino acid configuration is likely to be observed at a residue with different evolutionary rates between duplicate genes. However, because of the stochastic nature of molecular evolution, each site, whether or not it is related to functional divergence, has a nonzero probability of showing any type of amino acid configuration (Fig 1B). Therefore, instead of classifying type I amino acid configurations ad hoc, GU's (1999) method computes the (posterior) probability of type I functional divergence for each amino acid site. Type II functional divergence can be defined in the same manner (results not shown).

Statistical modeling for type I functional divergence: It is conceptually convenient to use the ancestral gene (before duplication) as a reference. For each duplicate gene cluster, a site whose evolutionary rate differs from that of the ancestral gene is called an F1 site (functional divergence related); otherwise it is called an F0 site (functional divergence unrelated). As shown in Fig 1B, different evolutionary rates between duplicate genes are expected only when a site is F1 in at least one cluster (e.g., sites 3, 4, and 5), a status denoted by S1. The coefficient of type I functional divergence (θ) between two gene clusters is defined as the probability of a site being in status S1, i.e., θ = P(S1). The alternative status is S0, which means that a site is F0 in both clusters (i.e., the evolutionary rate of each duplicate gene is the same as that of the ancestral gene, e.g., sites 1 and 2 in Fig 1B). Obviously, P(S0) = 1 - θ. The null hypothesis is θ = 0, which means that the evolutionary rate is virtually the same between duplicate genes (as well as the ancestral gene) at each site. In this case, the model is reduced to the conventional model of rate variation among sites (e.g., GU and ZHANG 1997).

Let λA and λB be the evolutionary rates of a site in clusters A and B, respectively, which vary among sites. For a site being F0 in both clusters (status S0) with a probability of 1 - θ, we can assume λA = λB without loss of generality. However, for a site being S1 (i.e., being F1 in at least one cluster) with a probability of θ, we have λA ≠ λB. To avoid too many parameters, GU (1999) made the following simplification: Under S1, although λA > λB at some sites or vice versa at others, over all sites λA and λB are statistically independent. Fig 1C outlines the statistical procedure for estimating θ from sequences.

Prediction of critical amino acid residues:
If θ is significantly > 0, this provides statistical evidence that type I functional divergence (site-specific rate difference) may have occurred after gene duplication. If so, it is of interest to predict which residues are responsible, which can be achieved by posterior analysis (Fig 1C). Let P(S1|X) be the posterior probability of a site being S1 when the amino acid configuration (X) is observed. Since the alternative status S0, with posterior probability P(S0|X) = 1 - P(S1|X), means no altered functional constraint, the predicted residues are meaningful only when P(S1|X) > 0.5, such that the posterior odds ratio R(S1/S0) = P(S1|X)/P(S0|X) > 1. A more stringent cutoff may be P(S1|X) > 0.67 or R(S1/S0) > 2.
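The posterior quantities above follow from Bayes' rule with θ = P(S1) as the prior. The Python sketch below is a minimal illustration under stated assumptions: the site likelihoods P(X|S0) and P(X|S1) are taken as given numbers (in GU's method they come from the phylogenetic model, which is not reproduced here), and all function and parameter names are hypothetical.

```python
def posterior_s1(theta, lik_s0, lik_s1):
    """P(S1|X) from the prior theta = P(S1) and the likelihoods P(X|S0), P(X|S1)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    numerator = theta * lik_s1
    denominator = (1.0 - theta) * lik_s0 + numerator
    if denominator <= 0.0:
        raise ValueError("the site must have positive probability under the prior")
    return numerator / denominator


def posterior_odds(p_s1):
    """Posterior odds ratio R(S1/S0) = P(S1|X) / P(S0|X)."""
    if p_s1 >= 1.0:
        return float("inf")
    return p_s1 / (1.0 - p_s1)


def predicted_sites(posteriors, cutoff=0.5):
    """1-based alignment positions whose posterior P(S1|X) exceeds the cutoff."""
    return [i for i, p in enumerate(posteriors, start=1) if p > cutoff]
```

When the two likelihoods are equal, the data are uninformative and the posterior simply equals the prior θ; a posterior of 0.5 corresponds to odds of exactly 1.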

Cluster-specific type I functional divergence: functional distance analysis:
The two-cluster analysis described above cannot tell in which gene cluster the altered functional constraint took place after gene duplication. This problem can be solved by a simple method when at least three homologous gene clusters are available. For any cluster i, let θi = Pi(F1) be the probability of a site having a different rate from the ancestral gene, and Pi(F0) = 1 - θi be the probability of having the same rate. Consider two clusters i and j in which the coefficient of type I functional divergence is denoted by θij = Pij(S1) = 1 - Pij(S0). If a site being F1 or F0 is independent between clusters, we have the relation Pij(S0) = Pi(F0) x Pj(F0) or 1 - θij = (1 - θi)(1 - θj). Therefore, we define type I functional distance between clusters i and j as dF(i, j) = -ln(1 - θij) and functional branch length for cluster i or j as bF(i) = -ln(1 - θi) and bF(j) = -ln(1 - θj), respectively. Obviously, dF(i, j) is additive, i.e.,

dF(i, j) = bF(i) + bF(j).    (1)

When the coefficient of type I functional divergence (θij) for each pair of clusters is estimated, the matrix of dF(i, j) can be computed easily. Then, a standard least-squares method is implemented on the basis of Equation 1 for estimating all bF's. A large bF indicates substantial altered functional constraints in this gene cluster, while bF = 0 indicates that the evolutionary rate of each site in this duplicate gene is almost identical to the ancestral gene. In other words, a duplicate gene cluster with bF = 0 may contain a larger component of ancestral function compared to other gene clusters.
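The functional distance analysis can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' program: it takes a symmetric matrix of pairwise θ estimates as input, converts it to dF = -ln(1 - θ), and fits the additive model dF(i, j) = bF(i) + bF(j) by unconstrained ordinary least squares. For that model the normal equations have the closed-form solution bF(i) = [S(i) - T/(n - 1)]/(n - 2), where S(i) is the sum of distances involving cluster i, T is the sum over all pairs, and n >= 3 is the number of clusters. Function names are hypothetical, and with noisy data the estimates can come out slightly negative; how such values were handled is not stated in the text.

```python
import math


def functional_distance(theta_ij):
    """Type I functional distance dF = -ln(1 - theta)."""
    if not 0.0 <= theta_ij < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta_ij)


def functional_branch_lengths(theta):
    """Least-squares bF for each cluster from a symmetric matrix of pairwise theta.

    theta: n x n nested list; diagonal entries are ignored.
    """
    n = len(theta)
    if n < 3:
        raise ValueError("at least three clusters are required")
    d = [
        [functional_distance(theta[i][j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in d]      # S(i): distances involving cluster i
    total = sum(row_sums) / 2.0             # T: each pair is counted twice in the rows
    return [(s - total / (n - 1)) / (n - 2) for s in row_sums]
```

With exactly additive input the branch lengths are recovered without error; with three clusters the formula reduces to the familiar bF(1) = [dF(1, 2) + dF(1, 3) - dF(2, 3)]/2.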


*  RESULTS

Evolution of caspase-mediated molecular pathways:
The phylogenetic tree: The evolutionary tree (Fig 2) of the caspase gene family was inferred by the neighbor-joining (NJ) method (SAITOU and NEI 1987). The parsimony (PAUP4.0) and likelihood (PHYLIP) methods give virtually the same topology (data not shown). The presence of caspases in vertebrates, arthropods, and nematodes suggests that the emergence of the caspase gene family might be close to or even earlier than the origin of the animal kingdom.

Although ARAVIND et al. (1999) suggested that caspases may have evolved from an ancient protease supergene family, the root of the inferred tree (Fig 2) remains unclear. The evolutionary pattern of caspases can be generally described as follows. At the base of the tree (see A in Fig 3), at least four duplication events occurred during a very short time period, resulting in five major lineages: (i) the ICE subfamily, consisting of caspase-1, -4, -5, -13, -11, and -12; (ii) caspase-14; (iii) caspase-2; (iv) caspase-9; and (v) the common ancestor of caspase-8/-10 and caspase-3/-6/-7. In addition, the effector caspases (E-casp-3/-7/-6) and the ancestor of caspase-8 and -10 were generated before the emergence of arthropods. Interestingly, in contrast to the major (ancient) lineages in CED-3-type caspases, ICE-type caspases diversified recently, after the divergence of amphibians and mammals, and some of them (e.g., caspase-4 and -5) arose even after the mammalian radiation.

Evolutionary innovations of the caspase-mediated apoptosis pathway by gene duplications: To understand the origin of different caspase-mediated biochemical pathways in apoptosis, we compared the evolutionary relationship of (CED-3-type) caspases with apoptotic pathways (Fig 3). Our major finding is that major evolutionary lineages of caspases may coincide with different caspase-mediated apoptotic pathways triggered by specific death signals. That is, (i) caspase-9 is a key component in the mitochondria-initiated pathway, which is triggered by intracellular stimuli and involves the upstream Bcl-2 and Apaf-1 proteins (BUDIHARDJO et al. 1999); (ii) caspase-2 initiates the apoptosis induced by negative signaling after B cell Ag receptor (BCR) ligation (CHEN et al. 1999); (iii) apoptoses mediated by caspase-8 and -10 are similar, both being initiated in response to the death receptors (DRs), which contain the death effector domain (DED); and (iv) uniquely, caspase-14 is not processed by any known death stimuli (VAN DE CRAEN et al. 1998). In summary, since their ancient origins, these caspases may have evolved through different avenues, providing cells with the potential to initiate apoptosis in response to a variety of intracellular or intercellular stimuli.

Interestingly, although upstream initiator caspases (I-casps, e.g., casp-2, -9, -8/-10) are recruited by different receptors under different physiological or pathological stimuli, they all eventually activate the same set of downstream effector caspases (caspase-3, -6, -7), which are the real killers that carry out cell suicide (Fig 3). Our results suggest that (1) gene duplication followed by functional divergence is one major mechanism to generate the complexity of the apoptotic network and (2) such a process is constrained by coordinated regulation. Indeed, in the last step, effector caspases as real killers remain unchanged when more initial death signals are continuously recruited at different levels during the evolution of apoptotic pathways.

Predicting critical residues for type I functional divergence (altered functional constraints) between CED-3 and ICE subfamilies:
We estimated that the coefficient of functional divergence between ICE and CED-3 subfamilies is θ = 0.29 ± 0.05 [the ML option in GU's (1999) method], implying that the altered functional constraint between them is statistically significant. Further, we use the posterior probability P(S1|X) to predict critical amino acid residues responsible for type I functional divergence (site-specific rate difference) between CED-3 and ICE subfamilies (Fig 1C). The baseline of the site-specific profile measured by P(S1|X) is ~0.2–0.3 (Fig 4A). Thirty-two sites (16% of total sites) have P(S1|X) > 0.5. The fact that most sites have scores <50% indicates their similar functional roles between CED-3 and ICE.
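As a rough arithmetic check of the significance claim, the reported estimate can be compared with its standard error. The Python sketch below uses a simple one-sided normal (Wald-type) approximation, which is an assumption of this illustration and not necessarily the test used in GU's (1999) method; because θ = 0 lies on the boundary of the parameter space, the resulting P value should be read as indicative only. Function names are hypothetical.

```python
import math


def wald_z(estimate, standard_error):
    """Number of standard errors separating the estimate from zero."""
    if standard_error <= 0.0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


def upper_tail_p(z):
    """One-sided P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = wald_z(0.29, 0.05)   # theta = 0.29, SE = 0.05: about 5.8 standard errors
p = upper_tail_p(z)      # approximate one-sided P value under the normal assumption
```

An estimate nearly six standard errors above zero is far beyond conventional significance thresholds under this approximation.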



Figure 4. (A) The site-specific profile for predicting critical amino acid residues responsible for the functional divergence between the CED-3 and ICE subfamilies, measured by the posterior probability of being functional divergence related at each site [P(S1|X)]. The arrows point to four amino acid residues at which functional divergence between the two subfamilies has been verified by experimentation. (B) Four predicted sites that have been verified by experimentation.

Although posterior analysis is widely used in bioinformatics, the cutoff value for residue selection is usually empirical. We found that when the 21 highest-scoring residues are removed from the multiple alignment, the estimate of θ is virtually 0. These 21 amino acid residues (among 198 residues), corresponding to the cutoff value P(S1|X) > 0.61, are then chosen for further analysis. Of course, this procedure is meaningful only when θ is significantly > 0.
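The cutoff procedure described above can be outlined as a loop: rank the sites by P(S1|X), remove them from the top, and stop once θ re-estimated from the remaining sites is virtually zero. The Python sketch below is schematic. The callable estimate_theta is a hypothetical stand-in for a real re-estimation of θ from the reduced alignment, and the tolerance defining "virtually 0" is likewise an assumption.

```python
def select_by_theta(posteriors, estimate_theta, tolerance=0.01):
    """Remove top-scoring sites until theta on the remaining sites is ~0.

    posteriors: dict mapping site position -> P(S1|X).
    estimate_theta: callable taking the list of retained sites and returning
        an estimate of theta (hypothetical stand-in for the real estimator).
    Returns (removed_sites, cutoff); cutoff is the lowest posterior among the
    removed sites, or None if no site had to be removed.
    """
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    for k in range(len(ranked)):
        remaining = ranked[k:]
        if estimate_theta(remaining) <= tolerance:
            removed = ranked[:k]
            cutoff = posteriors[removed[-1]] if removed else None
            return removed, cutoff
    # theta never dropped to ~0: every site was removed
    return ranked, (posteriors[ranked[-1]] if ranked else None)
```

In practice each call to the estimator would refit the model on the reduced alignment, so the loop is considerably more expensive than this outline suggests.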

The functional-structural basis of altered functional constraints:
We mapped these 21 predicted sites onto the 3-D structure of caspases. The resolved X-ray crystal structures of human caspase-1 and -3 (WILSON et al. 1994; ROTONDA et al. 1996) were used to illustrate the structural features of ICE and CED-3 subfamilies, respectively. From the literature, we found experimental evidence for four predicted residues that are involved in the functional-structural divergence between CED-3 and ICE subfamilies (Fig 4B):

  1. Residue 161(348) (in the literature, this site is numbered as W348, according to the protein sequence of human caspase-1) is critical for CED-3 caspase substrate specificity by interacting with a unique surface loop in the 3-D structure [P(S1|X) = 0.999] (ROTONDA et al. 1996). At this position, all 22 sequences from the CED-3 subfamily contain an invariant tryptophan (W), whereas a variety of residues are present in the ICE subfamily (Fig 5). Crystal structural analysis reveals that W348 is a key determinant for the caspase-3 (CED-3)-type specificity. First, W348 forms a narrow pocket with the surface loop that is highly conserved in the CED-3 subfamily; see the boxed region in Fig 5. The steric constriction due to this pocket determines the preference of caspase-3 for substrates with small hydrophilic side chains. Second, W348 along with a group of residues forms a hydrogen bond network, which affects the interaction with the substrate. In contrast, the surface loop shared by CED-3 caspases seems to be deleted in all ICE-type caspases, as shown in the boxed region in Fig 5. Hence, the relaxed evolutionary constraint observed at this position in the ICE subfamily is likely to be caused by the 3-D structural difference.



    Figure 5. Alignment of predicted regions of caspases. Four predicted sites with experimental evidence are highlighted. The sites with asterisks are predicted residues within this region. The boxed region in the C terminus is the critical region for CED-3 substrate specificity: Most CED-3-type caspases form a surface loop, whereas a shallow depression is found in ICE-type caspases.

  2. Residues 86 [P(S1|X) = 0.75] and 88 [P(S1|X) = 0.74] are responsible for a 3-D structural difference with an unknown functional role. Indeed, in human caspase-1 (ICE), these two residues appear to lie in a small loop that is not found in the CED-3 subfamily.

  3. Residue 131 [P(S1|X) = 0.866] is a proteolytic site specific to the ICE subfamily. All caspases are synthesized as inactive proenzymes that need to be processed to the mature forms (NICHOLSON et al. 1995). However, distinct cleavage sites within the precursors are found for the two subfamilies. D131 is known as a cleavage site in human caspase-1 (ICE type; THORNBERRY et al. 1992). All ICE-type caspases preserve an Asp (D) at this position, except for mouse caspase-12 (Asn, E). However, human caspase-3 (CED-3 type) utilizes two other Asp sites for cleavage (ROTONDA et al. 1996) so that the functional role of position 131 in CED-3 caspases is no longer important. Therefore, the altered evolutionary constraints at this position can be well explained by the different utilization of cleavage sites for precursor processing between CED-3 and ICE subfamilies.

Pattern of type I functional divergence among CED-3-type caspases:
The CED-3 subfamily consists of a specific group of caspases that mediate the programmed cell death in a well-regulated proteolytic cascade and employ related but distinct functions. Here we address an interesting problem, i.e., to infer the trend of altered functional constraint of each cluster.

We study five gene clusters: caspase-3, -7, -6, -8/-10, and -2. Because of insufficient data, caspase-9 was excluded, and caspase-8 and -10 are grouped for their closely related function (FERNANDES-ALNEMRI et al. 1996). The upper diagonal of Table 1 shows pairwise coefficients of type I functional divergence (θ) between them; all of them are significantly >0 (P < 0.05), with only one exception; i.e., θ = 0.006 between caspase-7 and cluster-8/-10.


 
 
Table 1. θ values and dF values from pairwise comparisons in the CED-3 subfamily

To explore the pattern of type I functional divergence in each cluster, we performed functional distance analysis (see METHODS). The pairwise functional distances (dF) between clusters are shown in the lower diagonal of Table 1. The star-like tree presented in Fig 6 shows the type I functional branch length (bF) of each cluster, estimated by the least-squares method. The null hypothesis of equal bF value for each cluster was statistically rejected (P < 0.05).



Figure 6. (A) A star-like topology of the CED-3 caspases in terms of type I functional branch length bF. Biological evidence of functional specification for each caspase cluster is shown in the stacked boxes. (B) Functional branch length (bF) and the ratio of nonsynonymous to synonymous rates (dN/dS) for each gene cluster, which were computed by using human-mouse sequences.

Long functional branch lengths (bF) of caspase-3, -6, and -2 suggest that these genes may have undergone extensive altered functional constraints as a result of specialized functional roles in apoptosis (Fig 6). Supportive experimental evidence is summarized as follows: (i) The nonredundant functional role of caspase-3 in neurological apoptosis is confirmed by caspase-3 -/- knockout mice (KUIDA et al. 1996), (ii) caspase-6 and -3 have different substrate specificity, but both participate in the protease amplification cycle by activating each other, which triggers a series of apoptotic interactions (LAZEBNIK et al. 1995; SRINIVASULA et al. 1996), and (iii) caspase-2 has a unique dual role in positive and negative regulation of apoptosis through differential expression of two alternative splicing isoforms (-2L and -2S; WANG et al. 1994). This dual-role property is also confirmed by knockout mice: Caspase-2 deficiency causes one defective apoptotic pathway (mediated by granzyme B and perforin) but accelerates another pathway (cell death of motor neurons; BERGERON et al. 1998).

In contrast, virtually zero bF values of caspase-7 and -8/-10 indicate that the evolutionary rate of each site in these genes is almost identical to that of the ancestral gene. In this regard, these caspases may inherit a large component of ancestral function during caspase gene family evolution.

For each duplicate gene, the average intensity of functional constraints can be approximately measured by the dN/dS ratio between appropriate orthologous sequences (e.g., human-mouse). Interestingly, caspase-3, -6, and -2 (long bF) have lower dN/dS ratios than caspase-7 and -8/-10 (zero bF), indicating that type I functional divergence in caspases may result in a stronger functional constraint (Fig 6B).


*  DISCUSSION

The significance of this functional divergence study is twofold: First, we showed that altered functional constraint after gene duplication may play an important role in evolutionary novelties. Second, the site-specific profile based on posterior analysis is useful not only for understanding the functional-structural basis of protein family evolution but also for designing a cost-effective approach in functional genomics, e.g., a strategy for large-scale mutagenesis.

Predicted sites for type I functional divergence (site-specific rate difference) that lack supporting evidence may reflect either missing experimental data or statistical artifacts (e.g., the cutoff value). On the other hand, experimentally verified critical sites that were missed by our analysis may indicate other types of functional divergence (e.g., type II). Clearly, the accuracy of our prediction depends on how strong the association is between functional divergence and site-specific rate difference. To avoid overinterpretation, we should adopt the posterior-based analysis (site-specific profile) in practice only when θ is significantly > 0, and the cutoff value should be weighed against other biological information.

Many other methods are available for functional prediction from molecular evolutionary analysis (e.g., see GOLDING and DEAN 1998 for a review; POLLOCK et al. 1999; SUZUKI and GOJOBORI 1999; NAYLOR and GERSTEIN 2000; DERMITZAKIS and CLARK 2001; GAUCHER et al. 2001; GU 2001). For example, the method of SUZUKI and GOJOBORI (1999) for detecting positive selection on single sites could make an effective prediction when natural selection is the major force for functional diversity. However, its application is infeasible for many ancient gene families because the synonymous distance is saturated (e.g., WANG and GU 2000). Since all these approaches have their own limitations but complement one another in some aspects, an appropriate combination is strongly recommended.

In many models (e.g., LI 1983; CLARK 1994; FORCE et al. 1999), ancestral function of a gene family is conceptual rather than measurable. Functional distance analysis provides a quantitative measure for the altered functional constraints between the ancestral gene and one duplicate gene. This evolutionary measure has raised an interesting hypothesis that caspase-7 and -8/-10 may represent the function of the common ancestor of the CED-3 subfamily since their respective functional branch lengths are virtually zero. We hope this hypothesis can be tested by experimentation.

Similar to any site-specific analysis, our prediction is sensitive to the quality of the multiple alignment. We examined the multiple alignment of the caspase family, particularly in the surrounding regions of the four verified predicted sites (Fig 5). To the best of our knowledge, the alignment can be considered "nearly optimized." For example, the alignment of position 161 (W348) is almost indisputable.

In conclusion, we conducted a case study to show the capability of predicting type I functional divergence (i.e., altered functional constraints that are site specific) from sequence evolution. Moreover, our analysis showed that a comprehensive approach including various computational methods and multilevel information (from sequence to experimental data) is beneficial for understanding functional diversity of a large gene family in the postgenomics era.


*  ACKNOWLEDGMENTS

We are grateful to Drs. C.-I Wu and Galvin Naylor for constructive comments, which have improved the manuscript significantly. Thanks go to Jianying Gu for assistance. This study was supported by National Institutes of Health grant R01 GM62118 to X.G.

Manuscript received September 1, 2000; Accepted for publication March 29, 2001.


*  LITERATURE CITED

ARAVIND, L., V. M. DIXIT, and E. V. KOONIN, 1999 The domains of death: evolution of the apoptosis machinery. Trends Biochem. Sci. 24:47-53.

BERGERON, L., G. I. PEREZ, G. MACDONALD, L. SHI, and Y. SUN et al., 1998 Defects in regulation of apoptosis in caspase-2-deficient mice. Genes Dev. 12:1304-1314.

BUDIHARDJO, I., H. OLIVER, M. LUTTER, X. LUO, and X. WANG, 1999 Biochemical pathways of caspase activation during apoptosis. Annu. Rev. Cell Dev. Biol. 15:269-290.

CASARI, G., C. SANDER, and A. VALENCIA, 1995 A method to predict functional residues in proteins. Nat. Struct. Biol. 2:171-178.

CHEN, W., H. G. WANG, S. M. SRINIVASULA, E. S. ALNEMRI, and N. R. COOPER, 1999 B cell apoptosis triggered by antigen receptor ligation proceeds via a novel caspase-dependent pathway. J. Immunol. 163:2483-2491.

CLARK, A. G., 1994 Invasion and maintenance of a gene duplication. Proc. Natl. Acad. Sci. USA 91:2950-2954.

DERMITZAKIS, E. T. and A. G. CLARK, 2001 Non-neutral diversification after duplication in mammalian developmental genes. Mol. Biol. Evol. 18:557-562.

FERNANDES-ALNEMRI, T., R. C. ARMSTRONG, J. KREBS, S. M. SRINIVASULA, and L. WANG et al., 1996 In vitro activation of CPP32 and Mch3 by Mch4, a novel human apoptotic cysteine protease containing two FADD-like domains. Proc. Natl. Acad. Sci. USA 93:7464-7469.

FORCE, A., M. LYNCH, F. B. PICKETT, A. AMORES, and Y. L. YAN et al., 1999 Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.

GAUCHER, E. A., M. M. MIYAMOTO, and S. A. BENNER, 2001 Function-structure analysis of proteins using covarion-based evolutionary approaches: elongation factors. Proc. Natl. Acad. Sci. USA 98:548-552.

GOLDING, G. B. and A. M. DEAN, 1998 The structural basis of molecular adaptation. Mol. Biol. Evol. 15:355-369.

GU, X., 1999 Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16:1664-1674.

GU, X., 2001 Maximum likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 18:453-464.

GU, X. and J. ZHANG, 1997 A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113.

KUIDA, K., T. S. ZHENG, S. NA, C. KUAN, and D. YANG et al., 1996 Decreased apoptosis in the brain and premature lethality in CPP32-deficient mice. Nature 384:368-372.

KUMAR, S., 1995 ICE-like proteases in apoptosis. Trends Biochem. Sci. 20:198-202.

LANDGRAF, R., D. FISCHER, and D. EISENBERG, 1999 Analysis of heregulin symmetry by weighted evolutionary tracing. Protein Eng. 12:943-951.

LAZEBNIK, Y. A., A. TAKAHASHI, R. D. MOIR, R. D. GOLDMAN, and G. G. POIRIER et al., 1995 Studies of the lamin proteinase reveal multiple parallel biochemical pathways during apoptotic execution. Proc. Natl. Acad. Sci. USA 92:9042-9046.

LI, W.-H., 1983 Evolution of duplicate genes and pseudogenes, pp. 14-37 in Evolution of Genes and Proteins, edited by M. NEI and R. K. KOEHN. Sinauer Associates, Sunderland, MA.

LI, W.-H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

LICHTARGE, O., H. R. BOURNE, and F. E. COHEN, 1996 An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257:342-358.

LIVINGSTONE, C. D. and G. J. BARTON, 1996 Identification of functional residues and secondary structure from protein multiple sequence alignment. Methods Enzymol. 266:497-512.

NAYLOR, G. J. and M. GERSTEIN, 2000 Measuring shifts in function and evolutionary opportunity using variability profiles: a case study of the globins. J. Mol. Evol. 51:223-233.

NICHOLSON, D. W. and N. A. THORNBERRY, 1997 Caspases: killer proteases. Trends Biochem. Sci. 22:299-306.

NICHOLSON, D. W., A. ALI, N. A. THORNBERRY, J. P. VAILLANCOURT, and C. K. DING et al., 1995 Identification and inhibition of the ICE/CED-3 protease necessary for mammalian apoptosis. Nature 376:37-43.

OHNO, S., 1970 Evolution by Gene Duplication. Springer-Verlag, Berlin.

POLLOCK, D., W. R. TAYLOR, and N. GOLDMAN, 1999 Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287:187-198.

ROTONDA, J., D. W. NICHOLSON, K. M. FAZIL, M. GALLANT, and Y. GATEAU et al., 1996 The three-dimensional structure of apopain/CPP32, a key mediator of apoptosis. Nat. Struct. Biol. 3:619-625.

SAITOU, N. and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SRINIVASULA, S. M., T. FERNANDES-ALNEMRI, J. ZANGARILLI, N. ROBERTSON, and R. C. ARMSTRONG et al., 1996 The Ced-3/interleukin 1beta converting enzyme-like homolog Mch6 and the lamin-cleaving enzyme Mch2alpha are substrates for the apoptotic mediator CPP32. J. Biol. Chem. 271:27099-27106.

SUZUKI, Y. and T. GOJOBORI, 1999 A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.

THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN, and D. G. HIGGINS, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

THORNBERRY, N. A. and Y. LAZEBNIK, 1998 Caspases: enemies within. Science 281:1312-1316.

THORNBERRY, N. A., H. G. BULL, J. R. CALAYCAY, K. T. CHAPMAN, and A. D. HOWARD et al., 1992 A novel heterodimeric cysteine protease is required for interleukin-1 beta processing in monocytes. Nature 356:768-774.

VAN DE CRAEN, M., G. VAN LOO, S. PYPE, W. VAN CRIEKINGE, and I. VAN DEN BRANDE et al., 1998 Identification of a new caspase homologue: caspase-14. Cell Death Differ. 5:838-846.

WANG, Y. and X. GU, 2000 Evolutionary patterns of gene families generated in the early stage of vertebrates. J. Mol. Evol. 51:88-96.

WANG, L., M. MIURA, L. BERGERON, H. ZHU, and J. YUAN, 1994 Ich-1, an Ice/ced-3-related gene, encodes both positive and negative regulators of programmed cell death. Cell 78:739-750.

WANG, S., M. MIURA, Y. K. JUNG, H. ZHU, and E. LI et al., 1998 Murine caspase-11, an ICE-interacting protease, is essential for the activation of ICE. Cell 92:501-509.

WILSON, K. P., J. A. BLACK, J. A. THOMSON, E. E. KIM, and J. P. GRIFFITH et al., 1994 Structure and mechanism of interleukin-1 beta converting enzyme. Nature 370:270-275.

YUAN, J. Y. and H. R. HORVITZ, 1990 The Caenorhabditis elegans genes ced-3 and ced-4 act cell autonomously to cause programmed cell death. Dev. Biol. 138:33-41.

YUAN, J., S. SHAHAM, S. LEDOUX, H. M. ELLIS, and H. R. HORVITZ, 1993 The C. elegans cell death gene ced-3 encodes a protein similar to mammalian interleukin-1 beta-converting enzyme. Cell 75:641-652.



