| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Genetics, Vol. 173, 9-23, May 2006, Copyright © 2006
doi:10.1534/genetics.105.053249
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland
1 Corresponding author: Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland.
E-mail: mario.fares{at}nuim.ie
| ABSTRACT |
|---|
|
|
|---|
Methods designed to detect adaptive evolution can be based on Bayesian approaches (YANG et al. 2000) or maximum parsimony (SUZUKI and GOJOBORI 1999; FARES et al. 2002a). None of these methods takes into account the evolutionary interdependence between protein residues. A protein's function is, however, the result of the functional and structural communication between sites. Sites constraints are hence dependent on the interactions with other residues of the molecule. Mutations at either nearby sites or functionally related distant sites in the structure will change the selective constraints. The more complex the coevolution network is for a particular site, the greater the selection coefficient may be against a mutation at that site due to the dramatic effect that this mutation would have on other coevolving protein regions. Testing coevolution between sites is hence an essential step to complement molecular selection analyses, providing more biologically realistic results.
Grouping sites for molecular evolution analyses has been previously attempted (HUGHES and NEI 1988; CLARK and KAO 1991). Significant progress has been achieved in building more realistic models (FARES et al. 2002a; SUZUKI 2004; BERGLUND et al. 2005), albeit several problems regarding the molecular evolution of proteins are still unresolved. For instance, linear sliding-window methods are one-dimensional based and assume independence between different window regions irrespective of their three-dimensional proximity. Conversely, classification of amino acids in the same group of evolution based on their three-dimensional proximity (three-dimensional sliding window) will ignore the coevolution between functional regions that are spatially distant. Various reports state that residues can form a physically connected network that links distant functional sites in the tertiary protein structure (SÜEL et al. 2003). In fact, mapping energetic interactions in the PDZ domain family predicts a set of energetically coupled positions for a binding site residue that includes unexpected long-range interactions (LOCKLESS and RANGANATHAN 1999). Coevolution between clusters of sites, which are not in contact, has also been shown (PRITCHARD and DUFTON 2000). Coevolution between distant sites has been observed in sites proximal to regions with critical functions, where coevolution occurs to maintain the structural characteristics around these regions and consequently to maintain the protein conformational and functional stability (GLOOR et al. 2005).
Coevolution of any type has its origin in the covarion hypothesis proposed first by FITCH and MARKOWITZ (1970). This hypothesis states that, at any given time, some sites are invariable due to their functional or structural constraints but, as mutations are fixed elsewhere in the sequence, these constraints may change. Various methods for identifying covariant amino acid pairs at the molecular level have been previously developed (e.g., KORBER et al. 1993; GÖBEL et al. 1994; SHINDYALOV et al. 1994; TAYLOR and HATRICK 1994; TILLIER and COLLINS 1995; CHELVANAYAGAM et al. 1997; POLLOCK and TAYLOR 1997; LOCKHART et al. 1998; TUFFLEY and STEEL 1998; POLLOCK et al. 1999; PRITCHARD et al. 2001; TILLIER and LUI 2003; ANÉ et al. 2004; GALTIER 2004; DUTHEIL et al. 2005). The main limitation of many of these methods has been their inability to separate phylogenetic linkage from functional and structural coevolution. In other methods there was not an assessment of the random noise caused by a limited number of sequences in the alignment or by high pairwise distance. TILLIER and LUI (2003) attempted to remove the phylogenetic coevolution; however, their analysis was biased toward those positions covarying with at most a few others. GLOOR et al. (2005) partially corrected these effects although their method requires alignments of at least 125 sequences to remove stochastic covariation. Further, these methods do not simultaneously take into account the replacement propensity of a site, the background sequence divergence, and the three-dimensional information.
In this work we first develop a novel method for detecting coevolution between sites that allows for more realistic analyses of selective constraints. We then test previously hypothesized and experimentally supported interdomain coevolution in the 90-kDa heat-shock protein (Hsp90), the multimeric Hsp60 (GroEL), and the Gag protein from the human immunodeficiency virus type 1 (HIV-1). We finally apply the new method to uncover coevolution between sites previously detected as under adaptive evolution in GroEL and Gag. We show that coevolution analyses not only provide more realistic results but also highlight the molecular and structural mechanisms shaping the evolution of proteins.
| MATERIALS AND METHODS |
|---|
|
|
|---|
|
![]() | (1) |
The assumption made in Equation 1 is that the different types of amino acid transitions (slight to radical amino acid changes) in a particular site follow a Poisson distribution along time. The greater the time is since the divergence between sequences i and j the greater the probability is of having a radical change. A linear relationship is thus assumed between the BLOSUM values and time. Synonymous substitutions per site (
) are silent mutations, as they do not affect the amino acid composition of the protein. These mutations are therefore neutrally fixed in the gene. Assuming that synonymous sites are not saturated or under constraints, dS is proportional to the time since the two sequences compared diverged. Time (t) therefore is measured as dS. Note that convergent radical amino acid changes at two sites in sequences that have diverged recently have larger weights compared to convergent changes in distantly related sequences.
The next step is the estimation of the mean
-parameter for each site
of the alignment, so that
![]() | (2) |
![]() | (3) |
The variability of each pairwise amino acid transition compared to that of the site column is estimated as
![]() | (4) |
The mean variability for the corrected BLOSUM transition values is
![]() | (5) |
The coevolution between amino acid sites (A and B) is estimated thereafter by measuring the correlation in the pairwise amino acid variability, relative to the mean pairwise variability per site, between them. We thus use the relative variability rather than the absolute variability to measure the correlation between two sites. This ensures making the covariation independent from the differences in the rates of evolution of the sites compared. This covariation is measured as the correlation between their
-values, such as
![]() | (6) |
Here e and k are any two characters at sites A and B. To determine if the correlation coefficient (
AB) is significant, either a resampling or a simulation analysis can be performed. In the first approach we randomly sample K numbers of pairs of sites and compute Equations 16 for each pair. The mean correlation coefficient and its variance are then estimated as
![]() | (7) |
![]() | (8) |
The second approach consists of the Monte Carlo simulation of K sequence alignments. Here the coevolution test is conducted for a number of randomly selected pairs of sites in each simulated alignment or for the complete set of pairs of sites in the random data set computing Equations 16. An average value of the correlation for the simulated alignments and its variance are obtained utilizing Equation 7. Finally, the real correlation coefficients are tested using Equation 8.
The statistical power of the test is optimized by analyzing sites showing
![]() | (9) |
is the parametric value of
from Equation 5 and
is the standard deviation of
.
is calculated as
![]() | (10) |
Removing the phylogenetic coevolution:
Coevolution between amino acid sites can be the result of their structural, functional, or physical interaction; their phylogenetic convergence; and their stochastic covariation. The analysis of simulated data to test for significance removes stochastic effects. To disentangle functional, structural, and interaction coevolution from phylogenetic coevolution, the method is applied to the complete alignment and to subalignments, where specific phylogenetic clades are removed from the tree. Coevolving amino acid sites that are no longer detected following removal of one of the clades will be classified as phylogenetic coevolving sites as they occur in specific branches of the tree. Conversely, coevolving amino acid sites detected irrespective of the tree clades removed will be considered as functional/structural/interaction coevolving sites since they present correlated changes throughout the phylogenetic tree. Note that the latter condition means that when one amino acid changes, the covarying amino acid has necessarily to change. In the former condition, a change in one site does not always (in all branches) involve a change in the covarying site. In other words, our method detects phylogenetic-independent coevolution. Clades for coevolution analyses are defined in terms of their biological coherence and/or statistical support (defined as bootstrap values). Consequently, phylogenetic clades are specified prior to conducting the coevolutionary analysis and they include sequences that are forming either a well-defined biological cluster or alternatively a cluster supported by a high bootstrap value.
Using the atomic distances as additional information in coevolution analyses:
Spatial proximity between coevolving sites can be used to define their structural or functional interaction. In this method coevolution is not always synonymous with physical interaction but also involves structural and functional coevolution, as has been previously described (LOCKLESS and RANGANATHAN 1999; PRITCHARD and DUFTON 2000; SÜEL et al. 2003; GLOOR et al. 2005).
The three-dimensional closeness of two sites is estimated as the vectorial distance between their atomic centers (
). This distance is obtained by comparing the three-dimensional coordinates (X, Y, and Z) of atoms A and B for amino acids i and j:
![]() | (11) |
) between sites i and j is taken:
![]() | (12) |
Simulation studies:
We tested the sensitivity of CAPS using simulated data that allow for the control of the extent of coevolution and the evolutionary history of each site in the alignment. We also compared CAPS with other nonparametric methods that use the information theory or a Bayesian approximation, including the method of Korber [herein called the mutual information criterion (MICK) implemented in our program PIMIC and available on request; KORBER et al. 1993] and the method of TILLIER and LUI (2003) implemented in the program Dependency, as well as with the parametric method of POLLOCK et al. (1999) implemented in the program lnLCorr. While parametric methods can be more powerful than nonparametric ones, incorrect assumptions in the model can yield a high number of false positives (DIMMIC and HUBISZ 2005). We thus compared CAPS to more conservative nonparametric methods and to more powerful parametric methods. We simulated sequence alignments using a model similar to that devised by POLLOCK et al. (1999). Initially an ancestral sequence of 200 amino acids was generated using the amino acid composition corresponding to the equilibrium residue frequencies in naturally occurring proteins (JONES et al. 1992). This sequence was evolved using a Markov chain Monte Carlo simulation along a phylogenetic tree and, simultaneously, 10 pairs of sites were randomly selected to coevolve. Coevolution was established by forcing correlated variation such as the transition between amino acids at both sites had similar
ek as specified in Equation 1. We have also introduced coevolution without forcing this condition but results were unaltered. The phylogenetic trees used for simulations were bifurcated and symmetric (all branches had the same length). The robustness of the coevolution analysis to the sequence phylogenetic divergence was assessed using different levels of background noise and alignment sizes. We fixed the background noise (branch lengths) to evolve the ancestral sequence at 0.1, 0.2, 0.5, and 1 Poisson-corrected substitutions per site. The numbers of sequences tested were 10, 20, and 30 sequences. We simulated and tested multiple simulations for each condition (number of sequences and number of substitutions per site), performing 12 different types of simulation analyses. We measured the sensitivity (SN) of the method as
![]() | (13) |
![]() | (14) |
Analysis of real sequences:
We used CAPS to analyze the Gag protein from HIV-1, the 90-kDa heat-shock protein (Hsp90), and the 60-kDa heat-shock protein (GroEL). These proteins are multimeric and organized in functionally connected domains and are hence perfect for testing interdomain coevolution. To show an example of coevolution between sites that have undergone positive selection we used previously published data demonstrating adaptive evolution in GroEL from endosymbiotic bacteria of insects (FARES et al. 2002b, 2004) and the HIV-1 gag gene (YANG et al. 2003).
Gag protein from HIV-1:
The HIV-1 genome is translated to yield both structural and nonstructural proteins (WAIN-HOBSON et al. 1985). Among these proteins, Gag is a 55-kDa polyprotein that is initially associated with the cell membrane to ease the budding of virus particles from the host cell. Gag is further processed to produce four proteins called matrix (p17), capsid (p24), nucleocapsid (p9), and p6 (GOTTLINGER et al. 1989). We tested whether coevolution exists between specific Gag proteins or amino acid sites in specific HIV-1 lineages. The HIV-1 group M subtypes are thought to have originated from a single ancestor and are currently described by highly supported clades.
Hsp90:
Hsp90 is an ATPase molecular chaperone that assists the conformational maturation of molecules involved in cell-cycle regulation and signal transduction (PRATT 1998; BUCHNER 1999; CAPLAN 1999; MAYER and BUKAU 1999). Hsp90 is translated as a monomeric protein but its function depends on its dimerization. Several functional domains can be identified in the linear sequence of Hsp90 (supplemental Table 1 at http://www.genetics.org/supplemental/). The importance of the complex intramolecular interactions for the Hsp90 function is poorly understood (PRODROMOU et al. 1999; JOHNSON et al. 2000; CHADLI et al. 2000). Here we apply CAPS to test and identify interdomain coevolution within Hsp90.
Analysis of the heat-shock protein 60-kDa GroEL:
The ATPase molecular chaperone GroEL is found specifically in bacteria and the organelles of eukaryotic cells (LANDRY et al. 1993). The multimeric protein GroEL folds 1015% of slow-folding proteins, which are mostly aggregation prone (DEUERLING et al. 1999; THULASIRAMAN et al. 1999). Each GroEL subunit is organized in three domains; apical, equatorial, and intermediate (BRAIG et al. 1994, 1995). Several functionally important intradomain regions in GroEL have been previously identified (supplemental Table 2 at http://www.genetics.org/supplemental/). Here we test if coevolution among sites is crucial for the functional and structural stability of GroEL.
Sequence alignments and phylogenetic inferences:
GenBank accession numbers for the gag, hsp90, and groel sequences are provided in supplemental Tables 3, 4, and 5, respectively, at http://www.genetics.org/supplemental/. Protein sequences were aligned (available from the corresponding author upon request) using CLUSTAL X (JEANMOUGIN et al. 1998). Nucleotide sequences were then aligned, concatenating triplets according to the amino acid sequence alignment.
The phylogeny of the HIV-1 group M subtypes is very well defined (ROBERTSON et al. 2000). Representative sequences were selected in the manner described previously (TRAVERS et al. 2005). For each subtype, all available full-genome sequences were retrieved from the Los Alamos HIV database (http://hiv-web.lanl.gov) and a neighbor-joining tree of each resulting data set was reconstructed using PAUP* 4.0b10 (SWOFFORD 1998). Representative sequences were selected for each subtype on the basis of their spread throughout the subtype tree, resulting in the selection of a diverse range of sequences for each subtype (supplemental Table 3 at http://www.genetics.org/supplemental/). The resulting data set contained 36 taxa.
For Hsp90 we used sequences from unicellular and multicellular eukaryotes comprising a total of 43 taxa. Aligned sequences were subject to phylogenetic analyses using PAUP* 4.0b10. Maximum-likelihood and maximum-parsimony analyses yield the same phylogenetic tree. We used the GroEL phylogenetic tree obtained in a previous work (FARES et al. 2002b).
Analysis of coevolution:
Coevolution analyses were implemented in the program CAPSv1.0 (available from the corresponding author on request). Synonymous substitutions dS were considered to be proportional to the time since the divergence between sequences since no indication of saturation of synonymous sites was observed using SWAPSC v1.0 (FARES 2004). The significance of the correlation coefficients was tested using 10,000 pseudorandom pairs of amino acid sites and a confidence value of (
= 0.001), to minimize type I error. Clades defined in each protein for coevolution analyses are indicated in Figures 2, 3A, and 4A.
|
|
|
| RESULTS |
|---|
|
|
|---|
Although all methods find a high percentage of true coevolutionary amino acid pairs (Figure 2, DF), the alternate methods to which CAPS was compared also identified large numbers of false positives, showing sensitivity values ranging between 8 and 17% in MICK, between 7 and 15% in Dependency, and between 7 and 8% in lnLCorr (Figure 2A). When the alignment size increased to 20 sequences, the sensitivity of MICK improved at all the distances, ranging between 20 and 15% (Figure 2B). Dependency and lnLCorr presented lower sensitivity values compared to MICK, ranging between 8 and 10% (Figure 2B). Using alignments containing 30 sequences, the sensitivity of MICK, Dependency, and lnLCorr decreased as the average sequence pairwise distance increased (Figure 2C). MICK, Dependency, and lnLCorr seem to require very dense phylogenetic trees as previously suggested (POLLOCK et al. 1999; TILLIER and LUI 2003). Using alignments of 200 sequences does not seem to change the sensitivity of CAPS (data not shown). The sensitivity of CAPS increases with the number of sequences in the alignment (using a multivariate test, F = 9.968, P = 0.016), unlike the other methods, which present no evidence for such an increase (Figure 3A). Conversely, the level of pairwise amino acid divergence negatively affects all the methods except MICK (F = 1.446, P = 0.309) (Figure 3B).
We analyzed the effect of two factors, alignment size and amino acid substitutions per site, on the sensitivity of all the methods in general and of each method individually. A multivariate test demonstrates that neither of these factors alone influences the sensitivity of the methods for detecting coevolution (F = 1.951 and F = 2.301 and P = 0.267 and P = 0.090, for the effects of the alignment size and amino acid distance, respectively). The interaction of both factors, however, has a significant effect on the sensitivity of the four methods taken together (F = 131.938; P << 0.001) or individually (data not shown). In summary, a low number of sequences combined with a high pairwise sequence divergence negatively affect the sensitivity of all the methods for detecting coevolution.
Analysis of the specificity of the methods demonstrates that CAPS is more specific than the alternate methods, although specificity was detected to be significant in all four methods used (data not shown).
Phylogenetic and functional coevolution in Gag from HIV-1:
Following coevolution analysis of the complete gag alignment, we identified 21 groups of coevolving sites, containing 73 unique residues (Figure 4A). Of these 73 residues, 42 were observed to have undergone phylogenetic coevolution, in that removal of a particular clade in the analysis resulted in loss of detection of that site in the subsequent analysis using CAPS. In all lineages coevolving sites were spread throughout the gene with the majority of sites present in the functional p17 and p24 regions (Figure 4B). Interestingly, while subtypes A and G, which evolved from a common ancestor, have the highest number of phylogenetically coevolving sites only three residues (E107, I147, and T186) are shared between them, indicating the presence of both lineage-specific and ancestral coevolution within the gag gene (Figure 4B). Also, numerous examples of convergent coevolution were observed between subtypes (Figure 4B).
Two groups of amino acid sites (R15, K28, R91, A120 and E55, V128) maintained their grouping no matter what clade (defined as subtype) was removed, indicating functional/structural/physical interaction (herein called functional groups, FGs) coevolution between these sites. We observed 8 of the 14 positively selected amino acids identified by YANG et al. (2003) as coevolving within the HIV-1 group M phylogeny (R15, K28, G62, Q69, R91, I138, N252, and T280) with 5 of these (K28, G62, Q69, R91, and T280) having been identified by Yang et al. as undergoing adaptive evolution in separate analyses of subtypes A, B, and C. Of the 8 residues overlapping between our study and that of Yang et al., 3 (R15, K28, and R91) are present in one of the two FGs.
Important functional regions in Hsp90 do coevolve:
The clades defined for the coevolution analyses were those defined as biologically distinguishable organisms, including Hsp90 from endoplasmic reticulum, chloroplast, yeast, plants, protozoan parasites, insects, and high eukaryotes (Figure 5A). The application of the coevolution analysis to Hsp90 identified 34 groups of coevolution (G1G34; Figure 5B). A significant proportion of the coevolving sites have an important reported biological function (Table 1 and Figure 5B).
|
|
The mean variance for the amino acid transition in each group of coevolution (
) ranged between 0.531 and 2.084, whereas the mean correlation (
) varied between 0.625 and 0.948 (Table 1). The three-dimensional structure is available only for the amino-terminal and middle segments. The analysis of the atomic distances (AD) identified a certain percentage of coevolving residues within each group as spatially close (Table 1). Examples are the pairs of amino acids [(S113, A112) and (E4, G3, and V163)] in the amino-terminal domain (Figure 5C) and the groups of amino acids [(E351, N382), (Q409, F410), and (K445, S446)] in the middle segment (Figure 5D). A number of coevolving groups exhibit coevolution between sites significantly distant but functionally related (Table 1).
Revealing adaptive coevolution in GroEL:
Application of CAPS to GroEL, taking into account the clusters belonging to different bacterial groups and defined in Figure 6A, identified 17 groups of coevolving amino acids (G1G17; Figure 6B). Seventy-five percent of coevolving sites were previously reported in functional data (supplemental Table 2 at http://www.genetics.org/supplemental/), or by neural network analysis, as functionally or structurally important.
|
) ranged between 0.739 and 2.477, while the mean correlation (
) varied between 0.510 and 0.944 (Table 2). All of the groups detected included sites belonging to the apical and equatorial domains with very few sites belonging to the intermediate domain. Most of the sites not identified by functional data or undetected by predicting neural networks presented significantly short distances to functionally or structurally important sites. Examples of spatially close sites were sites [(K132, L134, K425), (A127, K132), and (S424, K425)] in the equatorial domain and sites (T210, G211, V213) in the apical domain (Figure 6C).
|
| DISCUSSION |
|---|
|
|
|---|
Coevolution within the gag gene phylogeny:
We have identified coevolution between the different functional regions of the gag gene in HIV-1 (Figure 3A). The two groups of amino acid sites exhibiting coevolution regardless of clade removal (R15, K28, R91, A120 and E55, V128) are detected as having a functional or structural dependency and included sites previously detected as having undergone adaptive evolution. Also, 42 of 73 sites were detected as coevolving phylogenetically with most of these coevolving sites being unique to specific lineages, thereby providing evidence for possible selective shifts between subtypes within the HIV-1 group M phylogeny. Interestingly, some of those sites were identified as phylogenetic ancestral coevolution (coevolving residues in the branch leading to the ancestor of two subtypes) or convergent coevolution (between lineages that do not share a recent common ancestor; Figure 4B). These results provide further evidence of the need for a more comprehensive evolutionary analysis of the distinct HIV-1 group M subtypes to obtain a full understanding of past, present, and future HIV-1 dynamical change.
Coevolution between functional domains in Hsp90:
Several studies demonstrate that Hsp90 functions through marked conformational changes that are governed by complex interdomain interactions (e.g., PRODROMOU et al. 1999; CHADLI et al. 2000). Because of this complexity, the number of experimental analyses required to fully understand intramolecular Hsp90 interactions is prohibitively high. Moreover, experimental identification of residues involved in the stability of interdomain interactions due to their spatial proximity to functionally important domains is anything but straightforward. Computational methods are hence instrumental in the detection and a priori identification of regions or domains in which functional interaction should be tested.
CAPS identified 100% of the Hsp90 interdomain interactions proposed in previous studies and uncovered potential interactions that should be tested. We detected coevolution between sites belonging to PI2 and PI3 domains that are involved in binding different protein clients (Table 1). Simultaneous binding of proteins by Hsp90 has been proposed by other authors who suggest a synergistic effect of these proteins in the function of Hsp90 (CHEN et al. 1998). An example of simultaneous substrate binding is that provided by the eNOS activation pathway, where the upstream activator of PKB/Akt, PDK1, binds Hsp90 simultaneously to eNOS (FUJITA et al. 2002). Coevolution between these domains has interesting implications for the simultaneous and regulated binding of proteins with Hsp90. The method also detects coevolved amino acid sites involved in domain dimerization (DD2 and DD3) and cochaperone binding. Sites belonging to DD2 have been associated with the binding of accessory proteins such as Hop and immunophilines that are essential for the interaction and maturation of the complex Hsp90-Hsc70-client proteins (CHEN et al. 1998). Deletion of the region 661677 from the Hsp90 of the chicken Gallus gallus results in loss of Hsp90 dimerization and diminished interactions with all cofactors (CHEN et al. 1998). Within this region, the method detected amino acid S662 (groups 3 and 30), N673 (group 31), and I674 (group 32). Not surprisingly, we also detected coevolution between these sites and others involved in interaction with eNOS, AKT, and glucocorticoid receptors (GRs).
The detected coevolution between sites from the N-terminal domain (A112S115), involved in nucleotide binding, and sites from the middle segment and C-terminal domain (K445 and S446), involved in binding unfolded polypeptides, correlates with their functional relationships (JOHNSON et al. 2000). The coevolution between the N-terminal (S115) and the C-terminal (K636) ATP-binding domains supports previous functional data (SÖTI et al. 2002) and also correlates with the coevolution between the ATP-binding pocket in the N-terminal domain (A112 and D113 in group 17; S115 in group 23) and the ATP modulator domain (E351 in groups 17 and 23). The detected coevolution of sites belonging to distinct binding pockets of accessory elements (e.g., in group 24) supports works pinpointing the temporal regulation and coordinated interaction of cochaperones and Hsp90 (CHEN et al. 1998). Other sites have been identified as coevolving with DD2 including, remarkably, the site W296 in which mutation in humans interferes in the interaction between Hsp90 and Akt (MEYER et al. 2003).
Finally, other sites with no reported functional importance to date have been detected in this study to coevolve with sites involved in protein interaction, ATP binding, or dimerization. All of these sites are exposed in the protein as predicted by the neural network analyses and by the three-dimensional model of Hsp90. These sites might establish additional interactions between Hsp90 domains and protein clients or may be responsible for the stabilization of the Hsp90 dimer through intermonomer interactions.
GroEL is under strong selective constraints of coevolution:
GroEL analysis points to three conclusions: Some of the coevolving sites are spatially close, supporting a functional and maybe structural complicity; the main interdomain coevolution took place between the apical and equatorial domains; and amino acid sites under adaptive evolution have also been identified as coevolving between each other.
Previous works have shown that GroEL has been accumulating advantageous mutations to improve its ability to compensate the effect of slightly deleterious and conformational-destabilizing mutations fixed in the proteome of endosymbionts (FARES et al. 2002b, 2004). Studies have even suggested that GroEL has been fixing slightly deleterious mutations as a result of the genetic drift operating in the genome of these bacteria (HERBECK et al. 2003). The fact that the method has detected amino acids coevolving with spatially close functional sites (Figure 5C) and that these sites underwent adaptive evolution suggests the existence of compensatory changes to maintain the overall structural stability of GroEL. This result can be fully explained under the neutral theory of evolution (KIMURA 1983). Under this model, changes altering the amino acid side-chain spatial distribution will disrupt the already optimized performance of the protein's contribution to the organism's fitness. These disadvantageous mutations can be compensated only by changes in spatially proximal sites, where mutations will be fixed by adaptive evolution. The joint contribution of both mutations to the fitness of the organism will be neutral (FUKAMI-KOBAYASHI et al. 2002).
Advantages and limitations of the method:
Many methods developed to detect coevolution (e.g., GÖBEL et al. 1994; NEHER 1994) have shown an inability to screen out background correlation, particularly in the presence of phylogenetic relatedness between the sequences, and to distinguish between positive and negative correlation (POLLOCK and TAYLOR 1997). As we have shown, CAPS is effective in distinguishing background correlation from true correlations. CAPS analysis can be performed with no knowledge of the phylogenetic relationships among sequences. We have, however, shown that the removal of well-defined clades can serve in the identification of structural/functional coevolution.
Coevolution methods have also been used in an attempt to resolve the docking problem (GÖBEL et al. 1994; PAZOS et al. 1997). Despite their elegance, most of these methods failed to distinguish unambiguously between coevolution and phylogenetic noise. Correction of BLOSUM values by the sequence divergence and the consideration of functional and structural coevolution in CAPS allows isolation of the true coevolutionary events. It is important to mention that, unlike other studies, atomic distances are not used here as evidence of coevolution but rather as additional supporting information in the identification of functional and structural coevolution.
As expected, the method does not lack limitations. For example, saturation of synonymous sites can lead to underestimates of the divergence times, although data sets used in this study did not show such effects. The number of sequences in the alignment also poses a problem when sequences are too divergent, although the sensitivity is improved compared to that of previous methods. Further, constant amino acid sites that are very likely to be functionally important cannot be tested for coevolution using CAPS, although this limitation affects all the methods so far. Moreover, our method assumes that the coevolutionary relationship between a pair of sites remains constant through time. This assumption can be simplistic when analyzing alignments including highly divergent sequences. Nonetheless, the dynamic removal of prespecified phylogenetic clades ameliorates this problem.
Finally, even though coevolutionary analyses can be used to identify proteinprotein interaction interfaces, CAPS is not designed for such a purpose. The reason is that, while interaction would necessarily involve coevolution, coevolution does not imply physical interaction. Detecting amino acids involved in proteinprotein interactions is a more complex problem, requiring the consideration of other parameters such as solvent accessibility, physiochemical amino acid properties, etc. We have shown here that coevolutionary analyses in biologically key molecules add another dimension to selective constraints analyses and provide more interpretable results.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| LITERATURE CITED |
|---|
|
|
|---|
ANÉ, C., J. G. BURLEIGH, M. M. MCMAHON and M. J. SANDERSON, 2004 Covarion structure in plastid genome evolution: a new statistical test. Mol. Biol. Evol. 22: 914924.[Medline]
BEREZIN, C., F. GLASER, J. ROSENBERG, I. PAZ, T. PUPKO et al., 2004 ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20: 13221324.
BERGLUND, A.-C., B. WALLNER, A. ELOFSSON and D. A. LIBERLES, 2005 Tertiary windowing to detect positive diversifying selection. J. Mol. Evol. 60: 499504.[CrossRef][Medline]
BOISVERT, D. C., J. WANG, Z. OTWINOWSKI, A. L. HORWICH and P. B. SIGLER, 1996 The 2.4 Å crystal structure of the bacterial chaperonin GroEL complexed with ATP gamma S. Nat. Struct. Biol. 3: 170177.[CrossRef][Medline]
BRAIG, K., Z. OTWINOWSKI, R. HEGDE, D. C. BOISVERT, A. JOACHIMIAK et al., 1994 The crystal structure of the bacterial chaperonin GroEL at 2.8Å. Nature 371: 578586.[CrossRef][Medline]
BRAIG, K., P. D. ADAMS and A. T. BRÜNGER, 1995 Conformational variability in the refined structure of the chaperonin GroEL at 2.8Å resolution. Nat. Struct. Biol. 2: 10831094.[CrossRef][Medline]
BUCHNER, J., 1999 Hsp90 & Co.a holding for folding. Trends Biochem. Sci. 24: 136141.[CrossRef][Medline]
CAPLAN, A. J., 1999 Hsp90's secrets unfold: new insights from structural and functional studies. Trends Cell Biol. 9: 262268.[CrossRef][Medline]
CHADLI, A., I. BOUHOUCHE, W. SULLIVAN, B. STENSGARD, N. MCMAHON et al., 2000 Dimerization and N-terminal domain proximity underlie the function of the molecular chaperone heat shock protein 90. Proc. Natl. Acad. Sci. USA 97: 1252412529.
CHELVANAYAGAM, G., A. EGGENSCHWILER, L. KNECHT, G. H. CONNET and S. A. BENNER, 1997 An analysis of simultaneous variation in protein structures. Protein Eng. 10: 307316.
CHEN, S., W. P. SULLIVAN, D. O. TOFT and D. F. SMITH, 1998 Differential interactions of p23 and the TPR-containing proteins Hop, Cyp40, FKBP52 and FKBP51 with Hsp90 mutants. Cell Stress Chaperones 3: 118129.[CrossRef][Medline]
CLARK, A. G., and T. H. KAO, 1991 Excess nonsynonymous substitution of shared polymorphic sites among self-incompatibility alleles of Solanaceae. Proc. Natl. Acad. Sci. USA 88: 98239827.
DEUERLING, E., A. SCHULZE-SPECKING, T. TOMOYASU, A. MOGK and B. BUKAU, 1999 Trigger factor and DnaK cooperate in folding of newly synthesized proteins. Nature 400: 693696.[CrossRef][Medline]
DIMMIC, M. W., and M. J. HUBISZ, 2005 Detecting coevolving amino acid sites using Bayesian mutational mapping. Bioinformatics 21: i126i135.[Abstract]
DUTHEIL, J., T. PUPKO, A. JEAN-MARIE and N. GALTIER, 2005 A model-based approach for detecting co-evolving positions in a molecule. Mol. Biol. Evol. 22: 19191928.
FARES, M. A., 2004 SWAPSC: sliding-window analysis procedure to detect selective constraints. Bioinformatics 20: 28672868.
FARES, M. A., S. F. ELENA, J. ORTIZ, A. MOYA and E. BARRIO, 2002a A sliding window-based method to detect selective constraints in protein-coding genes and its application to RNA viruses. J. Mol. Evol. 55: 509521.[CrossRef][Medline]
FARES, M. A., E. BARRIO, B. SABATER-MUNOZ and A. MOYA, 2002b The evolution of the heat-shock protein GroEL from Buchnera, the primary endosymbiont of aphids, is governed by positive selection. Mol. Biol. Evol. 19: 11621170.
FARES, M. A., A. MOYA and E. BARRIO, 2004 GroEL and the maintenance of bacterial endosymbiosis. Trends Genet. 20: 413416.[CrossRef][Medline]
FITCH, W. M., and E. MARKOWITZ, 1970 An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4: 579593.[CrossRef][Medline]
FUJITA, N., S. SATO, A. ISHIDA and T. TSURUO, 2002 Involvement of Hsp90 in signaling and stability of 3-phosphoinositide-dependent kinase-1. J. Biol. Chem. 277: 1034610353.
FUKAMI-KOBAYASHI, K., D. R. SCHREIBER and S. A. BENNER, 2002 Detecting compensatory covariation signals in protein evolution using reconstructed ancestral sequences. J. Mol. Biol. 319: 729743.[CrossRef][Medline]
GALTIER, N., 2004 Sampling properties of the bootstrap support in molecular phylogeny: influence of nonindependence among sites. Syst. Biol. 53: 3846.[CrossRef][Medline]
GLOOR, G. B., L. C. MARTIN, L. M. WAHL and S. D. DUMN, 2005 Mutual information in protein multiple sequence alignment reveals two classes of coevolving positions. Biochemistry 44: 71567165.[CrossRef][Medline]
GÖBEL, U., C. SANDER, R. SCHNEIDER and A. VALENCIA, 1994 Correlated mutations and residue contacts in proteins. Proteins 18: 309317.[CrossRef][Medline]
GOTTLINGER, H. G., J. G. SODROSKI and W. A. HASELTINE, 1989 Role of capsid precursor processing and myristoylation in morphogenesis and infectivity of human immunodeficiency virus type I. Proc. Natl. Acad. Sci. USA 86: 57815785.
HENIKOFF, S., and J. G. HENIKOFF, 1992 Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA. 89: 1091510919.
HERBECK, J. T., D. J. FUNK, P. H. DEGNAN and J. J. WERNEGREEN, 2003 A conservative test of genetic drift in the endosymbiotic bacterium Buchnera: slightly deleterious mutations in the chaperonin groEL. Genetics 165: 16511660.
HUGHES, A. L., and M. NEI, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167170.[CrossRef][Medline]
JEANMOUGIN, F., J. D. THOMPSON, M. GOUY, D. G. HIGGINS and T. J. GIBSON, 1998 Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23: 403405.[CrossRef][Medline]
JONES, D. T., W. R. TAYLOR and J. M. THORNTON, 1992 The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8: 275282.
JOHNSON, B. D., A. CHADLI, S. J. FELTS, I. BOUNHOUCHE, M. G. CATELLI et al., 2000 Hsp90 chaperone activity requires the full-length protein and interaction among its multiple domains. J. Biol. Chem. 275: 3249932507.
KIMURA, M., 1983 The neutral theory of molecular evolution. Cambridge University Press, Cambridge, UK.
KORBER, B. T., R. M. FARBER, D. H. WOLPERT and A. S. LAPEDES, 1993 Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc. Natl. Acad. Sci. USA 8: 15491560.
LANDRY, S. J., J. ZEILSTRA-RYALLS, O. FAYET, C. GEORGOPOULOS and L. M. GIERASCH, 1993 Characterization of a functionally important mobile domain in GroES. Nature 364: 255258.[CrossRef][Medline]
LOCKHART, P. J., M. A. STEEL, A. C. BARBROOK, D. H. HUSON, M. A. CHARLESTON et al., 1998 A covariatoid model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol. Biol. Evol. 15: 11831188.[Abstract]
LOCKLESS, W., and R. RANGANATHAN, 1999 Evolutionary conserved pathways of energetic connectivity in protein families. Science 286: 295299.
MAYER, M. P., and B. BUKAU, 1999 Molecular chaperones: the busy life of Hsp90. Curr. Biol. 9: R322R325.[CrossRef][Medline]
MEYER, P., C. PRODROMOU, B. HU, C. VAUGHAN, S. M. ROE et al., 2003 Structural and functional analysis of the middle segment of hsp90: implications for ATP hydrolysis and client protein and cochaperone interactions. Mol. Cell 11: 647658.[CrossRef][Medline]
NEHER, E., 1994 How frequent are correlated changes in families of protein sequences? Proc. Natl. Acad. Sci. USA 91: 98102.
PAZOS, F., M. HELMER-CITTERICH, G. AUSIELLO and A. VALENCIA, 1997 Correlated mutations contain information about protein-protein interaction. J. Mol. Biol. 271: 511523.[CrossRef][Medline]
POLLOCK, D. D., and W. R. TAYLOR, 1997 Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution. Protein Eng. 10: 647657.
POLLOCK, D. D., W. R. TAYLOR and N. GOLDMAN, 1999 Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287: 187198.[CrossRef][Medline]
PRATT, W. B., 1998 The hsp90-based chaperone system: involvement in signal transduction from a variety of hormone and growth factor receptors. Proc. Soc. Exp. Biol. Med. 217: 420434.[Abstract]
PRITCHARD, L., and M. J. DUFTON, 2000 Do proteins learn to evolve? The hopfield network as a basis for the understanding of protein evolution. J. Theor. Biol. 202: 7786.[CrossRef][Medline]
PRITCHARD, L., P. M. O. BLADON, J. J. MITCHELL and M. J. DUFTON, 2001 Evaluation of a novel method for the identification of coevolving protein residues. Protein Eng. 14: 549555.
PRODROMOU, C., S. M. ROE, R. O'BRIEN, J. E. LADBURY, P. W. PIPER et al., 1997 Identification and structural characterization of the ATP/ADP-binding site in the Hsp90 molecular chaperone. Cell 90: 6575.[CrossRef][Medline]
PRODROMOU, C., G. SILIGARDI, R. O'BRIEN, D. N. WOOLFSON, L. REGAN et al., 1999 Regulation of Hsp90 ATPase activity by the tetratricopeptide repeat (TPR)-domain co-chaperones. EMBO J. 18: 754762.[CrossRef][Medline]
ROBERTSON, D. L., J. P. ANDERSON, J. A. BRADAC, J. K. CARR, B. FOLEY et al., 2000 HIV-1 nomenclature proposal. Science 288: 5556.[Medline]
SHINDYALOV, I. N., N. A. KOLCHANOV and C. SANDER, 1994 Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng. 7: 349358.
SÖTI, C., A. RACZ and P. CSERMELY, 2002 A nucleotide-dependent molecular switch controls ATP binding at the C-terminal domain of Hsp90. N-terminal nucleotide binding unmasks a C-terminal binding pocket. J. Biol. Chem. 277: 70667075.
SÜEL, G. M., S. W. LOCKLESS, A. A. WALL and R. RANGANATHAN, 2003 Evolutionary conserved networks of residues mediate allosteric communication in proteins. Nat. Struct. Biol. 10: 5969.[Medline]
SUZUKI, Y., 2004 Three-dimensional window analysis for detecting positive selection at structural regions of proteins. Mol. Biol. Evol. 21: 23522359.
SUZUKI, Y., and T. GOJOBORI, 1999 A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16: 13151328.[Abstract]
SWOFFORD, D. L., 1998 PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.
TAYLOR, W. R., and K. HATRICK, 1994 Compensating changes in protein multiple sequence alignments. Protein Eng. 7: 341348.
THULASIRAMAN, V., C. F. YANG and J. FRYDMAN, 1999 In vivo newly translated polypeptides are sequestered in a protected folding environment. EMBO J. 18: 8595.[CrossRef][Medline]
TILLIER, E. R., and R. A. COLLINS, 1995 Neighbor-joining and maximum-likelihood with rna sequencesaddressing the interdependence of sites. Mol. Biol. Evol. 12: 715.[Medline]
TILLIER