Genetics, Vol. 153, 1077-1089, November 1999, Copyright © 1999

Genealogical Evidence for Positive Selection in the nef Gene of HIV-1

Paolo M. de A. Zanotto (a), Esper G. Kallas (b), Robson F. de Souza (a), and Edward C. Holmes (c)
a Bioinformatics and Retrovirology Laboratory, Universidade Federal de São Paulo, São Paulo, CEP 05508-900, Brazil
b Laboratory of Immunology DIPA–Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, CEP 05508-900, Brazil
c The Wellcome Trust Centre for the Epidemiology of Infectious Disease, Department of Zoology, University of Oxford, Oxford OX1 3PS, United Kingdom

Corresponding author: Paolo M. de A. Zanotto, Bioinformatics and Retrovirology Laboratory, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, CEP 05508-900, Brazil. E-mail: pzanotto@usp.br

Communicating editor: J. HEY


ABSTRACT

The pattern and process of evolution in the nef gene of HIV-1 were analyzed within and among patients. Using a maximum likelihood method that allows the intensity of selection pressure to vary among codons, strong positive selection was detected in a hemophiliac patient over 30 mo of infection. By reconstructing the process of allele substitution in this patient using parsimony, the synapomorphic amino acid changes separating each time point were found to have high probabilities of being under positive selection, with selection coefficients of at least 3.6%. Positive selection was also detected among 39 nef sequences from HIV-1 subtype B. In contrast, multiple pairwise comparisons of nonsynonymous and synonymous substitution rates provided no good evidence for positive selection, and sliding window analyses failed to detect most positively selected sites. These findings demonstrate that positive selection is an important determinant of nef gene evolution and that genealogy-based methods outperform pairwise methods in the detection of adaptive evolution. Mapping the locations of positively selected sites may also help identify targets of the immune response and hence aid vaccine design.


THE nature of the evolutionary interaction between the human immunodeficiency virus (HIV) and the human immune system has been the source of much debate, increasingly so given the desire to understand how and why resistance to combinations of antiviral drugs arises (LEIGH BROWN and RICHMAN 1997). To some it is a system governed by chance, starting with the random activation of HIV-infected cells by foreign antigens (WAIN-HOBSON 1994, 1996), followed by a process of allele substitution dominated by genetic drift (LEIGH BROWN 1997; PLIKAT et al. 1997). To others, the immune-driven positive selection of advantageous viral mutants plays the pivotal role, such that within-host viral evolution is characterized by the successive appearance of escape mutants that evade the prevailing immune response (NOWAK et al. 1996; MCMICHAEL and PHILLIPS 1997; MCMICHAEL 1998).

Evidence for the importance of natural selection in HIV evolution comes from studies of both host and virus. On the host side, it is well established that the immune response against HIV infection is orchestrated mainly by T lymphocytes, among which the cytotoxic CD8+ T cells (CTLs) play a vital role in recognizing epitopes presented by MHC class I molecules. The importance of CTLs can be inferred from the correlation between CTL activity and the control of HIV-1 viral load, with long-term nonprogressors to AIDS having particularly strong CTL responses (MUSEY et al. 1997; OGG et al. 1998). More recently, levels of viremia were also observed to increase in simian immunodeficiency virus (SIV)-infected rhesus monkeys following the removal of CD8+ lymphocytes (SCHMITZ et al. 1999). T helper (CD4+) lymphocytes have also been linked to the control of HIV viral load and possibly influence the entire cellular and humoral immune response (ROSENBERG et al. 1997), perhaps by enabling antigen-presenting cells to mount a stronger CTL response (BENNETT et al. 1998; RIDGE et al. 1998; SCHOENBERGER et al. 1998).

On the virus side, there is equally compelling evidence that HIV-1 is able to escape CTL recognition during infection. Several reports suggest that HIV-1 can respond to the selective pressure imposed by CTLs by fixing amino acid point mutations or deletions (KOENIG et al. 1995; BORROW et al. 1997; GOULDER et al. 1997; PRICE et al. 1997). However, the characterization of amino acid changes related to CTL escape is complex and, aside from their appearance, there is often little direct evidence that they are selectively favored.

The controversy over evolutionary mechanism is perhaps most evident with respect to nef, a pleiotropic gene that encodes a transactivating factor (p27) and that may reduce or increase viral replication depending on cell type (WELKER et al. 1996; LEVY 1998). Deletions in nef have been shown to lessen the pathogenic effects of HIV both in monkeys (KESTLER et al. 1991) and perhaps in humans (KIRCHHOFF et al. 1995), and nef-deleted HIV strains have been used as vaccine candidates (for a review see LEVY 1998). Evidence that nef might be subject to positive selection comes from the demonstration, within a single patient, of a CTL response followed by escape mutants in an HLA-B8 epitope (PRICE et al. 1997). Significantly, this sequence also showed a higher rate of nonsynonymous (dN) than synonymous (dS) substitution per site, an observation that is often given as evidence for positive selection (SHARP 1997). In contrast, a longitudinal study of nef gene evolution in another individual was claimed to provide no evidence for adaptive evolution, as dN/dS ratios near 1.0 were observed (PLIKAT et al. 1997). Although this was taken to mean that all amino acid changes are neutral and therefore fixed by genetic drift alone, it seems more likely that dN/dS = 1.0 in this case reflects the interplay of both positive and negative selection pressures (HOLMES and ZANOTTO 1998). An analysis of nef gene sequences from different subtypes of HIV-1 likewise provided no evidence for positive selection (with dN < dS) at epitopes for CTLs, T helper cells, or monoclonal antibodies, with constraints against amino acid change particularly strong within CTL epitopes (DA SILVA and HUGHES 1998).

Such contrasting observations highlight the need to undertake more detailed investigations of the evolutionary mechanisms shaping genetic diversity in nef. In particular, because the commonly used pairwise methods for estimating dN and dS do not take full account of the genealogical information in data, and so are liable to nonindependence and pseudoreplication, it is important to test theories of evolutionary mechanism using an explicitly phylogenetic approach. Equally, it will be of value to consider genetic variation in nef in population genetic terms, as estimates of the rate of allele fixation and the selection coefficient of any favorable allele will be important given the possible use of nef in future HIV vaccines.

Herein we present a detailed examination of the evolutionary processes acting on the nef gene of HIV-1. We will first analyze the data set of PLIKAT et al. 1997, as well as more divergent HIV isolates, using a genealogical and likelihood-based approach for detecting positive selection (NIELSEN and YANG 1998). We will then develop a cladistic model of allele substitution under positive selection from which we are able to estimate important parameters of nef gene evolution.


MATERIALS AND METHODS

Patient material and primary data:
The analysis described in this article used four complete nef gene data sets. The first comprised 48 sequences from a hemophiliac infected by a contaminated batch of factor IX (PLIKAT et al. 1997). These sequences were obtained from proviral DNA amplified by PCR and then cloned. Sequences were obtained from three time points (11, 25, and 41 mo postinfection), with CD4+ cell counts of 1204, 922, and 912, respectively. The patient received no antiviral therapy during this time. A single sequence showing a frameshift (isolate 25U52490) was excluded from the analysis. This left a total of 47 nonidentical sequences of 618 bp: 15 from the 11-mo time point, 16 from the 25-mo time point, and 16 from the 41-mo time point. Accession numbers for each sequence can be found in the original publication.

To examine the evolutionary process among more divergent HIV-1 isolates, three other sets of nef sequences were analyzed. The first contained 39 sequences of subtype B (606 bp), a viral clade of mainly European and North American origin. The second comprised 10 sequences (621 bp) from the larger M (main) group, thereby incorporating more divergent viral sequences from varied geographical origins: three sequences from subtype A, three from B, three from D, and one from U (unassigned). The final data set contained 11 sequences (585 bp), including 9 group M sequences, one group O (outlier) sequence, and the nef gene sequence from a chimpanzee virus (SIVCPZ), thereby covering the deepest parts of the HIV-1 tree. All these sequences were collected from the 1997 release of the Los Alamos HIV database (KORBER et al. 1997b).

Sequence alignment and phylogenetic analysis:
All four nef data sets were aligned by hand and checked using the CLUSTALW program (THOMPSON et al. 1994). Alignments are available from the authors on request.

The phylogenetic relationships among sequences from each of these four data sets were then reconstructed using a maximum likelihood method (in the hemophiliac patient, the data from each time point were analyzed both separately and in combination). The HKY85 model of nucleotide substitution was used in all cases, with optimal values for the transition to transversion ratio and for the shape parameter (α) of a gamma distribution of rate variation among sites (with eight categories) both estimated during tree reconstruction. These parameter values are given in Table 1 and Table 2. Finally, to determine the level of support for each node, 1000 bootstrap replicates of the data were analyzed using neighbor-joining, with distances computed under the maximum likelihood substitution model. All analyses were performed using the 4.0d64 test version of PAUP*, kindly provided by David L. Swofford.
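To make the substitution model concrete, the sketch below builds an HKY85 instantaneous rate matrix in Python. It assumes the common parameterization in which kappa is the transition/transversion rate ratio (generally not the same quantity as a reported transition to transversion ratio), and the base frequencies in the example are hypothetical; gamma-distributed rate variation among sites is not included. The matrix is scaled so that one unit of branch length corresponds to one expected substitution per site.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def hky85_rate_matrix(kappa, freqs):
    """Return the HKY85 rate matrix Q as a list of rows (order A, C, G, T).

    kappa: transition/transversion rate ratio (assumed parameterization)
    freqs: equilibrium base frequencies, e.g. {"A": 0.35, ...}
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # A<->G and C<->T are transitions; all other changes are transversions.
                transition = (x in PURINES) == (y in PURINES)
                q[i][j] = freqs[y] * (kappa if transition else 1.0)
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    # Rescale so the mean substitution rate at equilibrium equals one.
    mean_rate = -sum(freqs[b] * q[i][i] for i, b in enumerate(BASES))
    return [[v / mean_rate for v in row] for row in q]


# Hypothetical, A-rich base composition (not estimated from the nef data).
Q = hky85_rate_matrix(4.0, {"A": 0.35, "C": 0.2, "G": 0.25, "T": 0.2})
```

Transition probabilities along a branch of length t would then follow from the matrix exponential P(t) = exp(Qt).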


 
Table 1. Maximum likelihood estimates of selection pressures on HIV-1 nef sequences within a single hemophiliac patient


 
Table 2. Maximum likelihood estimates of selection pressures on HIV-1 nef sequences from different subtypes and groups

Analysis of selection pressures:
Three maximum likelihood models were used to analyze the evolutionary processes acting on nef, all of which use gene genealogies and treat the codon, rather than the nucleotide, as the unit of evolution. The first, "invariant" model (GOLDMAN and YANG 1994) assumes that all codons fall into a single category of sites with a fixed value of dN/dS, the parameter ω. The second, "neutral" model allows two categories of sites (NIELSEN and YANG 1998). The first category represents strictly neutral sites (p1) that have a fixed dN/dS value (ω1) of 1.0, while the second category (p2) denotes sites where nonsynonymous changes are deleterious and so removed by negative selection, so that ω2 is zero. The third, "positive selection" model incorporates an additional category of positively selected sites (p3) at which ω3 can be >1, in which case nonsynonymous substitutions have higher rates of fixation than synonymous substitutions (NIELSEN and YANG 1998). Additionally, individual positively selected sites can be identified by their posterior probabilities of belonging to the category of sites with ω3 > 1, using an empirical Bayesian approach: the higher the posterior probability, the more likely that a site is under positive selection. Likewise, sites belonging to the invariant or neutral categories can also be detected using posterior probabilities. All these analyses were performed using the CODEML program from the PAML package (YANG 1997).
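The empirical Bayes step can be illustrated with Bayes' rule: the posterior probability that site h belongs to class k is p_k f(x_h | ω_k) divided by the sum of p_j f(x_h | ω_j) over all classes. In CODEML the per-site likelihoods f(x_h | ω_k) are computed over the tree with the other parameters set to their maximum likelihood estimates; in the minimal Python sketch below they are simply supplied as hypothetical numbers, and the ≥90% cutoff mirrors the threshold used for Figures 1 and 2.

```python
def site_posteriors(site_likelihoods, proportions):
    """Posterior class probabilities for each site via Bayes' rule.

    site_likelihoods: per site, the likelihoods f(x_h | omega_k) for each class k
    proportions: class proportions p_k (here p1, p2, p3), summing to one
    """
    posteriors = []
    for lik in site_likelihoods:
        joint = [p * f for p, f in zip(proportions, lik)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors


# Hypothetical values: classes are (neutral, omega = 0, positively selected).
proportions = [0.4, 0.4, 0.2]
likelihoods = [
    [1e-5, 1e-6, 1e-3],  # site 0: data much better explained by omega3 > 1
    [1e-4, 1e-4, 1e-5],  # site 1: data better explained by the other classes
]
post = site_posteriors(likelihoods, proportions)
selected = [h for h, probs in enumerate(post) if probs[2] >= 0.90]
```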

The results of this genealogy-based analysis of selection pressures were compared to those obtained using the pairwise method of NEI and GOJOBORI 1986, as implemented in the MEGA sequence analysis package (KUMAR et al. 1993). dN/dS values for individual codons (calculated as the mean of all pairwise comparisons) were estimated using the SNAP program (available at http://hiv-web.lanl.gov/SNAP/WEBSNAP/SNAP.html).
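For comparison, the counting logic of the Nei and Gojobori method can be sketched in Python. This is an illustrative reimplementation, not the MEGA or SNAP code: it uses the standard genetic code, counts mutations to stop codons as nonsynonymous when tallying sites (a simplifying assumption), averages differences over all mutational pathways between codons that differ at more than one position while skipping pathways through stop codons, and applies the Jukes-Cantor correction. Sequences are assumed to be aligned, in frame, gap-free, and free of stop codons.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in a codon: fraction of single-base changes that are silent."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s


def codon_differences(c1, c2):
    """Mean (synonymous, nonsynonymous) differences over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    totals, n_paths = [0.0, 0.0], 0
    for order in permutations(positions):
        current, steps = c1, [0, 0]
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon; skip it
            steps[0 if CODE[nxt] == CODE[current] else 1] += 1
            current = nxt
        else:  # only reached when the pathway was not skipped
            totals[0] += steps[0]
            totals[1] += steps[1]
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return totals[0] / n_paths, totals[1] / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined for p >= 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        S += s
        N += 3.0 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

Averaging such (dN, dS) pairs over every pair of sequences is the step that, as the Results show, can mask localized selection.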

Using genealogies to represent the process of allele substitution:
The process of allele substitution has an explicit phylogenetic representation. Specifically, we can assume that changes on external branches of a gene genealogy (autapomorphies) are evolutionary novelties: alleles not fixed in the population. Conversely, changes on internal branches of the genealogy (synapomorphies) represent alleles that are present in a larger (monophyletic) group of descendants. In general, therefore, the higher the frequency of an allele in a population, the deeper it will be located in the genealogy. Of most interest for our within-patient HIV sequence data are the synapomorphic changes located on the internal branches that separate each time point because these represent alleles that may have been fixed between the sampling events.

Given this framework, we can study the substitution process simply by determining the most parsimonious reconstructions (MPRs) for each branch of the maximum likelihood tree linking all time points. This analysis was performed using MacClade (version 3.0; MADDISON and MADDISON 1992). Crucially, we can also determine whether the synapomorphic changes between time points are selectively advantageous by asking whether they reside in codons previously identified as having a high posterior probability of being positively selected. It is important to note in this context that although they use the same input tree, the MPRs and the posterior probability estimations use different optimality criteria and are therefore independent analyses.
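One standard way to obtain an MPR for a single unordered character is Fitch's algorithm, sketched below in Python on a small hypothetical rooted tree written as nested tuples of sequence names. It returns one MPR and labels each reconstructed change as autapomorphic (on an external branch) or synapomorphic (on an internal branch), following the framework above. MacClade offers more than this (for example, alternative MPRs); in particular, the sketch does not check whether a change is unambiguous, that is, present in every MPR.

```python
def fitch_sets(node, states):
    """Postorder (Fitch) pass: return (state set, minimum number of changes)."""
    if isinstance(node, str):  # a leaf is a sequence name
        return {states[node]}, 0
    left, right = node
    set_l, cost_l = fitch_sets(left, states)
    set_r, cost_r = fitch_sets(right, states)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1


def assign_states(node, states, parent_state=None, changes=None):
    """Preorder pass: pick one MPR state per node and record branch changes."""
    if changes is None:
        changes = []
    node_set, _ = fitch_sets(node, states)  # recomputed for brevity, not speed
    state = parent_state if parent_state in node_set else min(node_set)
    if parent_state is not None and state != parent_state:
        kind = "autapomorphy" if isinstance(node, str) else "synapomorphy"
        changes.append((parent_state, state, kind, node))
    if not isinstance(node, str):
        for child in node:
            assign_states(child, states, state, changes)
    return changes


# Hypothetical amino acid states at one codon and a hypothetical rooted tree.
states = {"out": "R", "a11": "R", "b11": "R", "c25": "K", "d25": "K"}
tree = ("out", (("a11", "b11"), ("c25", "d25")))
_, n_changes = fitch_sets(tree, states)
changes = assign_states(tree, states)
```

Here the single Arg to Lys change is placed on the internal branch leading to the two later sequences, the pattern the text looks for between time points.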


RESULTS

Maximum likelihood analysis of positive selection in the HIV-1 nef gene:
We first reconstructed maximum likelihood trees for samples from within the hemophiliac patient, taking each time point separately and in combination. Three codon-based maximum likelihood models were then applied to see which provided the best fit to these data. Because the neutral model is a special case of the positive selection model with two fewer free parameters, the models are nested and can be compared with a likelihood ratio test, in which twice the difference in log-likelihood is compared with a χ2 distribution with d.f. = 2. As can be seen in Table 1, the positive selection model fits the data significantly better at 25 mo (P < 0.001), with 20.9% of sites falling into the selected category (ω3 = 8.126). Although positive selection was not significantly favored at 41 mo postinfection (0.1 > P > 0.05), a high value of ω3 (2.671) was obtained for 22.7% of the sites. There was no evidence for positive selection at 11 mo. When successive time points were combined (i.e., 11 plus 25 mo and 25 plus 41 mo), the positive selection model was significantly favored over both competing models, although with fewer positively selected sites. Those sites with high posterior probabilities of being positively selected were also determined and are plotted for the two sets of successive time points in Figure 1. A total of 17 substitutions fell into this class, and it is interesting that some new positively selected sites appear in the 25- plus 41-mo comparison, most notably a cluster of three in the 5' part of the sequence. No sites with ≥90% posterior probability of evolving neutrally were identified in either comparison.
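The likelihood ratio test described above can be sketched in Python. For a χ2 distribution with 2 degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed; the log-likelihood values in the example are hypothetical, not those of Table 1.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test of nested models differing by two parameters.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its P value
    under a chi-square distribution with 2 degrees of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)  # chi-square (2 d.f.) survival function


# Hypothetical log-likelihoods for the neutral and positive selection models.
stat, p = lrt_df2(-2000.0, -1990.0)  # stat = 20.0, p is about 4.5e-5
```

With 2 d.f., the 5% and 0.1% critical values are about 5.99 and 13.82.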




Figure 1. Location of positively selected substitutions (with ≥90% posterior probability) within the nef gene of a hemophiliac patient. (a) 11- and 25-mo time points combined and (b) 25- and 41-mo time points combined. Sites estimated by the maximum likelihood analysis as evolving under positive selection are shown by bars, with their posterior probabilities on the Y axis and the consensus Nef protein sequence for this patient along the X axis. No neutrally evolving sites with ≥90% posterior probability were identified. Those synapomorphic changes that separate each time point are indicated by arrows (see Figure 5).

The positive selection model also had the highest likelihood for all three time points combined, being much better than the neutral model (P < 0.001), although only 8.8% of sites belonged to the p3 category with high dN/dS (ω3 = 6.144). Since the Goldman and Yang constant dN/dS model is also a special case of the positive selection model (with p1 = p2 = 0 and p3 = 1), twice the difference in log-likelihood between these two models (d.f. = 2) also constitutes a valid test statistic. For all data combinations the positive selection model gave high χ2 values when tested against the Goldman and Yang model (Table 1).

To determine whether positive selection can be detected at greater evolutionary distances, three more nef data sets were examined (Table 2). For the 39 subtype B sequences, the positive selection model provided a significantly better fit to the data than both competing models (P < 0.001), although only 11.40% of codons were selectively favored (ω3 = 4.706). The positive selection model also outperformed both the Goldman and Yang and the neutral models in the analysis of the 10 group M sequences and of the 11 group M, O, and chimpanzee viruses (P < 0.001 in both cases). However, because the optimal values of ω3 were <1 in both cases, we cannot formally demonstrate positive selection at these deeper phylogenetic levels.

For the subtype B data we also recorded the locations of those codons with a high posterior probability (≥90%) of being positively selected (Figure 2). A total of 22 positively selected codons were identified, 15 of which (68%) were located within known CTL epitopes. Of the seven remaining sites, two were found within targets of monoclonal antibodies and four represent contiguous amino acids, from positions 8 to 11, suggesting that this region may contain an as-yet-undescribed epitope. Intriguingly, positively selected substitutions at positions 8 and 9 were also identified in the hemophiliac patient, although no information is available regarding the human leukocyte antigen (HLA) type of this individual.



Figure 2. Location of positively selected substitutions in 39 HIV-1 subtype B sequences. Bars represent the probability of substitutions under positive selection (with ≥90% posterior probability), and the amino acid sequence of the SF2 isolate was used as the prototype sequence for comparison (X axis). The continuous line beneath the plot depicts the location of known CTL epitopes from different HLA types, while dashed lines show targets of monoclonal antibodies and dotted lines represent epitopes of T helper cells. Epitope information was taken from the Los Alamos HIV molecular immunology database (KORBER et al. 1997a).

Pairwise methods do not detect positive selection in nef:
No positive selection was detected in the hemophiliac patient when sequences were analyzed using the pairwise method of NEI and GOJOBORI 1986. No statistically significant deviation from strict neutrality (i.e., dN/dS = 1.0 could not be rejected at the 5% level) was observed at 11 mo (dN/dS = 0.60, t = 1.027, P > 0.1), 25 mo (dN/dS = 0.70, t = 0.775, P > 0.1), or 41 mo postinfection (dN/dS = 0.50, t = 1.1575, P > 0.1). Likewise, selection was not detected when dN/dS was calculated for the 11- and 25-mo time points combined (dN/dS = 0.35, t = 0.92, P > 0.1), the 25- and 41-mo time points combined (dN/dS = 0.59, t = 1.290, 0.1 > P > 0.05), or all time points combined (dN/dS = 0.52, t = 1.027, P > 0.1). Therefore, simple pairwise estimates of dN/dS provide no evidence for positive selection in these data, a conclusion also reached by PLIKAT et al. 1997. The Nei and Gojobori analysis likewise provided no evidence for adaptive evolution among the subtype B sequences, although dS was significantly greater than dN (dN/dS = 0.60, t = 4.877, P < 0.001), as was the case for the group M sequences (dN/dS = 0.28, t = 7.361, P < 0.001) and for the sequences from groups M, O, and chimpanzee combined (dN/dS = 0.22, t = 8.9095, P < 0.001).

To investigate the discrepancy between the genealogical and pairwise methods in more detail, we first subtracted dS from dN for each pairwise comparison estimated under the Nei-Gojobori method (Figure 3). Pairwise comparisons suggesting positive selection (i.e., dN > dS) fall to the right of the vertical line that marks dN - dS = 0 on each histogram. Although the mean of dN - dS is negative at every time point (but very near zero), 31.66%, 38.97%, and 16.19% of pairwise comparisons had dN > dS at the 11-, 25-, and 41-mo time points, respectively. Thus, comparisons with dN > dS are present in the data but are lost when all pairwise comparisons are averaged, so that a t-test provides no support for positive selection.
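The averaging effect can be shown with a small Python sketch: a mean of dN - dS that is slightly negative can coexist with a large fraction of comparisons in which dN > dS. The six differences below are hypothetical, chosen only to illustrate the point.

```python
from statistics import mean, stdev


def summarize(dn_minus_ds):
    """Return the mean, SD, and fraction of comparisons with dN > dS."""
    n_positive = sum(1 for d in dn_minus_ds if d > 0)
    return mean(dn_minus_ds), stdev(dn_minus_ds), n_positive / len(dn_minus_ds)


# Hypothetical dN - dS values for six pairwise comparisons.
diffs = [0.02, 0.01, -0.03, -0.02, 0.015, -0.025]
avg, sd, frac_positive = summarize(diffs)  # avg < 0, yet frac_positive is 0.5
```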



Figure 3. Histograms showing the values of dN - dS for each pairwise comparison in the hemophiliac patient. Although the mean values for each time point are negative (i.e., dN < dS), many of the individual pairwise comparisons provide estimates in which dN > dS, so that they are assigned a positive rank and fall on the right-hand side of the histogram. SD, standard deviation.

Next, we compared dN and dS values along the nef gene sequence using a sliding window of 20 codons, moved 1 codon at a time. Although this analysis revealed some regions where dN > dS, particularly in the 3' part of the sequence (Figure 4), the majority of the positively selected sites identified in the maximum likelihood analysis were not detected. Even more striking is the extreme variation in dS, with some instances of dN > dS clearly due to regional reductions in dS rather than elevations in dN. If, instead, we record cases in which dN is greater than the mean value of dS, two regions appear to be positively selected: the first nine codons of the sequence and codon 169, both of which were contained within the positively selected class in the maximum likelihood analysis. However, none of the other positively selected sites was detected, and there is no longer any evidence for positive selection in the extreme 3' region of the gene.
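The two window criteria can be contrasted in a short Python sketch. Per-codon dN and dS values are assumed to be available (for example, as means over pairwise comparisons); the toy values below are hypothetical and constructed so that a drop in dS, with dN unchanged, produces windows where dN > dS that disappear when dN is compared with the mean dS instead.

```python
def sliding_window(values, window=20, step=1):
    """Means of `values` over windows of `window` codons, advanced by `step`."""
    return [sum(values[start:start + window]) / window
            for start in range(0, len(values) - window + 1, step)]


def flag_windows(dn, ds, window=20, step=1):
    """Window start codons (0-based) flagged by each criterion."""
    dn_w = sliding_window(dn, window, step)
    ds_w = sliding_window(ds, window, step)
    mean_ds = sum(ds) / len(ds)
    local = [i * step for i, (n, s) in enumerate(zip(dn_w, ds_w)) if n > s]
    versus_mean = [i * step for i, n in enumerate(dn_w) if n > mean_ds]
    return local, versus_mean


# Hypothetical: constant dN, with dS falling to zero in the second half.
dn = [0.25] * 10
ds = [1.0] * 5 + [0.0] * 5
local, versus_mean = flag_windows(dn, ds, window=4)  # ([5, 6], [])
```

Rerunning with a different `window` or `step` can flag different codons, illustrating why the choice of these values matters.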



Figure 4. Sliding window analysis of the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site, per codon along the nef gene sequence of the hemophiliac patient. Numbers (Y axis) are calculated as the mean of all pairwise comparisons. Mean dS and dN values across all codons are shown as hatched lines.

Reconstructing allele substitutions in the nef gene:
A parsimony method was next used to reconstruct the unambiguous amino acid and nucleotide changes along each branch of the maximum likelihood tree for all three time points combined (Figure 5). A very similar phylogeny was found using maximum parsimony as the initial optimality criterion.



Figure 5. Maximum likelihood phylogenetic tree of 30 mo of nef gene evolution in the hemophiliac patient. Branch lengths are proportional to the amount of change at the DNA level, with the most parsimonious reconstruction (MPR) of unambiguous synapomorphic amino acid changes, and corresponding nucleotide substitutions, shown next to the branches leading to the 25-mo and 41-mo time points (sequence coordinates refer to amino acids). The number of bootstrap replicates supporting the separation of these time points is also shown. All three synapomorphic changes had high posterior probabilities of being under positive selection (see Figure 1).

Although the sequences from each time point do not form monophyletic groups, because the viruses present at 41 mo appear to be derived from a subset of those present at 25 mo, the tree is striking in that it clearly depicts a replacement of lineages through time, as might be expected under natural selection, a pattern that received good bootstrap support. One silent and one amino acid change (an Arg to Lys substitution at position 105) were reconstructed on the lineage leading to the 25-mo time point, the latter of which had a very high probability (0.9564) of being positively selected. Likewise, one silent and two amino acid substitutions were reconstructed on the branch leading to the 41-mo time point, and again both amino acid changes (at positions 8 and 9) had very high probabilities of being under positive selection.

Eleven more substitutions with a high probability of being positively selected were found to be synapomorphic for clusters of sequences within each time point, indicating that they represent mutations that are not yet fixed in the population or that had only a transient advantage. The remaining three putative positively selected changes were autapomorphic, which could also signify recently evolved or transiently advantageous alleles, or even recent deleterious mutations that have yet to be removed by selection (FU and LI 1993).

Population genetic analysis of synapomorphic changes:
Additional evidence for positive selection came from an analysis of various population parameters associated with allele substitution. For each time point within the hemophiliac patient, genetic diversity was quantified as θ (= 2Neµ), estimated using the Metropolis-Hastings Markov chain Monte Carlo algorithm on ultrametric trees of the data (program FLUCTUATE, version 1.1; KUHNER et al. 1995). These θ values (0.023587, 0.0624215, and 0.0277238 for 11, 25, and 41 mo postinfection, respectively) were then used to estimate the effective population size (Ne) of HIV-1, assuming mutation rates (µ) from 2.3 × 10^-5 to 7.0 × 10^-6 per site per replication cycle (TEMIN 1993). This resulted in Ne estimates of 513–1685, 1357–4459, and 602–1980 for the 11-, 25-, and 41-mo time points, respectively. Similarly low estimates of Ne have been obtained for other HIV-1-infected patients (LEIGH BROWN 1997).
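The conversion from θ to Ne is a single division, Ne = θ/(2µ). The Python sketch below applies it to the θ values and the two bounding values of µ quoted above; rounding to the nearest integer reproduces the reported ranges, except that the lower bound at 41 mo comes out at about 602.7, which the text gives as 602.

```python
THETA = {11: 0.023587, 25: 0.0624215, 41: 0.0277238}  # months -> theta, from the text
MU_HIGH, MU_LOW = 2.3e-5, 7.0e-6  # bounding values of mu used in the text


def effective_size(theta, mu):
    """Effective population size implied by theta = 2 * Ne * mu."""
    return theta / (2.0 * mu)


ne_ranges = {month: (effective_size(t, MU_HIGH), effective_size(t, MU_LOW))
             for month, t in THETA.items()}
```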

Under neutrality, the mean time for a mutation to become fixed by genetic drift in a haploid population (conditional on its eventual fixation) should be 2Ne generations, so that the expected times to fixation, given our range of Ne estimates, would be 1026–3370, 2714–8918, and 1204–3960 generations. Assuming a generation time for HIV-1 of ~2.6 days (PERELSON et al. 1996) and considering the lowest value of Ne estimated (and hence the fastest fixation time), the synapomorphies we reconstruct would on average need ~89 mo (7.5 yr) to reach fixation by drift alone. That the synapomorphic changes we observed appear to have been fixed far more quickly than this (our entire window of observation was only 30 mo) suggests that this substitution process was driven by positive selection.
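The drift calculation can be reproduced with a short Python sketch. The text does not state how days were converted to months; a 30-day month is assumed here because it reproduces the reported figure of ~89 mo for the lowest Ne estimate (513).

```python
GENERATION_DAYS = 2.6  # HIV-1 generation time used in the text
DAYS_PER_MONTH = 30.0  # assumption: not stated in the text


def neutral_fixation_months(ne):
    """Mean time to fixation of a neutral allele in a haploid population, in months.

    Uses 2 * Ne generations, the expectation conditional on eventual fixation.
    """
    return 2.0 * ne * GENERATION_DAYS / DAYS_PER_MONTH


months = neutral_fixation_months(513)  # about 88.9 months
```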

It is also possible to estimate the selection coefficient (s) of the synapomorphies, assuming that advantageous substitutions in a haploid population reach fixation in ~(2/s) ln(Ne) generations, although with a large variance (NEI 1987). If we assume that the two selected synapomorphies at 41 mo first appeared at 25 mo, then they reached fixation in 16 mo, which, conservatively assuming that Ne takes the lower value for this period (602), would make s = 0.069. Of course, if any of these synapomorphies were present before they were first sampled, then lower values of s would be obtained, although it is equally likely that these synapomorphic changes first appeared more recently than we assume (i.e., at later points along the branches linking time points), in which case selection coefficients would be higher. The synapomorphy at 25 mo is slightly harder to interpret because, although it is reconstructed as occurring on the lineage leading to the 25-mo sample, it is not present in all isolates from this time point, as there appears to have been a reversion to Arg in some sequences. However, it is found in all sequences at 41 mo, so we may assume it has been fixed by this time. We can therefore conservatively estimate that this mutant took 30 mo to reach fixation, which would mean s = 0.036, assuming the lowest value of Ne calculated during this time (513).
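Inverting the fixation-time approximation gives s ≈ 2 ln(Ne)/t, with t in generations. The Python sketch below reproduces both estimates in the text, again assuming a 2.6-day generation and a 30-day month (the latter is our assumption, chosen because it matches the reported values).

```python
import math


def selection_coefficient(ne, months, generation_days=2.6, days_per_month=30.0):
    """Solve t = (2 / s) * ln(Ne) for s, with t converted from months to generations."""
    generations = months * days_per_month / generation_days
    return 2.0 * math.log(ne) / generations


s_41 = selection_coefficient(602, 16)  # about 0.069
s_25 = selection_coefficient(513, 30)  # about 0.036
```

Because s is inversely proportional to the assumed fixation time, assuming a later origin for a mutation (a shorter t) raises the estimate.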


DISCUSSION

Positive selection on nef genes:
Our genealogical study of HIV-1 nef gene evolution within and among patients has revealed an important role for positive selection, with high dN/dS values at some codons. Within the hemophiliac patient, some of these selected codons were also found to be synapomorphic for samples taken from successive time points and thus fall along the "backbone" of the tree, which is itself strong evidence that they represent the successful (fixed) alleles from which all other mutants are derived. A similar finding comes from the analysis of strains of influenza A virus collected over many years (and representing many epidemics), where sites under positive selection are likewise found at antigenically important residues and are located on the main trunk of the tree (FITCH et al. 1991). Finally, the location of the amino acid sites under positive selection in the hemophiliac patient changes with time, consistent with the notion that the immune system may shift its attention among epitopes following the appearance of escape mutants (NOWAK et al. 1995), a process that has been observed in nef (PRICE et al. 1997).

Taken together, we believe that these observations represent compelling evidence for the immune-driven positive selection of nucleotide substitutions in nef. A possible alternative explanation is that our "positively selected" substitutions are in fact nearly neutral and fixed by genetic drift when the viral population is small (OHTA 1992). However, the rate of fixation of nearly neutral mutations under drift is the same as that of strictly neutral changes, and we have already shown that this rate is too low to account for the substitution dynamics we observe.

Limitations of pairwise methods:
Our study also indicates that genealogy-based methods provide a much more sensitive description of selection pressures than those using multiple pairwise comparisons, even when sliding windows are incorporated. Although the various methods for estimating dS and dN based on pairwise comparisons are useful when the sites under positive selection are known a priori (HUGHES and NEI 1988; ZANOTTO et al. 1995; YOKOYAMA and YOKOYAMA 1996) or when there is an overwhelming excess of nonsynonymous changes in particular regions (BONHOEFFER et al. 1995; PRICE et al. 1997; KARLSSON et al. 1998), all are limited by their oversampling of distances associated with deeper branches, the movement of sites between the synonymous and nonsynonymous categories, and the fact that estimates of the two rates are not independent (MUSE 1996). Furthermore, the pairwise methods currently available assume a constant selection pressure among sites and so tend to underestimate nonsynonymous rates (NIELSEN 1997). Finally, the Nei and Gojobori method works well only given low levels of sequence divergence and equal rates of substitution among bases (MUSE 1996), which evidently limits its applicability to a rapidly evolving organism with a strongly biased substitution process such as HIV-1. As a recent case in point, pairwise comparisons of dN/dS failed to detect positive selection in HIV-1 sequences that were clearly selected for antiviral resistance (CRANDALL et al. 1999).

Our study further confirms that adaptive evolution often occurs at a small number of residues in a polypeptide, in this case most likely CTL epitopes. As a consequence, methods that average dN/dS over many sites are necessarily coarse and may miss evidence of highly localized selection pressure (SHARP 1997). For example, ENDO et al. (1996) considered dN > dS in 50% of pairwise comparisons to provide good evidence for positive selection, yet the nef gene sequences analyzed here would clearly have been excluded under this criterion, as would some other notable cases of adaptive evolution such as primate lysozymes (MESSIER and STEWART 1997). The masking of positively selected sites will be most acute when more divergent sequences are compared. Even in our maximum likelihood analysis, dN/dS decayed through time: although the "positive selection" codon evolution model had the highest likelihood in comparisons among HIV-1 group M sequences, the increase in the number of silent changes meant that no codons with dN > dS could be identified. A lack of sensitivity also limits sliding window analyses, which, while providing more evidence for positive selection in nef, still failed to detect the majority of the positively selected changes. Furthermore, there are no objective criteria for choosing either the window or the increment size, and in our study different sliding window analyses gave different interpretations of which sites might be selected. Future studies of positive selection will evidently be most fruitful if they consider closely related sequences, in which the footprint of adaptive evolution may still be detectable, and if they use analytical methods that take account of the phylogenetic relationships of the sequences in question. Ultimately, such methods should also be able to recognize the selective advantage conferred by individual mutations.
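
Both the dilution of localized selection and the arbitrariness of window size can be seen in a toy calculation. The Python sketch below assumes, hypothetically, that every codon has the same numbers of synonymous and nonsynonymous sites and the same dS, so that a window's dN/dS is simply the mean of the per-codon ratios; the per-codon values are invented for illustration and are not taken from the nef data.

```python
def window_means(omega: list[float], window: int, step: int) -> list[tuple[int, float]]:
    """(start index, mean per-codon dN/dS) for each sliding window.
    Assumes equal site counts and equal dS in every codon (illustrative only)."""
    return [(start, sum(omega[start:start + window]) / window)
            for start in range(0, len(omega) - window + 1, step)]


def flagged_windows(omega: list[float], window: int, step: int) -> list[int]:
    """Start indices of windows whose mean dN/dS exceeds 1."""
    return [start for start, mean in window_means(omega, window, step) if mean > 1.0]


if __name__ == "__main__":
    # Hypothetical gene of 60 codons: background dN/dS of 0.2,
    # with three codons (indices 20, 22, 45) under strong positive selection.
    omega = [0.2] * 60
    for site in (20, 22, 45):
        omega[site] = 5.0
    print("gene-wide mean dN/dS:", round(sum(omega) / len(omega), 2))
    print("window 5, step 1 flags starts:", flagged_windows(omega, 5, 1))
    print("window 15, step 5 flags starts:", flagged_windows(omega, 15, 5))
```

In this toy case the gene-wide ratio is 0.44, a 5-codon window flags both selected regions, and a 15-codon window flags nothing, even though the underlying per-codon values are identical in both analyses.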

Cladistic representation of the substitution of nef alleles:
The clear phylogenetic separation of nef sequences from the three time points in the hemophiliac patient was instrumental in our study of evolutionary processes, as it allowed us to estimate a number of important population parameters. For example, if the synapomorphic changes seen at 41 mo first appeared at 25 mo, they took no more than about 185 generations to reach fixation, some 5.5 times faster than expected under neutrality given the lowest estimated values of Ne. Even if fixation took the entire 30 mo of the sampling period, this substitution process is still 3.0 times faster than the neutral expectation. Likewise, these fixed substitutions have very high selection coefficients, with s of at least 0.036 under the most conservative assumptions. These values exceed those estimated for wild-type reverse transcriptase alleles in the absence of treatment with the drug AZT (s = 0.004 to 0.023; GOUDSMIT et al. 1996) and lie toward the upper end of those estimated for balancing selection at loci of the human major histocompatibility complex (s = 0.0007 to 0.042; SATTA et al. 1994).
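
The arithmetic behind these comparisons can be sketched as follows, with the caveat that this is one textbook route and not necessarily the exact calculation used in this study. The assumptions, all labeled in the code, are an assumed viral generation time of 2.6 days (close to the estimate of PERELSON et al. 1996); a haploid Wright-Fisher population in which a neutral mutation that does fix takes about 2Ne generations to do so; and a deterministic logistic sweep from frequency 1/Ne to 1 - 1/Ne, which takes about (2/s) ln Ne generations. The Ne values are placeholders, not the estimates reported in the paper.

```python
import math

GENERATION_DAYS = 2.6   # assumed viral generation time (days)
DAYS_PER_MONTH = 30.44  # mean calendar month length (days)


def months_to_generations(months: float) -> float:
    """Elapsed calendar time expressed in viral generations."""
    return months * DAYS_PER_MONTH / GENERATION_DAYS


def neutral_fixation_time(n_eff: float) -> float:
    """Mean time (generations) for a neutral mutation that does fix to reach
    fixation in a haploid Wright-Fisher population: approximately 2 * Ne."""
    return 2.0 * n_eff


def sweep_selection_coefficient(n_eff: float, t_gen: float) -> float:
    """Solve t = (2 / s) * ln(Ne) for s: the selection coefficient needed for a
    logistic sweep from 1/Ne to 1 - 1/Ne to finish within t_gen generations."""
    return 2.0 * math.log(n_eff) / t_gen


if __name__ == "__main__":
    for months in (16, 30):  # 25 -> 41 mo, and the full sampling period
        t = months_to_generations(months)
        for n_eff in (100, 500, 1000):  # placeholder Ne values
            speedup = neutral_fixation_time(n_eff) / t
            s = sweep_selection_coefficient(n_eff, t)
            print(f"{months} mo = {t:.0f} gen, Ne = {n_eff}: "
                  f"{speedup:.1f}x faster than neutral, s >= {s:.3f}")
```

Under these assumptions, 16 mo corresponds to roughly 187 generations, in line with the figure of about 185 quoted above. Because ln Ne grows with Ne, the implied s rises as Ne increases, which is why selection coefficients computed from a low Ne are conservative.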

Of course, the estimates of Ne (and hence s) that we present assume neutrality, and we have shown here that positive selection has acted on these sequences. However, our point is that even with low values of Ne, many more generations than observed would be required to explain the rapid substitution of nef alleles by drift alone. Furthermore, larger values of Ne would increase the values of s, so the selection coefficients we present are likely to be lower bounds. Finally, if Ne really is as low as we estimate, our analysis suggests that this is due to the purging action of selectively driven population bottlenecks rather than to high variance in the number of viral progeny produced by infected cells (LEIGH BROWN 1997).

One questionable assumption we do make is that the synapomorphic changes for each time point truly underwent fixation during the sampling period, especially since the viral population within hosts may be partitioned by tissue type [although this is debated; see, for example, DELWART et al. 1998]. However, even though HIV-1 has an enormous census population size, with some 10^10 virions produced each day (PERELSON et al. 1996), the fact that all sequences sampled at the 41-mo time point carried these synapomorphic substitutions at least argues for their high frequency in the population. Furthermore, each synapomorphy has a high posterior probability of being subject to positive selection. It nevertheless remains to be seen how different frequencies of these mutations would affect our reconstruction of their substitution dynamics.
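
How strongly does a sample in which every sequence carries a substitution constrain that substitution's frequency in the population? A minimal Python sketch, assuming the sequences are independent random draws from a single well-mixed population (exactly the assumption that tissue compartmentalization would violate) and using hypothetical sample sizes, since the number of sequences obtained at 41 mo is not given in this section: if the variant's frequency is p, all n sampled sequences carry it with probability p^n, so the one-sided 95% lower confidence bound on p solves p^n = 0.05.

```python
def min_frequency_all_carriers(n_sampled: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower confidence bound on the population frequency p
    of a variant found in every one of n_sampled independently drawn sequences.
    P(all n carry the variant | p) = p**n, so the bound solves p**n = alpha."""
    if n_sampled < 1:
        raise ValueError("need at least one sampled sequence")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha ** (1.0 / n_sampled)


if __name__ == "__main__":
    for n in (5, 10, 20):  # hypothetical sample sizes
        print(f"n = {n}: frequency >= {min_frequency_all_carriers(n):.2f} (95% one-sided)")
```

With ten sequences the bound is about 0.74: high, but well short of fixation, which is why complete fixation remains an assumption rather than an observation.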

Using selection analysis to locate epitopes:
The identification of CTL epitopes is essential if we are to better characterize the cellular response to viral infection. This task, however, is complex. Mathematical models have been used to predict the likelihood that putative peptide sequences within different viral proteins are CTL epitopes, assigning scores before candidates are screened in ⁵¹Cr-release CTL assays (FALK et al. 1991). Newer approaches using ELISPOT assays or peptide-loaded HLA tetramers in flow cytometry are potentially easier, although more accurate and flexible methods for identifying candidate peptides are still desirable. We suggest that examining the amino acid sites under positive selection may be a useful way to identify possible epitope regions, as many of the positively selected sites we detect correspond to CTL epitopes or highlight regions where others may reside. Such an evolutionary approach may not only reduce the time, labor, and cost of these studies but ultimately also assist our understanding of the immunopathogenesis of AIDS and other infectious diseases.
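
As a sketch of how positively selected sites might be turned into candidate peptides for screening, the Python below takes per-codon posterior probabilities of positive selection (such as those produced by codon-substitution models of the kind described by NIELSEN and YANG 1998), keeps sites at or above a cutoff, and merges every peptide window containing one of them into candidate regions. The 0.95 cutoff, the 9-residue peptide length (class I epitopes are typically 8 to 10 residues long), the protein length, and the example posteriors are all assumptions chosen for illustration.

```python
from typing import Optional


def candidate_regions(posteriors: dict[int, float], threshold: float = 0.95,
                      peptide_len: int = 9,
                      protein_len: Optional[int] = None) -> list[tuple[int, int]]:
    """Merge all peptide windows of length peptide_len that contain a codon whose
    posterior probability of positive selection is >= threshold.
    Codons are 1-based; returned intervals are 1-based and inclusive."""
    selected = sorted(site for site, prob in posteriors.items() if prob >= threshold)
    regions: list[tuple[int, int]] = []
    for site in selected:
        start = max(1, site - peptide_len + 1)  # first peptide ending at this site
        end = site + peptide_len - 1            # last peptide starting at this site
        if protein_len is not None:
            end = min(end, protein_len)
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or abuts the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Hypothetical posteriors for four codons of a protein assumed to be 206 residues long.
    posteriors = {10: 0.97, 14: 0.99, 80: 0.96, 120: 0.50}
    print(candidate_regions(posteriors, protein_len=206))
```

Here codons 10 and 14 fall within overlapping peptides and merge into residues 2 to 22, codon 80 gives residues 72 to 88, and codon 120 falls below the cutoff. Merging overlapping windows keeps the number of peptide sets to synthesize small while still covering every peptide that contains a selected site.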


*  ACKNOWLEDGMENTS

We thank Rasmus Nielsen, Ziheng Yang, Yun-Xin Fu, and Takashi Gojobori for their suggestions and comments. Two anonymous referees also made useful suggestions concerning an earlier version of this manuscript. P.M.A.Z. was funded by a Conselho Nacional de Pesquisa (CNPq) productivity grant (300188/98-6) and by Programa Nacional de Excelência (PRONEX) grant 139/96. E.C.H. was funded by The Royal Society (U.K.) and The Wellcome Trust. R.F.S. was funded by Coordenação de Aperfeiçoamento de Pesquisa e Ensino Superior (CAPES) and E.G.K. by PRONEX grant 139/96.

Manuscript received March 16, 1999; Accepted for publication July 26, 1999.


*  LITERATURE CITED

BENNETT, S. R., F. R. CARBONE, F. KARAMALIS, R. A. FLAVELL, and J. F. MILLER et al., 1998  Help for cytotoxic-T-cell responses is mediated by CD40 signaling. Nature 393:478-480.

BONHOEFFER, S., E. C. HOLMES, and M. A. NOWAK, 1995  Causes of HIV diversity. Nature 376:125.

BORROW, P., H. LEWICKI, X. WEI, M. S. HORWITZ, and N. PEFFER et al., 1997  Antiviral pressure exerted by HIV-1 specific cytotoxic T lymphocytes (CTLs) during primary infection demonstrated by rapid selection of CTL escape virus. Nat. Med. 3:205-211.

CRANDALL, K. A., C. R. KELSEY, H. IMAMICHI, H. C. LANE, and N. P. SALZMAN, 1999  Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol. Biol. Evol. 16:372-382.

DA SILVA, J. and A. L. HUGHES, 1998  Conservation of cytotoxic T lymphocyte (CTL) epitopes as a host strategy to constrain parasite adaptation: evidence from the nef gene of human immunodeficiency virus 1 (HIV-1). Mol. Biol. Evol. 15:1259-1268.

DELWART, E. L., J. I. MULLINS, P. GUPTA, G. H. LEARN, JR., and M. HOLODNIY et al., 1998  Human immunodeficiency virus type 1 populations in blood and semen. J. Virol. 72:617-623.

ENDO, T., K. IKEO, and T. GOJOBORI, 1996  Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.

FALK, K., O. ROTZSCHKE, S. STEVANOVIC, G. JUNG, and H. G. RAMMENSEE, 1991  Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 351:290-296.

FITCH, W. M., J. M. E. LEITER, X. LI, and P. PALESE, 1991  Positive Darwinian evolution in human influenza A viruses. Proc. Natl. Acad. Sci. USA 88:4270-4274.

FU, Y. X. and W.-H. LI, 1993  Statistical tests of neutrality of mutations. Genetics 133:693-709.

GOLDMAN, N. and Z. YANG, 1994  A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

GOUDSMIT, J., A. DE RONDE, D. D. HO, and A. S. PERELSON, 1996  Human immunodeficiency virus in vivo: calculations based on a single zidovudine resistance mutation at codon 215 of reverse transcriptase. J. Virol. 70:5662-5664.

GOULDER, P. J. R., R. E. PHILLIPS, R. A. COLBERT, S. MCADAM, and G. OGG et al., 1997  Late escape from an immunodominant cytotoxic T-lymphocyte response associated with progression to AIDS. Nat. Med. 3:212-217.

HOLMES, E. C. and P. M. DE A. ZANOTTO, 1998  Genetic drift of human immunodeficiency virus type 1? J. Virol. 72:886-887.

HUGHES, A. L. and M. NEI, 1988  Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.

KARLSSON, A. C., S. LINDBACK, H. GAINES, and A. SONNERBORG, 1998  Characterization of the viral population during primary HIV-1 infection. AIDS 12:839-847.

KESTLER, H. W., D. J. RINGLER, K. MORI, D. L. PANICALI, and P. K. SEHGAL et al., 1991  Importance of the nef gene for maintenance of high virus loads and for development of AIDS. Cell 65:651-662.

KIRCHHOFF, F., T. C. GREENOUGH, D. B. BRETTLER, J. L. SULLIVAN, and R. C. DESROSIERS, 1995  Absence of intact nef sequences in a long-term survivor with nonprogressive HIV-1 infection. N. Engl. J. Med. 332:228-232.

KOENIG, S., A. J. CONLEY, Y. A. BREWAH, G. M. JONES, and S. LEATH et al., 1995  Transfer of HIV-1 specific cytotoxic T lymphocytes to an AIDS patient leads to selection for mutant HIV variants and subsequent disease progression. Nat. Med. 1:330-336.

KORBER, B., C. BRANDER, B. F. HAYNES, J. P. MOORE, and R. KOUP et al., 1997a  HIV Molecular Immunology Database 1997. Los Alamos National Laboratory, Los Alamos, NM.

KORBER, B., B. FOLEY, T. LEITNER, F. MCCUTCHAN, and B. HAHN et al., 1997b  Human Retroviruses and AIDS 1997. Los Alamos National Laboratory, Los Alamos, NM.

KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN, 1995  Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:1421-1430.

KUMAR, S., K. TAMURA, and M. NEI, 1993  MEGA: Molecular Evolutionary Genetics Analysis, version 1.01. The Pennsylvania State University, University Park, PA.

LEIGH BROWN, A. J., 1997  Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc. Natl. Acad. Sci. USA 94:1862-1865.

LEIGH BROWN, A. J. and D. D. RICHMAN, 1997  HIV-1: gambling on the evolution of drug resistance? Nat. Med. 3:268-271.

LEVY, J. A., 1998  HIV and the Pathogenesis of AIDS, Ed. 2. ASM Press, Washington, DC.

MADDISON, W. P. and D. R. MADDISON, 1992  MacClade: Analysis of Phylogeny and Character Evolution. Version 3.0. Sinauer Associates, Sunderland, MA.

MCMICHAEL, A., 1998  T cell responses and viral escape. Cell 93:673-676.

MCMICHAEL, A. J. and R. E. PHILLIPS, 1997  Escape of human immunodeficiency virus from immune control. Annu. Rev. Immunol. 15:271-296.

MESSIER, W. and C.-B. STEWART, 1997  Episodic adaptive evolution of primate lysozymes. Nature 385:151-154.

MUSE, S. V., 1996  Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13:105-114.

MUSEY, L., Y. HU, L. ECKERT, M. CHRISTENSEN, and T. KARCHMER et al., 1997  HIV-1 induces cytotoxic T lymphocytes in the cervix of infected women. J. Exp. Med. 185:293-303.

NEI, M., 1987  Molecular Evolutionary Genetics. Columbia University Press, New York.

NEI, M. and T. GOJOBORI, 1986  Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

NIELSEN, R., 1997  The ratio of replacement to silent divergence and tests of neutrality. J. Evol. Biol. 10:217-231.

NIELSEN, R. and Z. YANG, 1998  Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

NOWAK, M. A., R. M. MAY, R. E. PHILLIPS, S. ROWLAND-JONES, and D. G. LALLOO et al., 1995  Antigenic oscillations and shifting immunodominance in HIV-1 infections. Nature 375:606-611.

NOWAK, M. A., R. M. ANDERSON, M. C. BOERLIJST, S. BONHOEFFER, and R. M. MAY et al., 1996  HIV-1 evolution and disease progression. Science 274:1008-1010.

OGG, G. S., X. JIN, S. BONHOEFFER, P. R. DUNBAR, and M. A. NOWAK et al., 1998  Quantitation of HIV-1 specific cytotoxic T lymphocytes and plasma load of viral RNA. Science 279:2103-2106.

OHTA, T., 1992  The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.

PERELSON, A. S., A. U. NEUMANN, M. MARKOWITZ, J. M. LEONARD, and D. D. HO, 1996  HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271:1582-1586.

PLIKAT, U., K. NIESELT-STRUWE, and A. MEYERHANS, 1997  Genetic drift can determine short-term human immunodeficiency virus type 1 nef quasispecies evolution in vivo. J. Virol. 71:4233-4240.

PRICE, D. A., P. J. R. GOULDER, P. KLENERMAN, A. K. SEWELL, and P. J. EASTERBROOK et al., 1997  Positive selection of HIV-1 cytotoxic T lymphocyte escape variants during primary infection. Proc. Natl. Acad. Sci. USA 94:1890-1895.

RIDGE, J. P., F. DI ROSA, and P. MATZINGER, 1998  A conditioned dendritic cell can be a temporal bridge between a CD4+ T-helper and a T-killer cell. Nature 393:474-478.

ROSENBERG, E. S., J. M. BILLINGSLEY, A. M. CALIENDO, S. L. BOSWELL, and P. E. SAX et al., 1997  Vigorous HIV-1-specific CD4+ T cell responses associated with control of viremia. Science 278:1447-1450.

SATTA, Y., C. O'HUIGIN, N. TAKAHATA, and J. KLEIN, 1994  Intensity of natural selection at the major histocompatibility complex loci. Proc. Natl. Acad. Sci. USA 91:7184-7188.

SCHMITZ, J. E., M. J. KURODA, S. SANTRA, V. G. SASSEVILLE, and M. A. SIMON et al., 1999  Control of viremia in simian immunodeficiency virus infection by CD8+ lymphocytes. Science 283:857-860.

SCHOENBERGER, S. P., R. E. TOES, E. I. VAN DER VOORT, R. OFFRINGA, and C. J. MELIEF, 1998  T-cell help for cytotoxic T lymphocytes is mediated by CD40-CD40L interactions. Nature 393:480-483.

SHARP, P. M., 1997  In search of molecular Darwinism. Nature 385:111-112.

TEMIN, H. M., 1993  The high rate of retrovirus variation results in rapid evolution, pp. 219–233, in Emerging Viruses, edited by S. S. MORSE. Oxford University Press, Oxford.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

WAIN-HOBSON, S., 1994  Is antigenic variation of HIV important for AIDS and what might be expected in the future? pp. 185–209, in The Evolutionary Biology of Viruses, edited by S. S. MORSE. Raven Press, New York.

WAIN-HOBSON, S., 1996  Running the gamut of retroviral variation. Trends Microbiol. 4:135-141.

WELKER, R., H. KOTTLER, H. R. KALBITZER, and H.-G. KRÄUSSLICH, 1996  Human immunodeficiency virus type 1 Nef protein is incorporated into virus particles and specifically cleaved by the viral proteinase. Virology 219:228-236.

YANG, Z., 1997  Phylogenetic Analysis by Maximum Likelihood (PAML), Version 1.4. Department of Integrative Biology, University of California, Berkeley.

YOKOYAMA, S. and R. YOKOYAMA, 1996  Adaptive evolution of photoreceptors and visual pigments in vertebrates. Annu. Rev. Ecol. Syst. 27:543-567.

ZANOTTO, P. M. DE A., G. F. GAO, T. GRITSUN, M. S. MARIN, and W. R. JIANG et al., 1995  An arbovirus cline across the Northern hemisphere. Virology 210:152-159.



