Progress in understanding the evolution of variation at the MHC has been slowed by an inability to assess the relative roles of mutation vs. intragenic recombination in contributing to observed polymorphism. Recent theoretical advances now permit a quantitative treatment of the problem, with the result that the amount of recombination is at least an order of magnitude greater than that of mutation in the history of class II genes. We suggest that this insight allows progress in evaluating the importance of other factors affecting the evolution of the MHC. We investigated the evolution of MHC class II Eβ sequence diversity in the genus Peromyscus. We find evidence for extensive recombination in the history of these sequences. Nevertheless, it appears that intragenic recombination alone is insufficient to account for evolution of MHC diversity in Peromyscus. Significant differences in silent variation among subgenera arose over a relatively short period of time, with little subsequent change. We argue that these observations are consistent with the effects of historical population bottleneck(s). Population restrictions may explain general features of MHC evolution, including the large amount of recombination in the history of MHC genes, because intragenic recombination may efficiently regenerate allelic polymorphism following a population constriction.
THE class I and II molecules of the major histocom-patibility complex (MHC) responsible for antigen presentation to T-cells are characteristically extremely polymorphic within populations. This variation is believed to be maintained by selection for resistance to pathogens evolving to avoid presentation by MHC molecules (and thereby escape detection by the host immune system; Doherty and Zinkernagel 1975). Theoretical studies have demonstrated the potential of such selection to maintain the extremely long persistence of polymorphic alleles at the MHC (Takahata and Nei 1990; Takahataet al. 1992). Empirical support includes evidence for diversifying selection targeted to sites of the peptide-binding region (PBR) for both class I and II genes, in numerous taxa (reviewed in Hughes and Yeager 1998). These sites collectively show a significant excess of nonsilent substitutions consistent with diversifying (or balancing) selection for altered antigen presentation.
The relative importance of different mechanisms responsible for generating extensive allelic polymorphism at the MHC has been a contentious issue. Although site variation must arise as de novo mutations, intragenic recombination can generate novel sequences with potentially altered antigen-binding properties. If intragenic recombination is sufficiently frequent, it may be the predominant source of allelic polymorphism (Satta 1997; Takahata and Satta 1998b). While many authors have argued for an important role for intragenic recombination at antigen presenting genes of the MHC (Gyllenstenet al. 1991; Bergstromet al. 1998), others have argued that mutation and selective convergence cannot be neglected (Takahata 1995). In the absence of agreement on the relative importance of recombination vs. mutation in generating allelic polymorphism at the MHC, different investigators have approached the problem of the evolution of MHC polymorphism with differing assumptions, ranging from the assumption that the effects of intragenic recombination are negligible, as in the case of genealogical analyses of trans-specific sharing of allelic polymorphism (Kleinet al. 1993), to the view that recombination is capable of extensive homogenization of MHC variation (Bergstromet al. 1998; Mikkoet al. 1999).
Recent theoretical advances now permit a quantitative approach to the problem (Fearnhead and Donnelly 2001; McVeanet al. 2001; Hudson 2002), with the unambiguous result that the amount of recombination (per base pair) has been at least an order of magnitude greater than that of mutation in the history of class II genes (Richmanet al. 2003). This insight has broad implications for progress in understanding other factors affecting the evolution of the MHC, including demography. Despite the central role of effective population size in affecting genetic variation, quantifying its effects on genetic variation at the MHC has proved controversial (Takahata and Satta 1998a), again largely due to the unresolved debate over the relative importance of intragenic recombination. Although a number of studies have documented severe reductions in MHC polymorphism within species (Ellegrenet al. 1993; Mikko and Andersson 1995; Mikkoet al. 1999; Hedricket al. 2000; van der Waltet al. 2001), these relatively recent events shed little light on the evolutionary consequences of such reductions. By contrast, most comparative studies of MHC diversity in related taxa have found extensive shared polymorphism, suggesting that, in particular, demographic constrictions have been rare in the history of diversification of these taxa (Kleinet al. 1993; Edwardset al. 1997; Vinceket al. 1997). These inferences are clouded by the debate over the importance of intragenic recombination, which may greatly affect genealogical inferences (Schierupet al. 2001). To what extent have the conclusions of comparative studies been affected by their failure to take recombination into account?
Here we present an analysis of comparative data on MHC diversity and allelic polymorphism among a group of closely related rodent species in the genus Peromyscus. We find evidence for extensive recombination in the history of these sequences and show that apparent reciprocal monophyly of MHC polymorphism among subgenera is an artifact of intragenic recombination. However, it appears that intragenic recombination alone is not sufficient to explain the evolution of MHC diversity in Peromyscus. There are large and significant differences in the amount of silent variation among, but not within, subgenera that arose in a relatively short period of time preceding diversification within the subgenera, consistent with the effects of one or more historical population bottleneck(s). More generally, we propose that the interaction of demography and intragenic recombination may explain several puzzling observations on the evolution of MHC diversity, because the recovery of MHC polymorphism following a population bottleneck may differ greatly from the original process of allelic diversification.
Sample collections: Peromyscus maniculatus and P. eremicus were live trapped along a single 400-m transect at San Quintin, Baja California Norte, Mexico. Animals were anesthetized and killed, and spleen and liver tissue samples were removed and immediately frozen in liquid nitrogen. Tissue samples of P. leucopus and P. californicus were furnished by the Peromyscus Stock Center (http://stkctr.biol.sc.edu/).
Sequence acquisition: MHC class II β sequences were obtained by reverse transcriptase (RT)-PCR from spleen cDNAs, using locus-specific primers to amplify a sequence spanning most of exon 2 and including the entirety of the PBR. We have previously shown that our methods amplify single-locus variation in Peromyscus (Richmanet al. 2002). Sequence heterogeneity of amplification products was assessed using denaturing gradient gel electrophoresis (DGGE). PCR amplifications were reamplified as before, with the downstream primer modified by the 5′ addition of a GC-rich sequence called a GC clamp. The resulting amplifications were electrophoresced through an acrylamide gel along an increasing gradient of urea (40–60%), which acts as a denaturant. Double-stranded PCR products exhibit a sharp slowdown in their rate of migration as they undergo partial denaturation at a position along the gradient determined by their primary sequence, forming distinct bands. The different bands were excised from the gel, reamplified, and directly sequenced using the original amplification primers (Richmanet al. 2002). Sequences reported here are based on results of both forward and reverse sequencing reactions. In those instances where a sequence was recovered from a single individual in the sample, multiple (and therefore independent) isolations of each sequencing template were analyzed.
Estimates of recombination, mutation, and selection: Estimates of population recombination (ρ = 4Nr) from a set of aligned sequences were obtained using the composite-likelihood (CLE) method (Hudson 2002) implemented in LDhat (McVeanet al. 2001). The method estimates population recombination conditioned on the estimate of population mutation (θ = 4Nμ), obtained using the finite-series Watterson estimator. Rate variation among sites is not considered in the likelihood model used to estimate recombination, and the method therefore attempts to control for such variation by restricting analysis to those sites showing variation for two alternative alleles, where the frequency of the alternative allele is at least 0.20. Finally, the method-likelihood model used by LDhat to estimate ρ also does not include selection. The performance of LDhat with respect to estimating ρ for sequences evolving under balancing selection was evaluated by computer simulation (A. D. Richman, L.G. Herrera, D. Nash and M. L. Schierup, unpublished results). These simulation results indicate that LDhat does not yield biased estimates of ρ and θ even given the very large amounts of recombination and mutation inferred in the present study.
The amount of population selection (S) was estimated as (Takahataet al. 1992), where Kn is the average number of nonsynonymous substitutions at PBR sites, and γ is the intensity of selection, estimated as the ratio of the average nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site (Kn/Ks) for recently diverged pairs of alleles. The latter estimate of γ is necessitated by extensive saturation of nonsynonymous sites in the PBR for highly divergent allelic comparisons. It was not possible to obtain estimates of S for P. californicus or P. leucopus due to limited samples that did not permit estimation of γ. Estimates of S are not strictly comparable to those presented by Takahata et al. (1992) because of differences in the amount of sequence data available used to estimate Ks.
Pairwise statistics: A hierarchical analysis of sequence variation within and among subgenera (see Figure 1) was performed using AMOVA, implemented in Arlequin (Schneideret al. 2001). The structuring of genetic variation within vs. among subgenera was assessed using F-statistics, which in this context measure the proportional reduction in variation due to hierarchical structure. The significance of variance components and their corresponding F-statistics was estimated by bootstrap resampling.
The proportional amounts of substitution at silent (ps) and nonsilent (pn) sites were estimated using the method of Nei and Gojobori (1986), implemented in MEGA2 (Kumaret al. 2001). The estimated numbers of silent and nonsilent substitutions were not corrected for multiple substitutions, because this requires additional assumptions regarding the evolution of these sequences, resulting in more uncertain estimates. Standard errors of ps and pn were obtained by bootstrap resampling, also implemented in MEGA2. In counting the number of silent and nonsilent substitutions between a pair of aligned coding sequences, the method of Nei and Gojobori (1986) considers all possible evolutionary pathways (excluding paths that include termination codons) leading from one codon to another. For some comparisons there are multiple possible evolutionary pathways, which were weighted equally. The evolutionary pathway method makes fewer assumptions than other methods and is therefore expected to provide conservative (minimum) estimates of numbers of substitutions.
Genealogical analyses: The (unrooted) genealogy for Eβ sequences was obtained using the minimum evolution method implemented in PAUP* (Swofford 1998), using pairwise distances estimated using the Hasegawa-Kishino-Yano (HKY) model of sequence evolution (Hasegawaet al. 1987). The genealogy using only synonymous substitutions was obtained using the program FITCH implemented in PHYLIP (Felsenstein 1996), using pairwise distances estimated using the method of Nei and Gojobori (1986).
Allelic polymorphism: A total of 14 class II Eβ alleles were identified in P. eremicus (20 individuals and 37 chromosomes examined), compared to 16 alleles in P. maniculatus (22 individuals and 41 chromosomes examined; Richmanet al. 2001). The relatively limited numbers of sequences obtained for P. californicus and P. leucopus (five and six, respectively) were obtained from individuals from captive breeding colonies. Inferred amino acid sequences for all allelic sequences analyzed in this study are shown in Figure 1. Nucleotide sequences were deposited in GenBank (accession nos. AF312748, AF312750–AF312764, and AY219801–AY219825).
Estimates of recombination, mutation, and selection: Estimates of population recombination (ρ) are large for all Peromyscus taxa, exceeding the maximum amount of recombination evaluated (Table 1). Other methods for detecting recombination find similarly abundant evidence for recombination in these sequences (Richmanet al. 2003). As expected, estimates of both ρ and θ greatly exceed estimates for nuclear genes not experiencing balancing selection (McVeanet al. 2001). More surprisingly, the estimate of ρ greatly exceeds that of θ for all taxa, indicating that the amount of recombination per base pair greatly exceeds that of mutation in the history of MHC variation.
Estimates of population selection (S) indicate as expected that selection is the most important factor in accounting for observed MHC diversity within species (Table 1). Estimates of the intensity of selection (γ) are similarly high in P. maniculatus and P. eremicus (5.7 and 5.9, respectively), indicating strong balancing selection. Differences in S are therefore largely due to differences in the average number of nonsynonymous substitutions at PBR sites (17.3 and 13.8, respectively, for P. maniculatus and P. eremicus).
Extensive recombination detected, inferred in our study, has implications for estimates of S, which does not include the effects of recombination. Although the estimate of γ is not sensitive to recombination (Satta 1997), the estimate of average number of pairwise differences between alleles does show a moderate decline with increasing recombination (Satta 1997), with the implication that estimates of S may be (somewhat) overestimated.
Hierarchical AMOVA: The majority of all variation (57%) is found at the sites of the PBR, which account for just 24 of the 99 (amino acid) sites in the analysis, and nearly 80% of all PBR variation is found within species (Table 2). Non-PBR sites show a similar pattern, with the overall result that most sequence variation is found within rather than among species, accounting for 72.5% of all site variation.
Despite the fact that the large majority of variation is found within rather than among species, there is nevertheless significant structuring of sequence variation at all hierarchical levels, for both PBR and non-PBR sites. There is also a consistent difference in the amount of structure between PBR and non-PBR sites, with PBR sites showing more limited structure at all levels (Table 2).
Nonsynonymous and synonymous substitution: There are significant differences in the amount of average pairwise silent sequence variation among, but not within, subgenera (Table 3). P. eremicus and P. californicus (subgenus Haplomylomys) show significantly reduced synonymous variation at both PBR and non-PBR sites compared to P. maniculatus and P. leucopus (subgenus Peromyscus). There is no similar trend for differences among the subgenera in nonsynonymous variation, at either PBR or non-PBR sites.
Genealogical analyses: The unrooted genealogy for Peromyscus Eβ sequences indicates complete reciprocal monophyly among Peromyscus subgenera, with high bootstrap support (Figure 2). A similar topology was obtained using only synonymous pairwise distances (Figure 3). The position of the root is not shown because its position is uncertain. A root is not required to infer complete monophyly of Eβ polymorphism in at least one of the subgenera.
Inference of complete allelic turnover is artifactual: Genealogical analysis indicates reciprocal monophyly of Eβ sequences among Peromyscus subgenera, suggesting independent derivation of MHC variation in one or both taxa (Figure 2). However, this result appears to be an artifact of extensive recombination in the history of these sequences. If all variation within one or both subgenera were independently derived, this predicts extensive differences among the subgenera at degenerate sites at shared amino acid polymorphism. In fact, just 1 of 19 polymorphic amino acid positions shared among at least three species (Figure 1) shows variation consistent with this expectation. For the remaining 18 sites where all taxa share identical codon usage, we determined the probability of this observation assuming independent derivation of shared polymorphisms, using the frequency of usage for alternative codons for all taxa. This procedure requires identifying which polymorphism represents the derived state for each site; we chose the polymorphism with the lowest probability of independent derivation as the ancestral state, a very conservative approach. The resulting probability is nevertheless extremely small (P < 0.0001). We conclude that significant evidence for abundant ancestral shared polymorphism contradicts the inference from genealogical analysis.
The preceding argument indicates that extensive recombination has grossly affected inference of the allelic genealogy. In fact, the amounts of recombination estimated here (Table 1; see also A. D. Richman, L. G. Herrera, D. Nash and M. L. Schierup, unpublished results) well exceed those inferred in a previous study of recombination at the MHC (Schierupet al. 2001). However, that study was not able to make quantitative estimates of recombination and therefore inferred ρ indirectly, from distortion of the shape of the allelic genealogy (Uyenoyama 1997). One possibility is that this indirect approach underestimates the amount of recombination, presumably due to saturation of highly diverged sequences. Schierup et al. (2001) also concluded that sites under balancing selection may be difficult to identify by their increased variability, due to linkage to other sites that are not the target of balancing selection, unless the amount of recombination is “unrealistically high,” i.e., ρ > 1. Here we estimate ρ > 100 for all taxa, suggesting that the sites of balancing selection will in fact exhibit significantly increased variability. This of course is in accord with our results, given that sites of the PBR show conspicuously higher levels of substitution compared to other sites within exon 2 (Table 3).
Despite evidence for extensive intragenic recombination in the history of Peromyscus taxa, this process alone appears insufficient to account for the evolution of sequence differences among Peromyscus subgenera. If recombination alone were responsible for the observation of reciprocal monophyly of Eβ sequence polymorphism among the subgenera, we would expect to observe a similar pattern of lineage sorting within the subgenera as well. The species within subgenera diverged ∼3.5 and 4 MYA (Figure 4) and therefore have been evolving in isolation for a total of 7 and 8 million years. Nevertheless, species within subgenera do not show reciprocal monophyly for Eβ sequences (Figure 2). By contrast, the sequence differences responsible for the inference of reciprocal monophyly among the subgenera arose within the period preceding speciation within the subgenera, a total of ∼5.5 million years (Figure 4). It appears that importance of intragenic recombination in homogenizing MHC variation has varied over time, with its effects more pronounced in the period preceding diversification of species within the subgenera compared to more recently.
Evolutionary change in MHC sequences was apparently much more rapid during the period preceding diversification of species within the subgenera compared to more recently. Large, significant differences in silent diversity among but not within subgenera (Table 3) are due almost entirely to variation shared between species within the subgenera, indicating these differences are ancestral; estimates of silent diversity are little affected by removal of site variation found in single species (not shown). Differences in silent variation among the subgenera must therefore have arisen in the period preceding speciation within the subgenera, a short period of time (5.5 million years) compared to the total divergence time of species within the subgenera (15 million years). Comparing differences in PBR silent diversity within vs. among subgenera (Table 3), minimally 10 times the amount of change occurred between vs. within the subgenera, in little more than one-third the time, suggesting a difference in evolutionary rate within vs. among subgenera of ∼27-fold. A similar argument can be made regarding the evolution of nonsilent fixed differences, which are largely responsible for the inference of reciprocal monophyly of Eβ polymorphism among subgenera. The three fixed amino acid differences observed between the subgenera must have arisen prior to diversification of the subgenera. This suggests that about eight to nine fixed differences should be observed within the subgenera, but none are observed (Figure 1).
One possible explanation for variation in the rate of evolution over time is that interspecific hybridization resulting in extensive introgression at the Eβ locus within but not among the subgenera caused an apparent reduction in the rate of evolution more recently, by reintroducing variation that had become fixed in different species. This explanation requires that extensive introgression occurred relatively recently within both subgenera; otherwise there would be an expected accumulation of fixed differences in the subgenera comparable to that observed among the subgenera. The conspicuous difference in the amount of shared lineages among vs. within subgenera (see Figure 2) also suggests that introgression would have to have occurred relatively recently. However, there is no evidence for recent introgression at the MHC Eβ locus in either subgenus, as indicated by the very large differences among alleles found in different species. Most importantly, a hypothesis of hybridization within subgenera cannot explain the evolution of large differences in the amount of genetic variation among the subgenera (Tables 2 and 3).
In contrast to the preceding hypothesis of interspecific hybridization, directional selection could account for accelerated evolution of Eβ sequences among the subgenera. However, we argue that this explanation is implausible given abundant evidence of balancing selection in the history of these sequences (Table 1). Balancing selection would oppose the decrease in variation due to directional selection. The large reduction in silent variation among subgenera suggests that directional selection would have to be both strong enough to overwhelm balancing selection (Sattaet al. 1994) and maintained for a sufficiently long period to fix variation among populations, an event for which there is no precedent.
A more likely explanation for rapid evolution of the MHC in Peromyscus is an event for which there is ample precedent, an historic change in effective population size resulting in a significant reduction in MHC polymorphism (Ellegrenet al. 1993; Mikkoet al. 1999; O'Brien and Yuhki 1999). In particular, rapid evolution of large differences in silent variation among the subgenera suggests a significant difference in effective population size in the distant past, consistent with the effects of a population bottleneck. Although a bottleneck is not expected to discriminate between silent and nonsilent variation, balancing selection (and recombination) will distribute nonsilent variation, which is the direct target of balancing selection, relatively evenly among alleles. Silent variation will not be as evenly distributed (Schierupet al. 2001) and is therefore less likely to be sampled given a reduction in allelic polymorphism. Similarly, subsequent regeneration of MHC polymorphism is expected to result in a relatively even distribution among alleles of sequence variation, which is the target of balancing selection, but again silent variation will not be as evenly distributed. If a bottleneck event does not result in the complete elimination of allelic polymorphism, then both processes may contribute to the disproportionate loss of silent variation. In addition, evidence for the increased importance of intragenic recombination at the same time also suggests a historical bottleneck event, because the effective rate of intragenic recombination is expected to increase following a population constriction (see below).
To explain our result that the importance of intragenic recombination has varied over time, we propose that demographic history modulates the importance of intragenic recombination at the MHC. All else being equal, the selective advantage of a novel allelic specificity under balancing selection is an inverse function of the number of alleles maintained in a population. This means that the relative selective advantage of a novel specificity arising given limited polymorphism resulting from a population bottleneck is greater than that of a new specificity arising under equilibrium conditions. Therefore the effective rate of incorporation of these new alleles will vary as a function of the strength of selection for new allelic specificities, which in turn is a function of the demographic history of the population. If new specificities arise partly or mainly as a result of intragenic recombination, the (effective) rate of recombination will also vary as a function of the demographic history of the population.
The proposed interaction between recombination and population bottlenecks may explain the relatively large variation in the amount of silent diversity at MHC genes among closely related taxa (Table 3). As described above, silent variation, which is not the direct target of balancing selection, is both less likely to be sampled during a bottleneck event and less likely to be evenly redistributed during recovery of MHC polymorphism, compared to selected variation. Other studies of MHC polymorphism provide empirical support for this hypothesis. Madagascan lemurs are believed to have undergone a severe bottleneck in MHC diversity as a result of overwater colonization and show a significant reduction in silent polymorphism at MHC class II DRB genes compared to chimps (Goet al. 2002). Limited silent variation in the presence of extensive nonsilent variation at MHC class II genes in ungulates has been attributed to the effects of recombination alone (Mikkoet al. 1999). However, this explanation predicts very low silent MHC diversity for all taxa, which is not observed, or higher rates of recombination in ungulates, for which there is no evidence. We note that a severe reduction in MHC polymorphism in moose resulted in the complete loss of silent variation in this ungulate species (Mikko and Andersson 1995). We conclude that limited silent variation at polymorphic MHC genes may therefore be best explained as the persistent evidence of historical reductions in MHC polymorphism.
A conspicuous result in this study is the very large amount of recombination compared to mutation in the history of samples of MHC polymorphism. Although it is possible that the recombination rate is simply higher than the mutation rate, we suggest that an additional contributing factor may be a selective advantage for novel intragenic recombinants under balancing selection at the MHC. One possibility is that intragenic recombination is more likely than de novo mutation to generate allelic sequences that differ by multiple substitutions from other allelic sequences already represented in the population. If this translates into an altered spectrum of antigens presented to T-cells, novel intragenic recombinants may be more likely to be selected for as a result (Doherty and Zinkernagel 1975; Satta 1997). This selective advantage may increase following a population bottleneck that results in greatly reduced polymorphism, because recombination between potentially highly divergent allelic sequences may be more likely than the equilibrium state to generate selectively distinct alleles.
An alternative explanation for apparent high rates of recombination at class II genes is a site-specific signal for recombination (Gyllenstenet al. 1991). However, this proposal has been criticized on theoretical grounds as insufficient to support observed polymorphism at MHC genes (Sattaet al. 1999). Moreover, detailed analysis of recombination within MHC class II genes shows that extensive recombination is distributed throughout the exon (Richmanet al. 2003). It has been pointed out previously that the extreme age of many MHC alleles implies that they will experience extensive recombination even if the rate of recombination does not differ from that of other genes (Schierupet al. 2001), suggesting that a particular mechanistic explanation for extensive recombination at MHC genes is not required. The present hypothesis also suggests that a particular mechanistic explanation may not be required to explain relatively high rates of recombination within MHC genes.
In conclusion, we have proposed that population bottleneck(s) is responsible for both rapid evolutionary change and an episodic increase in the effective rate of recombination at the MHC class II Eβ gene in Peromyscus. Conversely, the rate of evolutionary change and recombination may be relatively slow given extensive polymorphism at the MHC maintained in the absence of demographic constrictions, because of the reduced selective advantage of novel allelic specificities. The persistence of historical differences in MHC diversity among Peromyscus subgenera over millions of generations is consistent with this latter expectation, suggesting that demographic constrictions are rare event(s) in the history of Peromyscus taxa. More generally, the persistence of historical differences in MHC diversity for millions of years indicates that MHC variation is far from being an ahistorical document overwritten by repeated intragenic recombination events. We suggest that the task now is to determine how best to read this historical record, taking into account quantitative estimates of intragenic recombination.
The authors thank the following for research and/or collecting permission: The Secretary of Foreign Relations of Mexico (Permit DAN 02578); the Secretary of Environment, Natural Resources, and Fisheries of Mexico (Permit DOO 02.5075); and the U.S. Forest Service (Permit CHIS-98-011). We thank Angelica Narvaez of the Office of the Environment, Science, and Technology Affairs in the U.S. Embassy in Mexico City for logistical assistance; and M. Schierup and Y. Satta for comments. Grant support from the U.S. National Science Foundation and National Geographic Society is gratefully acknowledged.
Communicating editor: N. Takahata
- Received August 15, 2002.
- Accepted February 5, 2003.
- Copyright © 2003 by the Genetics Society of America