Genetics, Vol. 166, 1897-1907, April 2004, Copyright © 2004

Evolutionary Relationships of Major Histocompatibility Complex Class I Genes in Simian Primates

Hiromi Sawai (a), Yoshi Kawamoto (b), Naoyuki Takahata (a), and Yoko Satta (a)
(a) Department of Biosystems Science, Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa 240-0193, Japan
(b) Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan

Corresponding author: Yoko Satta, Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan. E-mail: satta@soken.ac.jp

Communicating editor: S. YOKOYAMA


*  ABSTRACT

New World monkeys (NWMs) occupy a critical phylogenetic position in elucidating the evolutionary process of major histocompatibility complex (MHC) class I genes in primates. From three subfamilies of Aotinae, Cebinae, and Atelinae, the 5'-flanking regions of 18 class I genes are obtained and phylogenetically examined in terms of Alu/LINE insertion elements as well as the nucleotide substitutions. Two pairs of genes from Aotinae and Atelinae are clearly orthologous to human leukocyte antigen (HLA) -E and -F genes. Of the remaining 14 genes, 8 belong to the distinct group B, together with HLA-B and -C, to the exclusion of all other HLA class I genes. These NWM genes are classified into four groups, designated as NWM-B1, -B2, -B3, and -B4. Of these, NWM-B2 is orthologous to HLA-B/C. Also, orthologous relationships of NWM-B1, -B2, and -B3 exist among different families of Cebidae and Atelidae, which is in sharp contrast to the genus-specific gene organization within the subfamily Callitrichinae. The other six genes belong to the distinct group G. However, a clade of these NWM genes is almost equally related to HLA-A, -J, -G, and -K, and there is no evidence for their orthologous relationships to HLA-G. It is argued that class I genes in simian primates duplicated extensively in their common ancestral lineage and that subsequent evolution in descendant species has been facilitated mainly by independent loss of genes.


GLYCOPROTEINS encoded by genes in the major histocompatibility complex (MHC) region in vertebrates trigger the acquired immune system by presenting non-self-peptides to T cells (KLEIN 1987; KLEIN and HOREJSI 1997). The number of bona fide MHC genes seems to have been optimized by dual functions of MHC molecules: T-cell restriction by self-peptides and presentation of non-self-peptides (TAKAHATA 1995; CELADA and SEIDEN 1996; WEGNER et al. 2003). MHC genes are divided into classes I and II with respect to their structure and function. Each class is further subdivided into classical and nonclassical according to the pattern of gene expression or the extent of polymorphism (KLEIN 1987). Classical class I genes are ubiquitously expressed and generally exhibit high degrees of polymorphism. Nonclassical class I genes are expressed mainly in restricted tissues or organs and exhibit relatively low extents of polymorphism. Some nonclassical MHC molecules function as a ligand to a natural killer cell receptor (NK receptor) and send a signal to prevent the cell lysis by the NK cell (BRAUD et al. 1998; LEE et al. 1998; LEIBSON 1998).

Table 1 lists class I homologs of human leukocyte antigen (HLA) genes of simian primates. Note that not all of these homologs are orthologous to HLA class I genes. Although the A locus shows 1:1 orthology among humans and great apes, there is no 1:1 orthologous relationship between HLA-A and Patr-AL (ADAMS et al. 2001). In Old World monkeys (OWMs), there are many A-like genes in addition to the so-called AG gene (BOYSON et al. 1996, 1997). Since these A-related genes duplicated specifically in the lineage of OWMs (BOYSON et al. 1996, 1997), they are orthologous to hominoid A, but they are not 1:1 orthologs to HLA-A. In more distantly related New World monkeys (NWMs), no A-like genes have been found.


 
Table 1. Evolutionary relationships among nonhuman primate class I genes with 10 HLA loci

For the B and C loci, the 1:1 orthology is demonstrated within hominoids. HLA-C orthologs are found in African apes and orangutans (CHEN et al. 1992; ADAMS et al. 1999) and this is consistent with the suggestion that the duplication of HLA-B and -C occurred >40 million years ago (MYA; KULSKI et al. 1997; but see PIONTKIVSKA and NEI 2003). Since this estimate predates the divergence between hominoids and OWMs (MARTIN 1993), it is likely that the ancestor once possessed the HLA-C ortholog. Thus, the HLA-C ortholog seems to have been lost in gibbons and OWMs. On the other hand, orangutans and OWMs experienced gene duplication of the B locus independently, so that these genes within each species as a whole are orthologous to HLA-B. CADAVID et al. (1997) found two B-like genes in spider monkeys (Ateles belzebuth) and saki monkeys (Pithecia pithecia). However, these B-like genes in NWMs are paraphyletic with respect to HLA-B/C (CADAVID et al. 1997) and their orthologous relationships are not demonstrated.

As for the nonclassical class I loci, E and F are well conserved and their 1:1 orthologies are largely established among primates (OTTING and BONTROP 1993; KNAPP et al. 1998). By contrast, the orthologous relationships of G can be seen only between hominoids and OWMs (BOYSON et al. 1997; CASTRO et al. 2000). In NWMs, G-like genes have been reported in several subfamilies. However, the orthology of these G-like genes to HLA-G has not been established. Furthermore, in the subfamily Callitrichinae (cotton-top tamarins and marmosets), G-like genes have duplicated in genus-specific manners and may function as classical class I in the absence of classical HLA orthologs (WATKINS et al. 1990; CADAVID and WATKINS 1997; CADAVID et al. 1997, 1999). In other subfamilies, there are few studies about the origin and evolution of MHC class I genes (ADAMS and PARHAM 2001).

To understand the evolutionary dynamics of primate MHC genes, it is essential to establish the orthologous relationships of various MHC loci, particularly B- and G-related genes. To this end, we design PCR primers to amplify not only exons but also the 5'-flanking region of NWM class I genes. The 5'-flanking region contains a number of phylogenetically informative insertion elements. We use this information as well as the nucleotide substitutions to study the evolutionary relationships of class I genes in simian primates.


*  MATERIALS AND METHODS

HLA class I sequences:
We retrieved genomic sequences of 10 HLA loci: HLA-A (GenBank accession nos. AP000519 and AP000520), HLA-B (AP000507), HLA-C (AP000508), HLA-E (AP000514), HLA-F (AP000521), HLA-G (AP000521), HLA-H/54 (AP000520), HLA-J/59 (AP000519), HLA-K/70 (AP000520), and HLA-L/92 (AP000516). The length of the 5'- and 3'-flanking regions is ~10 kb each and that of the coding region is ~3 kb.

Samples:
We used the owl monkey (Aotus trivirgatus), the tufted capuchin (Cebus apella), and the spider monkey (A. belzebuth) as representatives of subfamilies of Aotinae, Cebinae, and Atelinae, respectively. We also used the rhesus monkey (Macaca mulatta) because its MHC is best studied among OWMs. We purchased the rhesus monkey genomic DNA from CLONTECH (Palo Alto, CA) and prepared the NWM genomic DNA from a 2-ml blood sample of each individual kept at the Primate Research Institute by using a blood and cell culture DNA kit (QIAGEN, Chatsworth, CA).

PCR and sequencing:
To amplify the 5'-flanking and coding sequences of class I loci, we designed PCR primers specific to certain groups of HLA loci: pAluE, 5'-GACCCTGTCTCTCTAAACAACAGCA-3'; pAluB, 5'-AGGCATCCTAAYCAGTGCAA-3'; and pAluG, 5'-CTCTGTATAAGCCTGAAGGAG-3'. We used each of these PCR primers in combination with an exon 2 primer (pEX2, 5'-AACTGCGTGTCGTCCACGTA-3'). PCR conditions were slightly different among primer sets (available upon request). The amplified PCR fragments were ~3 kb and were purified [PCR purification kit (QIAGEN), S.N.A.P. gel purification kit (Invitrogen, San Diego)]. Purified fragments were cloned (TOPO cloning kit; Invitrogen). To avoid sequencing errors, we sequenced three or more clones for each PCR fragment in both directions by about six sets of sequence primers. We performed sequencing reactions by using the dye terminator cycle sequencing method [DNA sequence kit (ABI, Columbia, MD)] and the DNA sequencer (ABI377; ABI). According to the MHC designation system (KLEIN et al. 1990; BONTROP and KLEIN 1997), MHC genes in M. mulatta, A. trivirgatus, C. apella, and A. belzebuth are subsequently prefixed by Mamu, Aotr, Ceap, and Atbe, respectively.
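The primer pAluB contains the IUPAC ambiguity code Y (C or T), so it denotes a mixture of two oligonucleotides rather than a single sequence. As an illustration only, the following Python sketch expands the four primers listed above into their unambiguous variants. The function name `expand_primer` and the dictionary layout are assumptions introduced here and are not part of the original protocol.

```python
from itertools import product

# IUPAC nucleotide codes (a subset that covers the primers listed in the text).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}

# Primer sequences copied from the text (5' to 3').
PRIMERS = {
    "pAluE": "GACCCTGTCTCTCTAAACAACAGCA",
    "pAluB": "AGGCATCCTAAYCAGTGCAA",
    "pAluG": "CTCTGTATAAGCCTGAAGGAG",
    "pEX2": "AACTGCGTGTCGTCCACGTA",
}


def expand_primer(seq):
    """Return every unambiguous sequence encoded by a (possibly degenerate) primer."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variant(s)")
```

Only pAluB is degenerate, so it is the only primer that expands to more than one sequence.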

Sequence analysis:
To identify homologous regions between pairs of DNA sequences, we used Dotter (SONNHAMMER and DURBIN 1995) and aligned homologous regions by Clustal W (THOMPSON et al. 1994). We further modified the alignment manually wherever necessary. We constructed phylogenetic trees by the neighbor-joining (NJ) method (SAITOU and NEI 1987) implemented in PHYLIP, version 3.572 (FELSENSTEIN 1993). We used the p-distances (the observed numbers of nucleotide differences per site) to determine the topological relationships, as recommended by SAITOU and NEI (1987), and the d-distances (the estimated numbers of nucleotide substitutions; KIMURA 1980) to estimate the divergence times of class I loci.
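The two distance measures can be sketched in a few lines of Python. This is a minimal illustration, not the PHYLIP implementation used in the study: it assumes that the d-distance refers to Kimura's two-parameter correction, that sites containing a gap or an ambiguous base are skipped pairwise, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Observed proportion of nucleotide differences per compared site."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura two-parameter estimate of substitutions per site."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = ts / sites  # proportion of transitional differences
    q = tv / sites  # proportion of transversional differences
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# Toy alignment for illustration only (not data from the study).
x = "ACGTACGTACGTACGTACGT"
y = "ACGTACGTATGTACG-ACGA"
print(p_distance(x, y), k2p_distance(x, y))
```

Because the correction accounts for multiple hits at the same site, the d-distance is never smaller than the p-distance for the same pair of sequences.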


*  RESULTS

HLA loci and their flanking regions:
In general, the topological relationships among class I genes strongly depend on the region and the length of sequences used in the phylogenetic analysis (PARHAM et al. 1989, 1995). We compared the 5'- and 3'-flanking sequences of the 10 HLA loci in addition to their coding sequences. However, because the alignment in the 3'-flanking region is difficult to make, particularly for HLA-E and -F, we excluded the 3'-flanking region from further analysis. We analyzed the 5'-flanking region, introns, exons 2–3, and exons 4–8 separately and evaluated the reliability of the resulting phylogenetic trees in terms of the topology and the bootstrap value (BV). As expected from extensive nonsynonymous substitutions in exons 2–3 driven by balancing selection (HUGHES and NEI 1988; TAKAHATA 1995), the tree topology in these exons appears to be much affected by homoplasy and significantly different from that in other regions. In addition, the phylogenetic relationships are least reliable in terms of BVs (Fig 1C). On the other hand, the remaining three regions show a more or less similar tree topology, and the BVs are satisfactorily high for the 5'-flanking region and introns (Fig 1). Importantly, the 5'-flanking region contains a number of nucleotide insertions and deletions (indels), as well as Alu, L1, and L2 insertion elements. It turns out that these insertion elements and indels are phylogenetically informative, so that we subsequently focus on the 5'-flanking region.



Figure 1. NJ trees of 10 HLA loci based on the p-distances in (A) the 5'-flanking region (1105 bp), (B) introns (1559 bp), (C) exons 2–3 (542 bp), and (D) exons 4–8 (462 bp). For the 5'-flanking region, HLA-E is excluded. Only bootstrap values >65% are shown.

Among phylogenetically informative insertion elements (Fig 2), there are four Alus, which allow us to classify the 10 HLA loci into four groups: B (HLA-B, -C, and -L), A (HLA-A and -H), G (HLA-G, -J, -K, and -F), and E (HLA-E). Alus are primate-specific short interspersed elements, and on the basis of diagnostic sites Alus are classified into four subfamilies: FLAM or FRAM, AluJ, AluS, and AluY (JURKA and MILOSAVLJEVIC 1991; BATZER and DEININGER 2002). Since these subfamilies disseminated in different evolutionary times (WILLARD et al. 1987; JURKA and MILOSAVLJEVIC 1991; KAPITONOV and JURKA 1996; BATZER and DEININGER 2002), we know, even though approximately, when the Alus under study were integrated into the primate genome. FLAM_C is a subfamily of the left arm of the Alu monomer and is thought to have disseminated earliest (JURKA and MILOSAVLJEVIC 1991; KAPITONOV and JURKA 1996). HLA-B, -C, and -L share an FLAM_C by which they are grouped in B. In addition to this FLAM_C, there is an AluJo that can discriminate HLA-B/C from HLA-L within group B. This AluJo is inserted in a 6- to 9-kb region upstream from exon 1 and is not depicted in Fig 2. There is an AluJb that can define a single clade of group A and group G. These groups can be distinguished from each other by an AluY specific to group A and by another AluJo specific to group G. Group E or HLA-E is highly unusual in that five Alus are specifically inserted in the 5'-flanking region. The PCR primers in MATERIALS AND METHODS are designed to include these insertion elements and we use them as phylogenetic markers to identify class I genes in OWMs and NWMs.



Figure 2. Insertion sites of Alu/LINE elements, indels, and PCR primer positions in four groups of HLA class I genes. The three PCR primers of pAluE, pAluB, and pAluG are designed to include group-specific Alu sequences. Hatched rectangles indicate exons 1 and 2; open rectangles, Alus; stippled rectangles, L1/L2 elements; thick arrows, PCR primer positions; thin arrows within rectangles, the 5'–3' direction of an insertion element; vertical lines across different loci, the same Alu elements; open and solid triangles above each line, insertions and deletions, respectively; and the number above a triangle, the size of an indel.

Classification of Mamu class I genes:
When the PCR primer set was applied to M. mulatta, it yielded five different 5'-flanking sequences each of 3–4 kb in length. From the comparison of insertion elements with those of HLA, these sequences in M. mulatta could be immediately classified into three groups: one into group E, one into group B, and three into group A or G (Fig 3).



Figure 3. Five different configurations of Alu/LINEs inserted in the 5'-flanking region of Mamu class I genes. Symbols are the same as those in Fig 2.

We could identify the group E sequence as Mamu-E (BOYSON et al. 1995) from the presence of four Alus and one L1, of which the insertion sites are identical to those of HLA-E. For the other single Mamu sequence in group B, we found that the insertion sites of one FLAM_C and three L1s are identical to those in HLA-L (Fig 2 and Fig 3). Therefore this sequence is designated as Mamu-L. To distinguish whether the three sequences belong to group A or G, the group-A-specific AluY is not a useful marker since it is found mainly in hominoids and rarely in OWMs or NWMs (BATZER and DEININGER 2002). Instead, we distinguished group G from group A by the presence or absence of the group-G-specific AluJo. In other words, if this AluJo is present, a sequence is identified as a member of group G, and if not, it is of group A. In this way, two of the three Mamu sequences are classified into group G and one into group A. The latter is designated as Mamu-A (MILLER et al. 1991) and this designation is also supported by BLAST search (see below).
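The marker-based decision rule described above can be written as a small lookup. The sketch below is a deliberate simplification: the actual classification rests on insertion sites being identical to those in HLA genes, whereas here each diagnostic insertion is reduced to a presence/absence label. The label strings and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical labels for group-diagnostic insertions found at HLA-homologous sites.
E_MARKER = "E_AluSg"     # insertion shared with HLA-E
B_MARKER = "B_FLAM_C"    # FLAM_C shared with HLA-B, -C, and -L
AG_MARKER = "AG_AluJb"   # AluJb defining the clade of groups A and G
G_MARKER = "G_AluJo"     # AluJo specific to group G


def classify_group(markers):
    """Assign a 5'-flanking sequence to group E, B, G, or A from its diagnostic insertions."""
    markers = set(markers)
    if E_MARKER in markers:
        return "E"
    if B_MARKER in markers:
        return "B"
    if AG_MARKER in markers:
        # The group-A-specific AluY is uninformative outside hominoids,
        # so group A is inferred from the absence of the group-G AluJo.
        return "G" if G_MARKER in markers else "A"
    return "unclassified"


print(classify_group({"AG_AluJb", "G_AluJo"}))
print(classify_group({"AG_AluJb"}))
```

Note that group A is assigned by exclusion, so a sequence in which the group-G AluJo has been secondarily lost would be misassigned by this rule alone.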

Furthermore, according to indels that are shared with HLA-K or -F, the two Mamu sequences in group G are identified as Mamu-K and -F. Mamu-K shares a single 95-bp insertion with HLA-K, while Mamu-F (OTTING and BONTROP 1993) shares nine indels (one 2-bp, one 6-bp, and one 120-bp insertion, and two 1-bp, two 3-bp, and two 4-bp deletions) with HLA-F. Although our failure to detect Mamu-B and -G (BOYSON et al. 1995, 1996) is probably due to insufficient screening of PCR products, we did newly identify Mamu-L and -K in addition to known Mamu-A, -F, and -E. These locus assignments by insertion elements and indels are also supported by BLAST search. Using each of the 5'-flanking sequences of Mamu class I genes as a query, we obtained the corresponding HLA or Patr ortholog as the best hit. Similarly, BLAST search of an ~150-bp exon 1–2 sequence showed that the Mamu-E and -F sequences are each 99% identical to those in the database and that the Mamu-A sequence is 88% identical to the Mamu-A*11 sequence. However, the Mamu-K and -L exon sequences turn out to be >90% identical to Mamu-B*08, suggesting either some confusion in locus identification or loose linkage between the 5'-flanking region and exons 1–2.

Classification of NWM class I genes:
The same PCR primer set as for M. mulatta yielded 18 NWM sequences: 7 from A. trivirgatus, 5 from C. apella, and 6 from A. belzebuth (Table 2). Since the insertion sites of group-specific Alus are identical to those in HLA genes (Fig 4), it is possible to classify the 18 sequences into three groups: E, B, and G. Two sequences are classified into group E by L1 and AluSg, which are shared with HLA-E; 8 into group B by FLAM_C, which are shared with HLA-B, -C, and -L; and the remaining 8 into group G by AluJo and L2, which are shared with HLA-G, -J, -K, and -F. The apparent lack of group A is due to either PCR primer mismatches or its true absence in the NWM genome.



Figure 4. Nine different configurations of Alu/LINEs inserted in the 5'-flanking region of NWM class I genes. A dagger above an indel symbol shows that the indel is not shared among the sequences at a given locus. The dashed line in group E indicates that sequencing was not done. Symbols are the same as those in Fig 2.


 
Table 2. Classification of HLA loci and nonhuman primate MHC genes

The two group E sequences from A. trivirgatus and A. belzebuth (Table 2) are unambiguously designated as Aotr-E and Atbe-E by insertion elements (Fig 2 and Fig 4), which is consistent with the finding of conservation of E genes in simian primates (KNAPP et al. 1998). Within group B sequences, there are four different constellations (B1–B4) of Alu/LINEs. In addition, there are eight informative indels ranging from 1 to 10 bp within the 1.5-kb region upstream from exon 2. Most of these consistently support the clustering between Aotr-B1 and Atbe-B1, between Aotr-B2 and Atbe-B2, and between Aotr-B3 and Ceap-B3 (Fig 4). Since these three pairs occur between different subfamilies or families, it is likely that they are orthologous. However, some exceptional indels do not support the sister relationships of Atbe-B3 and Aotr-B3/Ceap-B3. Ceap-B4 is also exceptional in that it has a number of unique indels.

Among eight group G sequences (Table 2), Atbe-F and Aotr-F differ from the rest of group G sequences in that a diagnostic AluJo is split into two parts by a 1-kb insertion sequence of unknown origin (Fig 4). These two genes are orthologous to HLA-F and Mamu-F (OTTING and BONTROP 1993), as supported also by the presence of eight indels (one 2-bp insertion and two 1-bp, two 3-bp, two 4-bp, and one 6-bp deletions), which are uniquely shared among these four sequences (Fig 4). Among the remaining six sequences in group G, there are four informative indels, all of which are shared only among Ceap-G1, Ceap-G1*, and Aotr-G1. Ceap-G1* is unique in possessing AluSc. However, since Ceap-G1 and Ceap-G1* are sampled from a single individual of C. apella and polymorphic Alus in MHC alleles are not rare (KULSKI et al. 2001), they are likely to be allelic. For the remaining three genes, Aotr-G2, Ceap-G2, and Atbe-G2, no phylogenetically informative indels can further distinguish these genes from each other.

In short, insertion elements and indels have successfully identified HLA-E and -F orthologs (WATKINS et al. 1990; KNAPP et al. 1998) as well as group B (CADAVID et al. 1997) and group G sequences in NWMs (WATKINS et al. 1990). However, this method did not unambiguously classify all the amplified sequences into loci. For this reason and to make quantitative arguments, we carried out the phylogenetic analysis of class I sequences spanning from the 5'-flanking region to exon 2.

Phylogenetic analysis of NWM class I genes:
In the phylogenetic analysis of the 5'-flanking region, we excluded all group E sequences because no reliable alignment of these is feasible with other group sequences. The analysis of the remaining sequences shows that group B and G sequences form two distinct clades with 100% BVs. The common node of group B separates the L sequences from all other B-like sequences and that of group G separates the F sequences from the rest within this group (data not shown, but see Fig 5). However, the evolutionary relationships among NWM sequences within group B or G are not completely unambiguous. Therefore, we further analyzed group B and G separately to increase the number of nucleotide sites that can be aligned. Fig 5A shows the NJ tree of 12 sequences in group B and Fig 5B shows the NJ tree of 14 group G and 3 group A sequences (HLA-A, -H, and Mamu-A).



Figure 5. NJ trees of primate group B and G sequences based on the p-distances in the 5'-flanking region. Group E sequences are excluded since the number of nucleotide sites that can be compared becomes very small when they are included. The phylogenetic analysis is carried out separately for groups B and G. The number of nucleotide sites compared is (A) 1021 bp for group B sequences and (B) 1743 bp for group G sequences. In total, 12 sequences are in group B and 17 sequences are in group G. At a node, the bootstrap values >65% are given. An open diamond indicates gene duplication and the root of the group B or G tree is assumed to be the same as that placed when all sequences are used. As in text and Table 2, the MHC symbol is followed by a four-letter abbreviation system (KLEIN et al. 1990; BONTROP and KLEIN 1997).

In group B (Fig 5A), the eight NWM sequences form a single cluster together with HLA-B and -C (100% BV) to the exclusion of HLA-L and Mamu-L. Apart from the HLA-B/C lineage, there are four distinct clades of the NWM sequences: B1 is [Aotr-B1, Atbe-B1], B2 is [Aotr-B2, Atbe-B2], B3 is [Aotr-B3, Ceap-B3, Atbe-B3], and B4 is [Ceap-B4]. Both B1 and B2 clades are supported by 100% BVs, while the B3 clade is supported by 83% BV. These results are consistent with our previous assumptions that Atbe-B3 is orthologous to Aotr-B3 and Ceap-B3 and that Ceap-B4 is an independent locus. Importantly, Fig 5A suggests that the sequences in the NWM B2 clade (hereafter designated as NWM-B2 for the locus) are more closely related to HLA-B/C than those in the NWM B1, B3, and B4 clades (NWM-B1, -B3, and -B4, respectively). It is possible that some NWM loci in group B are orthologous to HLA-B (CADAVID et al. 1997), since HLA-B and -C duplicated in the stem lineage leading to OWMs and HLA-B-related genes are more prevalent than HLA-C in primates. Although the clade of NWM-B2 and HLA-B/C is only weakly supported by BVs, it does suggest that there is an ortholog of HLA-B in NWMs. Furthermore, the NWM-B1, -B2, and -B3 genes are shared between different families of Cebidae and Atelidae.

Of 14 group G sequences (Fig 5B), Atbe-F and Aotr-F are orthologous to HLA-F and Mamu-F and the clade of these four genes is supported by 100% BV. The remaining NWM sequences form two clades: the G1 clade consisting of Aotr-G1, Ceap-G1, and Ceap-G1* (88% BV) and the G2 clade consisting of Aotr-G2, Ceap-G2, and Atbe-G2 (94% BV). The G1 and G2 clades are also supported by deletions and are regarded as representing different loci (NWM-G1 and -G2). Like NWM-B1, -B2, and -B3, NWM-G1 and -G2 are shared among two or three different subfamilies. It should be noted that the total clade of G1 and G2 is supported by 99% BV to the exclusion of HLA-A, -G, -J, -K, and -F, suggesting that all of these loci are paralogous to each other. Ceap-G1 and Ceap-G1* are more closely related to each other than to any other sequences and are clustered in a single clade (Fig 5B). The p-distance (3.2%) between Ceap-G1 and Ceap-G1* is relatively large, but it is within a range of the observed allelic diversity at MHC loci (SATTA 1993).

To examine to what extent the phylogenetic tree based on exons or introns is consistent with that of the 5'-flanking region, we sequenced the entire coding regions of four genes of A. trivirgatus (Aotr-G2, Aotr-F, Aotr-B1, and Aotr-B2). These genes individually represent members of major clades in Fig 5. The phylogenetic analysis of these coding sequences with nine HLA (excluding HLA-E) loci reveals that the tree topology is almost the same as that for the 5'-flanking region. The topology based on the 5'-flanking region is also in good agreement with that based on introns (data not shown), suggesting that phylogenetic signals in the 5'-flanking region and introns are not shuffled by recombination. However, it is substantially different from the topology based on exons 2–3.

It is generally accepted that exon 4–8 sequences represent locus specificity better than the remaining coding region (PARHAM et al. 1989, 1995). We compared the phylogenetic tree for exons 2–3 or exons 4–8 with that for the 5'-flanking region. Exon 4–8 sequences make HLA-B/C, Aotr-B1, and Aotr-B2 cluster with 97% BV and make HLA-F and Aotr-F cluster with 86% BV, although the evolutionary relationships within group B and G sequences cannot be resolved (data not shown). On the other hand, exon 2–3 sequences show more complicated phylogenetic relationships than do exon 4–8 sequences. In particular, group B and G sequences do not form two distinct clades. These analyses suggest that neither exons 2–3 nor exons 4–8 contain reliable phylogenetic signals for discriminating primate class I loci.

Class I loci in the simian primate ancestor:
The pairwise p-distances among four NWM paralogous B-related loci (B1–B4 in Fig 5A) range from 11.8 to 14.3%. Similarly, the distances between L (HLA-L and Mamu-L) and four NWM B-related loci range from 15.2 to 19.3%. These p-distances are still >10%, a value averaged over 20 pairs of intron sequences between humans and NWMs (O'HUIGIN et al. 2002), suggesting that the divergence of these loci is prior to the divergence of these species. Thus, there were at least five group B loci (four NWM B-related and L loci) before the present simian primates began to differentiate.

Likewise, the pairwise p-distances among HLA-A, -G, -J, and -K range from 11.2 to 12.0%. It is therefore likely that these loci also had already differentiated in an early stage of primate evolution. This ancient origin of HLA-A is supported by the presence of relatively old AluJo, but it contradicts the absence of HLA-A-related sequences in NWMs and the recent work by PIONTKIVSKA and NEI (2003), who estimated the emergence of HLA-A as recently as 35 MYA. On the other hand, the p-distances between NWM-G1 and -G2 genes range from 9.5 to 11.5%. It is not easy to judge whether they duplicated before or after the emergence of NWMs. To be conservative in the following discussion, however, we assume that NWM-G1 and -G2 loci duplicated shortly after the split of humans and NWMs. Thus, in addition to F (Fig 5B), there were at least six paralogous loci (four HLA, G1/G2, and F loci) in groups A and G in the stem lineage of simian primates.
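The reasoning in this section compares each range of paralog p-distances with the ~10% average intron distance between humans and NWMs. The sketch below restates that comparison in Python using only the ranges quoted in the text. The three-way rule (a range entirely above the threshold, entirely below it, or straddling it) is an assumption made here to formalize the argument, and the function name is hypothetical.

```python
# Average human-NWM intron p-distance quoted in the text (percent).
HUMAN_NWM_INTRON_P = 10.0

# Ranges of pairwise p-distances quoted in the text (percent).
P_DISTANCE_RANGES = {
    "NWM-B1 to -B4 (pairwise)": (11.8, 14.3),
    "L vs. NWM B-related loci": (15.2, 19.3),
    "HLA-A, -G, -J, -K (pairwise)": (11.2, 12.0),
    "NWM-G1 vs. NWM-G2": (9.5, 11.5),
}


def timing_relative_to_split(low, high, threshold=HUMAN_NWM_INTRON_P):
    """Compare a p-distance range with the human-NWM species distance."""
    if low > threshold:
        return "before split"
    if high < threshold:
        return "after split"
    return "ambiguous"


for label, (low, high) in P_DISTANCE_RANGES.items():
    print(f"{label}: {low}-{high}% -> {timing_relative_to_split(low, high)}")
```

Under this rule only the NWM-G1/G2 range straddles the threshold, which corresponds to the case the text treats conservatively as a duplication after the human-NWM split.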


*  DISCUSSION

Divergence times of class I loci:
To date the sequence divergences within group B or G, we constructed NJ trees on the basis of the d-distances rather than on the p-distances and examined the molecular clock hypothesis by the two-cluster test (TAKEZAKI et al. 1995). The test reveals that four group G sequences (HLA-F, Mamu-F, Aotr-F, and Atbe-F) and three group B sequences (Atbe-B2, Aotr-B2, and Ceap-B4) have evolved significantly (P < 0.01) faster than other sequences within each group. However, once we removed these sequences, no rate heterogeneity remained. Thus, we compute the average height (the average d-distances from the tips to the latest common node) of groups B and G as 9.1 ± 0.5% and 7.3 ± 0.3%, respectively. If we assume that the silent nucleotide substitution rate at primate MHC loci is 10⁻⁹/site/year (SATTA et al. 1993), the emergence of groups B and G can be dated as ~90 and 70 MYA, respectively. These divergences greatly predate the split between humans and NWMs, or even the emergence of prosimians (MARTIN 1993).
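The dating step is a direct division of the average tree height by the assumed substitution rate. The following sketch reproduces that arithmetic with the values quoted above. The function name is hypothetical, and scaling the quoted standard errors in the same way is a naive assumption of this sketch that ignores any uncertainty in the rate itself.

```python
# Assumed silent substitution rate at primate MHC loci (per site per year), as quoted in the text.
RATE_PER_SITE_PER_YEAR = 1e-9

# Average heights (d-distance from tips to the common node) and standard errors, in percent.
GROUP_HEIGHTS = {"B": (9.1, 0.5), "G": (7.3, 0.3)}


def divergence_time_my(height_percent, rate=RATE_PER_SITE_PER_YEAR):
    """Convert a tree height in percent into a divergence time in millions of years."""
    substitutions_per_site = height_percent / 100.0
    return substitutions_per_site / rate / 1e6


for group, (height, se) in GROUP_HEIGHTS.items():
    t = divergence_time_my(height)
    t_se = divergence_time_my(se)
    print(f"group {group}: about {t:.0f} +/- {t_se:.0f} million years")
```

With these inputs the heights convert to about 91 and 73 million years, which the text rounds to ~90 and 70 MYA.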

Recently, FLUGGE et al. (2002) and GO et al. (2003) extensively examined prosimian class I genes, and their exon sequences have revealed the monophyletic relationships with respect to simian class I genes. It would be interesting to see if Alu and LINE markers are consistent with this observation. We found AluJ, FLAM, and FRAM in the 5'-flanking region of group B, G, and E sequences. Since these Alus were dispersed in the primate genome ~80 MYA (BATZER and DEININGER 2002), it is likely that they were integrated into the prosimian genome as well. Our molecular clock analysis also consistently indicates the ancient differentiation of group B and G sequences. Therefore, identification of prosimian class I loci by this method is feasible and even legitimate to fully understand the evolution of primate class I loci.

Oldest class I locus in primates:
In this study, we have shown that most primate class I loci diverged from each other before the split between humans and NWMs. But which primate class I locus diverged first? SHIINA et al. (1999) proposed the hypothesis that the HLA region was shaped by successive duplications of a block encompassing a single HLA locus. They concluded that a block containing HLA-F is the oldest, which is also supported by the recent work by PIONTKIVSKA and NEI (2003). However, SHIINA et al. (1999) excluded HLA-E from their consideration, because the gene content of the E block is totally different from that of other blocks (KULSKI et al. 2000). On the other hand, PIONTKIVSKA and NEI (2003) used exon 2–4 sequences in their phylogenetic analysis. Although they carefully examined the molecular clock hypothesis among various MHC genes, the phylogeny has low BVs at critical nodes and appears to be influenced by homoplasy.

On the other hand, the NJ tree of intron sequences reveals that HLA-E diverged first when dog and pig sequences are used as outgroups (data not shown). The first divergence of HLA-E from other HLA loci is supported by insertion elements in both 5'- and 3'-flanking regions of HLA-E. In the 5'-flanking region, HLA-E does not share any insertion elements with other HLA and among them is FRAM, the oldest Alu. Similarly in the 3'-flanking region of HLA-E, no insertion elements are shared with other HLA. It is thus concluded that HLA-E diverged first and has long evolved in isolation from the others.

Gene duplication rate in primate class I loci:
In Fig 6, we hypothesize contraction and expansion models of group B and G loci in humans and NWMs, respectively. In group B, there are one L and at least four B-related loci in the ancestral species of humans and NWMs. As discussed, the NWM-B2 locus is likely orthologous to HLA-B and this locus is retained in both humans and NWMs. However, the NWM-B1, -B3, and -B4 loci are lost in humans and HLA-B duplicated to produce HLA-C in the stem lineage of hominoids and OWMs (a diamond in Fig 5A). By contrast, all four B loci are retained but L may be lost in NWMs. In group G, there are at least five loci (HLA-G, -J, -K, -F, and NWM-G1/G2) in addition to HLA-A in the stem lineage of simian primates (Fig 5B). While the F locus is retained in both humans and NWMs, three orthologs of HLA-G, -J, and -K may become extinct in NWMs. Similarly, the ortholog of NWM-G1/G2 is lost in humans. Subsequently, there is only one duplication to generate NWM-G1 and -G2 loci (a diamond in Fig 5B). If we assume 45–55 million years of the divergence between humans and NWMs (MARTIN 1993; KUMAR and HEDGES 1998; TAKAHATA 2001), two gene duplications generating HLA-B/C and NWM-G1/G2 took place in the descendant lineages. In addition, HLA-H duplicated from HLA-A after the divergence of human lineage from OWMs. Consequently, the duplication rate is ~3/(55 × 2) = 0.027 to 3/(45 × 2) = 0.033/genome/million years during this later stage of primate evolution.



Figure 6. Evolution of B- and G-related genes in humans and NWMs. The evolutionary process is divided into phases I and II. Phase I corresponds to the time period between the mammalian radiation and the divergence of humans and NWMs. It is assumed that there was a single locus of the extant primate class I genes 80–100 MYA. The ancestral gene duplicated into proto-E, -B, and -G genes. The presence of the proto-A gene is likely, but uncertain. Subsequently, proto-B and -G genes expanded to generate five group B and five group G genes, respectively, and there were at least eight gene duplications during phase I. On the other hand, phase II corresponds to the time period from the divergence of humans and NWMs to the present. During this phase, there was only one duplication for HLA-B and -C in the lineage leading to humans and one for G1 and G2 in the stem lineage of NWMs. The number beside an arrow stands for the minimum number of gene duplications.

For the duplication rate in the early stage of primate evolution, we note that there is no orthologous relationship between H-2 and HLA class I genes and therefore assume that there was only one common ancestral class I gene when primates diverged 80–100 MYA (KUMAR and HEDGES 1998; ADKINS et al. 2001). The ancestral gene might have diverged into an E-like locus and an ancestral gene of group A, B, and G loci, and this ancestor may have later expanded to generate one group A, five group B, and five group G loci. Then, at least eight duplications must have taken place during the period of 25–55 million years between the divergence of primates and rodents (80–100 MYA) and the divergence of humans and NWMs (45–55 MYA). The gene duplication rate then becomes 8/55 = 0.15 to 8/25 = 0.32 per genome per million years. This rate is nearly 10 times higher than that during the later stage of primate evolution. Thus, the gene organization was basically remolded by extensive duplication during the early stage of primate evolution, and the present repertoire has been shaped mainly by loss of loci. Such a tempo and mode of evolution in primate class I loci surely awaits further scrutiny to determine its biological significance.
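
As a rough check on these figures, the two rate ranges and their ratio can be recomputed in a few lines. This is only a sketch of the arithmetic quoted in the text: the helper name `rate_range`, the `lineages` parameter, and the choice to compare low end with low end and high end with high end are assumptions made here for illustration.

```python
def rate_range(n_dup, t_min, t_max, lineages=1):
    """Return (low, high) duplication rates per genome per million years.

    n_dup    -- number of gene duplications counted
    t_min    -- shortest assumed time span (million years)
    t_max    -- longest assumed time span (million years)
    lineages -- number of lineages the time span is counted over (assumed)
    """
    low = n_dup / (t_max * lineages)
    high = n_dup / (t_min * lineages)
    return low, high


# Early stage (phase I): at least 8 duplications over 25-55 million years.
early = rate_range(8, 25, 55)

# Later stage (phase II): 3 duplications over 45-55 million years,
# divided by 2 as in the text (read here as two descendant lineages).
late = rate_range(3, 45, 55, lineages=2)

print("early: %.2f to %.2f" % early)   # early: 0.15 to 0.32
print("late:  %.3f to %.3f" % late)    # late:  0.027 to 0.033
print("fold:  %.1f to %.1f" % (early[0] / late[0], early[1] / late[1]))
# fold:  5.3 to 9.6
```

Under this pairing, the upper end of the range (about 9.6-fold) corresponds to the "nearly 10 times" figure, while the lower end gives a difference of roughly fivefold.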


*  FOOTNOTES

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AB113090, AB113091, AB113092, AB113093, AB113094, AB113095, AB113096, AB113097, AB113098, AB113099, AB113100, AB113101, AB113102, AB113103, AB113104, AB113105, AB113106, AB113107, AB113108, AB113109, AB113110, AB113111, AB113112 and AB113202, AB113203, AB113204, AB113205.


*  ACKNOWLEDGMENTS

This work is supported in part by a grant (no. 12304046) from the Japan Society for the Promotion of Science and in part by the Cooperation Research Program of Primate Research Institute, Kyoto University.

Manuscript received November 5, 2003; Accepted for publication November 16, 2003.


*  LITERATURE CITED

ADAMS, E. J. and P. PARHAM, 2001  Species-specific evolution of MHC class I genes in the higher primates. Immunol. Rev. 183:41-64.

ADAMS, E. J., G. THOMSON, and P. PARHAM, 1999  Evidence for an HLA-C-like locus in the orangutan Pongo pygmaeus. Immunogenetics 49:865-871.

ADAMS, E. J., S. L. COOPER, and P. PARHAM, 2001  A novel, non-classical MHC class I molecule specific to the common chimpanzee. J. Immunol. 167:3858-3869.

ADKINS, R. M., E. L. GELKE, D. ROWE, and R. L. HONEYCUTT, 2001  Molecular phylogeny and divergence time estimates for major rodent groups: evidence from multiple genes. Mol. Biol. Evol. 18:777-791.

BATZER, M. A. and P. L. DEININGER, 2002  Alu repeats and human genomic diversity. Nat. Rev. Genet. 3:370-379.

BONTROP, R. E., and J. KLEIN, 1997 Nomenclature for the MHCs and alleles of different nonhuman primate species, pp. 552–555 in Molecular Biology and Evolution of Blood Group and MHC Antigens in Primates, edited by A. BLANCHER, J. KLEIN and W. W. SOCHA. Springer-Verlag, Berlin.

BOYSON, J. E., S. N. MCADAM, A. GALLIMORE, T. G. GOLOS, and X. LIU et al., 1995  The MHC class E locus in macaques is polymorphic and is conserved between macaques and humans. Immunogenetics 41:59-68.

BOYSON, J. E., C. SHUFFLEBOTHAM, L. F. CADAVID, J. A. URVATER, and L. A. KNAPP et al., 1996  The MHC class I genes of the rhesus monkey. Different evolutionary histories of MHC class I and II genes in primates. J. Immunol. 156:4656-4665.

BOYSON, J. E., K. K. IWANAGA, T. G. GOLOS, and D. I. WATKINS, 1997  Identification of the rhesus monkey HLA-G ortholog: Mamu-G is a pseudogene. J. Immunol. 157:5428-5437.

BRAUD, V. M., D. S. J. ALLAN, C. A. O'CALLAGHAN, K. SÖDERSTRÖM, and A. D'ANDREA et al., 1998  HLA-E binds to natural killer cell receptors CD94/NKG2A, B and C. Nature 391:795-799.

CADAVID, L. F., and D. I. WATKINS, 1997 MHC class I genes in nonhuman primates, pp. 339–357 in Molecular Biology and Evolution of Blood Group and MHC Antigens in Primates, edited by A. BLANCHER, J. KLEIN and W. W. SOCHA. Springer-Verlag, Berlin.

CADAVID, L. F., C. SHUFFLEBOTHAM, F. J. RUIZ, M. YEAGER, and A. L. HUGHES et al., 1997  Evolutionary instability of the major histocompatibility complex class I loci in New World primates. Proc. Natl. Acad. Sci. USA 94:14536-14541.

CADAVID, L. F., B. E. MEJIA, and D. I. WATKINS, 1999  MHC class I genes in a New World primate, the cotton-top tamarin (Saguinus oedipus), have evolved by an active process of loci turnover. Immunogenetics 49:196-205.

CASTRO, M. J., P. MORALES, J. MARTINEZ-LASO, L. ALLENDE, and R. ROJO-AMIGO et al., 2000  Lack of MHC-G4 and soluble (G5, G6) isoforms in the higher primates, Pongidae. Hum. Immunol. 61:1164-1168.

CELADA, F. and P. E. SEIDEN, 1996  Affinity maturation and hypermutation in a simulation of the humoral immune response. Eur. J. Immunol. 26:1350-1358.

CHEN, Z. W., S. N. MCADAM, A. L. HUGHES, A. L. DOGON, and N. L. LETVIN et al., 1992  Molecular cloning of orangutan and gibbon MHC class I cDNA. The HLA-A and -B loci diverged over 30 million years ago. J. Immunol. 148:2547-2554.

FELSENSTEIN, J., 1993 PHYLIP (Phylogeny Inference Package), Version 3.572. Department of Genetics, University of Washington, Seattle.

FLÜGGE, P., E. ZIMMERMANN, A. L. HUGHES, E. GÜNTHER, and L. WALTER, 2002  Characterization and phylogenetic relationship of prosimian MHC class I genes. J. Mol. Evol. 55:768-775.

GO, Y., Y. SATTA, Y. KAWAMOTO, G. RAKOTOARISOA, and A. RANDRIANJAFY et al., 2003  Frequent segmental sequence exchanges and rapid gene duplication characterize the MHC class I genes in lemurs. Immunogenetics 55:450-461.

HUGHES, A. L. and M. NEI, 1988  Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.

JURKA, J. and A. MILOSAVLJEVIC, 1991  Reconstruction and analysis of human Alu genes. J. Mol. Evol. 32:105-121.

KAPITONOV, V. and J. JURKA, 1996  The age of Alu subfamilies. J. Mol. Evol. 42:59-65.

KIMURA, M., 1980  A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

KLEIN, J., 1987 The Natural History of the Major Histocompatibility Complex. Blackwell, Oxford.

KLEIN, J., and V. HOŘEJŠÍ, 1997 Immunology. Blackwell, Oxford.

KLEIN, J., R. E. BONTROP, R. L. DAWKINS, H. A. ERLICH, and U. B. GYLLENSTEN et al., 1990  Nomenclature for the major histocompatibility complexes of different species: a proposal. Immunogenetics 31:217-219.

KNAPP, L. A., L. F. CADAVID, and D. I. WATKINS, 1998  The MHC-E locus is the most well conserved of all known primate class I histocompatibility genes. J. Immunol. 160:189-196.

KULSKI, J. K., S. GAUDIERI, M. BELLGARD, L. BALMER, and K. GILES et al., 1997  The evolution of MHC diversity by segmental duplication and transposition of retroelements. J. Mol. Evol. 45:599-609.

KULSKI, J. K., S. GAUDIERI, and R. L. DAWKINS, 2000  Using Alu J elements as molecular clocks to trace the evolutionary relationships between duplicated HLA class I genomic segments. J. Mol. Evol. 50:510-519.

KULSKI, J. K., P. MARTINEZ, N. LONGMAN-JACOBSEN, W. WANG, and J. WILLIAMSON et al., 2001  The association between HLA-A alleles and an Alu dimorphism near HLA-G. J. Mol. Evol. 53:114-123.

KUMAR, S. and S. B. HEDGES, 1998  A molecular timescale for vertebrate evolution. Nature 392:917-920.

LEE, N., M. LLANO, M. CARRETERO, A. ISHITANI, and F. NAVARRO et al., 1998  HLA-E is a major ligand for the natural killer inhibitory receptor CD94/NKG2A. Proc. Natl. Acad. Sci. USA 95:5199-5204.

LEIBSON, P. J., 1998  Cytotoxic lymphocyte recognition of HLA-E: utilizing a nonclassical window to peer into classical MHC. Immunity 9:289-294.

MARTIN, R. D., 1993  Primate origins: plugging the gaps. Nature 363:223-234.

MILLER, M. D., H. YAMAMOTO, A. L. HUGHES, D. I. WATKINS, and N. L. LETVIN, 1991  Definition of an epitope and MHC class I molecule recognized by gag-specific cytotoxic T lymphocytes in SIVmac-infected rhesus monkeys. J. Immunol. 147:320-329.

O'HUIGIN, C., Y. SATTA, N. TAKAHATA, and J. KLEIN, 2002  Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol. Biol. Evol. 19:1501-1513.

OTTING, N. and R. E. BONTROP, 1993  Characterization of the rhesus macaque (Macaca mulatta) equivalent of HLA-F. Immunogenetics 38:141-145.

PARHAM, P., D. A. LAWLOR, C. E. LOMEN, and P. D. ENNIS, 1989  Diversity and diversification of HLA-A, B, C alleles. J. Immunol. 142:3937-3950.

PARHAM, P., E. J. ADAMS, and K. L. ARNETT, 1995  The origins of HLA-A, B, C polymorphism. Immunol. Rev. 143:141-180.

PIONTKIVSKA, H. and M. NEI, 2003  Birth-and-death evolution in primate MHC class I genes: divergence time estimates. Mol. Biol. Evol. 20:601-609.

SAITOU, N. and M. NEI, 1987  The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SATTA, Y., 1993 Balancing selection at HLA loci, pp. 129–149 in Mechanisms of Molecular Evolution, edited by N. TAKAHATA and A. G. CLARK. Sinauer Associates, Sunderland, MA.

SATTA, Y., C. O'HUIGIN, N. TAKAHATA, and J. KLEIN, 1993  The synonymous substitution rate at the primate Mhc loci. Proc. Natl. Acad. Sci. USA 90:7480-7484.

SHIINA, T., G. TAMIYA, A. OKA, N. TAKISHIMA, and T. YAMAGATA et al., 1999  Molecular dynamics of MHC genesis unraveled by sequence analysis of the 1,796,938-bp HLA class I region. Proc. Natl. Acad. Sci. USA 96:13282-13287.

SONNHAMMER, E. L. and R. DURBIN, 1995  A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167:GC1-GC10.

TAKAHATA, N., 1995  MHC diversity and selection. Immunol. Rev. 143:225-247.

TAKAHATA, N., 2001 Molecular phylogeny and demographic history of humans, pp. 299–305 in Humanity From African Naissance to Coming Millennia, edited by P. V. TOBIAS, M. A. RAATH, J. MOGGI-CECCHI and G. A. DOYLE. Firenze University Press, Johannesburg, South Africa.

TAKEZAKI, N., A. RZHETSKY, and M. NEI, 1995  Phylogenetic test of the molecular clock and linearized tree. Mol. Biol. Evol. 12:823-833.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

WATKINS, D. I., Z. W. CHEN, A. L. HUGHES, M. G. EVANS, and T. F. TEDDER et al., 1990  Evolution of the MHC class I genes of a New World primate from ancestral homologues of human non-classical genes. Nature 346:60-63.

WEGNER, K. M., M. KALBE, J. KURTZ, T. B. H. REUSCH, and M. MILINSKI, 2003  Parasite selection for immunogenetic optimality. Science 301:1343.

WILLARD, C., H. T. NGUYEN, and C. W. SCHMID, 1987  Existence of at least three distinct Alu subfamilies. J. Mol. Evol. 26:180-186.



