The frequency and HLA-A allelic associations of a HERVK9 DNA structural polymorphism located in close proximity to the highly polymorphic HLA-A gene within the major histocompatibility complex (MHC) genomic region were determined in Japanese, African Americans, and Australian Caucasians to better understand its human population evolutionary history. The HERVK9 insertion or deletion was detected as a 3′ LTR or a solo LTR, respectively, by separate PCR assays. The average insertion frequency of the HERVK9.HG was significantly different (P < 1.083e−6) between the Japanese (0.59) and the African Americans (0.34) or Australian Caucasians (0.37). LD analysis predicted a highly significant (P < 1.0e−5) linkage between the HLA-A and HERVK9 alleles, probably as a result of hitchhiking (linkage). Evolutionary time estimates of the solo, 5′ and 3′ LTR nucleotide sequence divergences suggest that the HERVK9 was inserted 17.3 MYA with the first structural deletion occurring 15.1 MYA. The LTR/HLA-A haplotypes appear to have been formed mostly during the past 3.9 MY. The HERVK9 insertion and deletion, detected by a simple and economical PCR method, is an informative genetic and evolutionary marker for the study of HLA-A haplotype variations, human migration, the origins of contemporary populations, and the possibility of disease associations.
The major histocompatibility complex (MHC) genomic region on human chromosome 6p21.3 is characterized by extensive nucleotide and indel polymorphisms and multicopy gene families, such as the HLA class I, class II, and C4 class III genes (Dawkins et al. 1999; Shiina et al. 2004; Stewart et al. 2004). Many of the MHC genes are involved with the regulation of the immune system against infection and the MHC class I and class II genes have a central role in the immune response via antigen recognition and presentation to T cells (Kulski and Inoko 2003; Prugnolle et al. 2005). The MHC is also a human endogenous retrovirus (HERV)-rich region consisting of at least 12 different HERV family members, including 16 duplicated copies of the LTR16/HERV-16 sequences within the class I region (Kulski et al. 1999) and the HERVK(C4) structural polymorphism (absent or present) within intron 9 of the duplicated complement C4 genes within the class III region (Dangel et al. 1994; Tassabehji et al. 1994; Schneider et al. 2001a). The HERVK(C4) insertion that contributes to the long form of the C4 gene in humans and nonhuman primates (Schneider et al. 2001a) expresses antisense transcripts that may act against exogenous retroviral infections (Schneider et al. 2001b; Mack et al. 2004).
A recent comparative study of the genomic sequences of two different MHC haplotypes has shown that another HERVK sequence is potentially a common structural polymorphism within the MHC class I region where it is present in the PGF cell line with the gene haplotype HLA-A3-B7-Cw7 and deleted from the COX cell line with the gene haplotype HLA-A1-B8-Cw7 (Stewart et al. 2004). This HERVK polymorphic sequence is a member of the HERVK9 (alias HERV-K HML-3) family (Mager and Medstrand 2003; Mayer and Meese 2005) and it is located between the HLA-H and -G genes (HG locus) ∼62.6 kb telomeric of the HLA-A gene (Stewart et al. 2004). The family of HERVK9 endogenous retroviruses has ∼150 full-length copies distributed in the human genome (Mayer and Meese 2005) and it is transcriptionally active in different normal and diseased tissues (Medstrand and Blomberg 1993; Seifarth et al. 2005). The HERVK9 internal sequence is flanked by a 5′ and 3′ LTR sequence, called the MER9 element (Kulski et al. 1999; Kapitonov et al. 2004), and single copies of the MER9 sequence, a solo LTR, are found more frequently in the genome than the internal proviral sequence (Mager and Medstrand 2003). The deletion of HERV internal sequences from the genome usually generates solitary LTR sequences at the deletion loci as a consequence of homologous recombination between the two LTR flanking the provirus (Hughes and Coffin 2004). HERVK9 sequences appear to have been first fixed in the genome ∼35 MYA, before the emergence of the rhesus macaque (Mayer and Meese 2005). Because of its presence in apes, the HERVK9.HG structural polymorphism located between HLA-H and -G (the HG locus) is considered to be a deletion polymorphism (Kulski et al. 2004, 2005) rather than an insertion polymorphism (Barbulescu et al. 1999; Turner et al. 2001).
To better understand the role and evolutionary history of the HERVK9.HG structural polymorphism in the MHC, information is needed about the population frequencies and characteristics of the HERVK9.HG structural polymorphism, particularly as none has yet been compiled. The aim of this study was to determine the frequency of the deletion polymorphism of the HERVK9.HG retroviral sequence within the α-block of the MHC class I region and its association with HLA-A alleles in the DNA samples of 100 Japanese, 100 African Americans, 174 Australian Caucasians, and 50 homozygous B-lymphoblastoid cell lines of different ethnic origins.
MATERIALS AND METHODS
A reference set of 100 Japanese DNA samples genotyped for HLA alleles at the HLA-A, -B, and -DR loci by DNA sequencing was obtained from the Department of Legal Medicine, Shinshu University School of Medicine, Matsumoto, Nagano, Japan. This reference set of DNA samples represents a Japanese population of registered donors from the Nagano region in the Japanese unrelated bone marrow donor registry (Moriyama et al. 2006). A reference set of 174 Australian–Caucasian DNA samples genotyped for HLA alleles at the HLA class I gene loci by DNA sequencing was obtained from The Department of Clinical Immunology and Biochemical Genetics, Royal Perth Hospital, Perth, Western Australia. This reference set of samples represents a predominantly Caucasian (99.6%) population from the seaside town of Busselton in Western Australia (http://www.busseltonhealthstudy.com/). A panel of 100 African–American DNA samples was purchased from Coriell Cell Repositories as Human Variation panel HD100AA (http://ccr.coriell.org/nigms/nigms_cgi/panel.cgi?id=2&query=HD100AA). Another 50 DNA samples, extracted from B-lymphoblastoid cell lines of different ethnic origins and genotyped and/or serotyped for homozygous HLA alleles at the HLA-A, -B and -DR loci, were purchased from the European Collection of Cell Cultures (http://www.ecacc.org.uk/). Additional information about these homozygous cell lines (Table 1) can be obtained at http://www.ebi.ac.uk/imgt/hla/help/cell_help.html. Following HERVK9 PCR and HLA typing (as described below), we renamed the cell lines nos. 32 and 42 in Table 1 from TISI [International Histocompatibility Workshop (IHW) no. 9042] to PMA–TISI and from SSTO (IHW no. 9302) to PMA–SSTO, respectively, because we found that these cell-line DNA products were originally mislabeled. Ethics approval for the use of the human DNA samples in this study was obtained from the Tokai University Ethics Committee as ethics approval no. 07I-38.
The Japanese and Australian–Caucasian DNA samples were previously genotyped for HLA-A alleles to two or four digits by direct sequencing (Moriyama et al. 2006). The African–American DNA samples were genotyped for HLA-A alleles to two or four digits by the PCR–SSOP–Luminex method as previously described (Itoh et al. 2005).
HERVK9 PCR primers:
Two sets of PCR primer pairs were designed for the detection of the HERVK9 deletion and insertion, respectively (Figures 1 and 2). One primer pair (1Se1/3ASe2) was for the detection of the HERVK9 deletion as a solo LTR (MER9) sequence using the sense primer 1Se1 (5′-GTCACCCCCTAGAAGGAGACC-3′) and antisense primer 3ASe2 (5′-CAGAAGACTCAGGATGGAGTCTCC-3′). The other primer pair (3Si2/3ASe2) was for the detection of the HERVK9.HG insertion as a HERVK9-linked-3′ LTR (3′ MER9) sequence using the sense primer 3Si2 (5′-AGATGCAGATCCCGATTCCTGC-3′) and the antisense primer 3ASe2 (5′-CAGAAGACTCAGGATGGAGTCTCC-3′). The PCR primer sets were designed using the MHC genomic sequences that were determined for the COX and PGF cell lines (Stewart et al. 2004). The deletion PCR assay produced an amplified product size of 556 bp, whereas the insertion PCR assay produced an amplified product size of 625 bp. The HERVK9.HG insertion at 6.2 kb is too large to be amplified by the deletion PCR assay.
HERVK9 PCR genotyping:
Each PCR assay was performed in 10-μl aliquots using 2 pmol of each primer (200 nmol/liter), 1 ng of genomic DNA, 0.25 units of TaKaRa LA Taq polymerase, 0.8 μl of dNTP mixture (2.5 mM each), and 5 μl of 2× GC buffer I with 5 mM MgCl2, all purchased from TaKaRa (Shiga, Japan). The PCR was performed in eight strips of 0.2-ml thin-walled PCR tubes (QSP) using a GeneAmp 9700 thermal cycler (Applied Biosystems, Foster City, CA) programmed for 35 cycles with a denaturation (at 96° for 30 sec) and annealing (at 62° for 3 min) step at each cycle. The reaction products were stained with ethidium bromide and sizes were compared with molecular size markers by horizontal gel electrophoresis in 2% agarose using tris–borate–EDTA running buffer (Figure 2). Control samples (without DNA template) were run to ensure that there was no amplification of contaminating DNA. Reference control DNA from the COX and PGF cell lines was used to verify the identified polymorphisms.
Sequencing of LTR (MER9) PCR products:
Homozygous PCR products of HERVK9 deletions (solo LTR) and HERVK9 insertions (3′ LTR and flanking HERVK9 sequence) were amplified from selected cell lines and sequenced directly with BigDye terminator Cycle Sequencing FS ready reaction kit, Version 3.1 (Applied Biosystems) according to the instructions provided by the manufacturer using the sense and antisense PCR primers as sequencing primers. The sequences were analyzed using an automated DNA sequencer (ABI PRISM 3130 DNA sequencer; Applied Biosystems). The MER9 LTR sequences determined in this study will appear in the DDBJ/EMBL/GenBank nucleotide sequence databases with the successive accession nos. AB443932–AB443937.
LTR (MER9) sequence data from GenBank:
MER9-LTR DNA sequences for SNP analysis and association with HLA-A alleles were also obtained by extracting the MER9 from genomic sequences that were available within the public DNA database GenBank at NCBI (http://www.ncbi.nlm.nih.gov/). The accession numbers (cell-line name, MHC class I allele) of previously sequenced MER9-LTR within extended genomic sequences that were downloaded for analysis were solo MER9 at the HG locus: BX284699 (SSTO, HLA-A32), CR388220 (DBB, HLA-A2), AL671561 (COX, HLA-A1), AL845454 (QBL, HLA-A26), and BX927141 (MANN, HLA-A29); and for 5′ and 3′ MER9 at the HG locus: CU104658 (GogoA), AL645929 (PGF, HLA-A3), and AC192848 (PatrA).
DNA sequence alignment analysis:
The nucleotide positions of the MER9.HG were first located within the genomic sequence of different accession numbers by using RepeatMasker v3.1.6 (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker). The MER9 sequences were then manually extracted from the genomic sequences using the BLAST extraction tool at NCBI (http://www.ncbi.nlm.nih.gov/). Multiple alignments of MER9 DNA sequences were examined using the multiple alignment programs provided by the CLC Free Workbench v4 (http://www.clcbio.com/), GeneDoc (http://www.nrbsc.org/gfx/genedoc/index.html), and the CLUSTAL W 1.8 program at DDBJ (http://clustalw.ddbj.nig.ac.jp/top-e.html) with the default settings for “DNA” type. Needle, a Needleman–Wunsch algorithm and part of the EMBOSS Pairwise Alignment Algorithms at EMBL-EBI (http://www.ebi.ac.uk/emboss/align/index.html), was also used to calculate the percentage similarity and to identify the SNP and gap positions between the two MER9 DNA sequences as required.
Divergence date estimations:
The percentage divergence between pairs of LTR (MER9) sequences was calculated by counting and converting the number of nucleotide differences to a percentage difference of their entire length and excluding regions containing deletions (gaps). Corrections were made to account for the presence of multiple mutations at the same site, back mutations and convergent substitutions using the Kimura two-parameter model (Kimura 1980), and the computation algorithm provided as part of the CLUSTALW analysis at DDBJ (http://clustalw.ddbj.nig.ac.jp/top-e.html). Mutation rates of homologous sequence pairs for each solo, 5′, or 3′ LTR element at the HG locus were compared to estimate the duplication times. Because LTRs are identical at the time of retroelement integration, divergence distances were calculated between the 5′ and 3′ LTR of the same element at the HG locus to estimate the time of integration. The solo LTR sequences were compared with the 5′ and 3′ LTR sequences to estimate their time of deletion. The divergence dates were estimated on the basis that the percentage of divergence rate between pairs of LTR sequences within the primate lineage was on average 10% for synonymous sites with a divergence date of 28 MYA for human and Old World monkeys (Purvis 1995; Goodman et al. 1998; Takahata 2001). The divergence date of 28 MYA for the human and Old World monkeys corresponds to an average nucleotide substitution rate of 3.6 × 10⁻⁹ substitutions/site/year (Tristem 2000; Hughes and Coffin 2004). These time estimations do not necessarily represent exact dates, but provide relative approximations.
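The dating arithmetic described above can be illustrated with a minimal Python sketch. It assumes only the calibration stated in the text (10% divergence corresponding to 28 MY) and the standard Kimura two-parameter distance formula. The function names and the example counts are hypothetical and are not taken from the CLUSTALW implementation used in the study.

```python
import math

# Calibration stated in the text: 10% divergence corresponds to 28 MY.
CALIBRATION_DIVERGENCE_PCT = 10.0
CALIBRATION_TIME_MY = 28.0


def k2p_distance(transitions: int, transversions: int, sites: int) -> float:
    """Kimura two-parameter distance from counts over ungapped aligned sites."""
    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def divergence_to_my(divergence_pct: float) -> float:
    """Convert a pairwise percentage divergence to million years (MY)."""
    return divergence_pct / CALIBRATION_DIVERGENCE_PCT * CALIBRATION_TIME_MY


# Hypothetical counts for one LTR pair (illustration only).
example_distance = k2p_distance(transitions=20, transversions=10, sites=500)
print(f"K2P distance: {example_distance:.4f}")
print(f"5.4% divergence -> {divergence_to_my(5.4):.1f} MY")
```

Under this calibration each 1% of divergence corresponds to 2.8 MY, so small rounding differences in the percentage shift the estimated date by a few hundred thousand years.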
The crossing-over percentage (CO%) was calculated as a percentage ratio of the lowest number of HERVK9 insertions or deletions divided by the total number of HERVK9 alleles that were associated with a particular HLA-A allele.
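As a minimal sketch of this definition, the following Python function computes CO% from the number of HERVK9 insertions and deletions observed with one HLA-A allele. The function name and the example counts are hypothetical and serve only to illustrate the ratio.

```python
def crossing_over_pct(insertions: int, deletions: int) -> float:
    """CO% = 100 * (minor HERVK9 allele count) / (all HERVK9 alleles)
    observed with one particular HLA-A allele."""
    total = insertions + deletions
    if total == 0:
        raise ValueError("no HERVK9 alleles observed for this HLA-A allele")
    return 100.0 * min(insertions, deletions) / total


# Hypothetical counts: 1 insertion and 3 deletions on the same HLA-A allele.
print(crossing_over_pct(insertions=1, deletions=3))
```

By construction the value ranges from 0% (complete association with one HERVK9 allele) to 50% (equal association with both).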
Allele frequencies (AF) were calculated using the formula: AF equals the sum of each individual allele/2N, where N equals the total number of individuals. The tests for deviation of the HERVK9 deletion from Hardy–Weinberg equilibrium (HWE) and for the allele-frequency difference among Japanese, African Americans, and Australian Caucasians were performed using the web computer program at http://ihg2.helmholtz-muenchen.de/cgi-bin/hw/hwa1.pl (Sasieni 1997) and Genepop software at http://genepop.curtin.edu.au. Heterozygosity (H) was estimated as 1 − (p² + q²), where p and q are the allele frequencies (Ott 1992). The number of HERVK9 insertions or deletions associated with each HLA-A allele [configured as a two-digit A allele, such as HLA-A01 (or A1), HLA-A02 (or A2), and HLA-A11 (or A11)] was manually counted and converted to a percentage of the total number of each particular HLA-A allele. Fisher's exact test was used to assess whether a HERVK9 insertion or deletion was associated with particular HLA-A alleles. The statistical methods used to calculate the significance between the HLA-A alleles and the structural polymorphism were corrected for multiple comparisons using the Bonferroni-corrected P-value (pc).
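The allele-frequency and heterozygosity formulas can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the web program or Genepop procedure used in the study. The function names and the genotype counts are hypothetical, and the HWE statistic shown is the ordinary Pearson goodness-of-fit chi square for a two-allele locus.

```python
def allele_frequency(n_homozygous: int, n_heterozygous: int, n_individuals: int) -> float:
    """AF = (number of copies of the allele) / 2N."""
    return (2 * n_homozygous + n_heterozygous) / (2 * n_individuals)


def heterozygosity(p: float) -> float:
    """H = 1 - (p^2 + q^2) for a two-allele locus with q = 1 - p."""
    q = 1 - p
    return 1 - (p ** 2 + q ** 2)


def hwe_chi_square(n_ins_ins: int, n_ins_del: int, n_del_del: int) -> float:
    """Pearson chi-square statistic (1 d.f.) for deviation from HWE."""
    n = n_ins_ins + n_ins_del + n_del_del
    p = allele_frequency(n_ins_ins, n_ins_del, n)
    q = 1 - p
    observed = (n_ins_ins, n_ins_del, n_del_del)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (monomorphic samples).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Insertion frequencies reported in this study.
for population, p in (("Japanese", 0.59), ("African American", 0.34),
                      ("Australian Caucasian", 0.37)):
    print(population, round(heterozygosity(p), 4))

# Hypothetical genotype counts (illustration only).
print(hwe_chi_square(n_ins_ins=30, n_ins_del=50, n_del_del=20))
```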
Analyses were carried out in S-Plus 7.0 (Insightful, Seattle). The linkage disequilibrium (LD) value D′ was calculated as a pairwise analysis of the association between HLA-A alleles and HERVK9 alleles using the Haploview v4 software (Barrett et al. 2005) downloaded from http://www.broad.mit.edu/mpg/haploview/.
RESULTS AND DISCUSSION
Location of HERVK9.HG polymorphism:
Figure 1 shows a genomic map of the HERVK9.HG deletion and insertion polymorphism located between the HLA-H and -G genes within the MHC class I region of two different haplotypes, HAPL A and HAPL B, respectively. The location of the HERVK9.HG polymorphism is 62.6 kb upstream of the classical HLA-A gene, 5.4 kb upstream of the HLA-H pseudogene, and 44.9 kb downstream of the nonclassical HLA-G gene. The HERVK9.HG internal sequence in HAPL B is flanked by the 5′ and 3′ LTR-MER9 sequences, whereas HAPL A has the solitary LTR-MER9.HG sequence as a remnant of the HERVK9.HG internal sequence deletion. The deletion appears to have occurred ancestrally as a homologous recombination involving the 5′ and 3′ LTR sequences of the HERVK9, leaving behind a solitary copy of the LTR, designated here as solo LTR.HG or sLTR (Figure 2).
PCR detection of the HERVK9.HG insertion and deletion polymorphism:
Because the insertion product size of the internal HERVK9 sequence (HERVK9i) is ∼5.15 kb in length and beyond the amplification efficiency of normal PCR protocols, we developed two separate PCR assays to detect (1) the insertion by amplifying a fragment of the HERVK9i sequence using an HERVK9 internal and external primer set (3Si2 and 3ASe2) and (2) the deletion by amplifying the solitary LTR-MER9.HG sequence using primer sets (1Se1 and 3ASe2) that flank the LTR-MER9 sequence. Figure 2 shows an example of the results of the electrophoresis of the PCR products obtained for Japanese samples and the COX and PGF control DNA. The specificity of the PCR assays was confirmed by using the PAC clones PAC 544A6 that has the duplicated LTR-MER9.AK locus (no amplification) and PAC 779F20 that has the LTR-MER9.HG locus (amplification). The amplification bands were easily visualized, genotyped, and scored as deletion and insertion homozygotes or heterozygotes by employing these two primer sets.
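The scoring logic of the two-assay design can be summarized in a short Python sketch. It assumes that each sample is recorded simply as band present or absent in the insertion assay (3Si2/3ASe2) and in the deletion assay (1Se1/3ASe2). The type and function names are hypothetical.

```python
from enum import Enum


class Genotype(Enum):
    INSERTION_HOMOZYGOTE = "insertion/insertion"
    HETEROZYGOTE = "insertion/deletion"
    DELETION_HOMOZYGOTE = "deletion/deletion"
    NO_CALL = "no amplification"


def call_genotype(insertion_band: bool, deletion_band: bool) -> Genotype:
    """Combine the results of the two separate PCR assays for one sample."""
    if insertion_band and deletion_band:
        return Genotype.HETEROZYGOTE
    if insertion_band:
        return Genotype.INSERTION_HOMOZYGOTE
    if deletion_band:
        return Genotype.DELETION_HOMOZYGOTE
    # Neither assay amplified: exclude the sample from frequency analysis.
    return Genotype.NO_CALL


# A sample with a band in the deletion assay only.
print(call_genotype(insertion_band=False, deletion_band=True).value)
```

Because a homozygote is inferred from the absence of a band in one assay, a failed reaction could be misread as homozygosity, which is one reason the no-template and COX/PGF reference controls matter.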
The frequency and the result of the HWE test for the HERVK9 insertion and deletion genotypes in the MHC class I region of 99 Japanese, 97 African American, and 174 Australian–Caucasian DNA samples is shown in Table 2. We were unable to amplify either deletion or insertion PCR products in 1 of the 100 Japanese samples and in 3 of the 100 African–American samples, possibly because of the poor quality of the template DNA or because of nucleotide mutations at the primer binding sites within the template DNA not allowing for hybridizations to occur between the primer(s) and the DNA template. These negative samples were excluded from our frequency analysis of the HERVK9.HG polymorphisms.
The HERVK9 insertion frequency was 0.59 for Japanese, 0.34 for African Americans, and 0.37 for Australian Caucasians with no deviation from HWE (P > 0.05), confirming the reliability of the genotyping method and that the HERVK9 dimorphism is distributed normally in the investigated populations. However, the difference between the Japanese and African Americans or Australian Caucasians in allele frequency for the HERVK9 insertion or deletion was statistically significant (P = 1.083e−06, Pearson's goodness-of-fit chi square, d.f. = 1, χ² = 23.77). The heterozygosity value for the HERVK9 allele frequencies was estimated to be 0.4838 for the Japanese, 0.4488 for the African Americans, and 0.4662 for the Australian Caucasians.
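The relationship between the reported chi-square statistic and its P-value can be checked with a small Python sketch. For one degree of freedom the upper-tail probability of the chi-square distribution equals erfc(√(χ²/2)), which needs only the standard library. The function name is hypothetical, and the statistic itself is taken from the text rather than recomputed from genotype counts.

```python
import math


def chi_square_p_value_1df(chi_square: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Statistic reported for the allele-frequency comparison.
print(f"{chi_square_p_value_1df(23.77):.3e}")
```

The result is on the order of 1e−6 and agrees with the reported P-value to within the rounding of the χ² statistic.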
HLA-A allelic polymorphisms in Japanese, African Americans, and Australian Caucasians:
Table 3 shows the HLA-A allele types, numbers, and percentage of the total number of HLA-A alleles in the 100 Japanese, 100 African American, and 174 Australian–Caucasian DNA samples. HLA-A genotyping by the PCR–SSOP–Luminex method was unsuccessful for eight African–American DNA samples and the HERVK9 PCR failed in three of these samples. Overall, the number of different HLA-A alleles detected was 8 in Japanese with 25 homozygous samples, 16 in Australian Caucasians, and 20 in African Americans. HLA-A homozygosity was more frequent in Japanese (25% of 100 samples) than in the Australian Caucasians (13.8% of 174 samples) or African Americans (13% of 92 samples). The percentage frequency of HLA-A allele distribution for the 100 Japanese samples was fairly typical of previous findings for the Japanese population (Moriyama et al. 2006) with HLA-A24 (38.5%), -A2 (25%), -A11 (12%), -A31 (10.5%), and -A26 (10%) being the five most common alleles. In comparison, the five most common alleles for the African Americans were HLA-A2 (16.5%), -A30 (14%), -A23 (7.5%), -A33 (7%), and -A68 (6%), similar to the findings of a different study (Tu et al. 2007). The African Americans had one or other of the Japanese HLA-A alleles, except for HLA-A20, whereas the Japanese had 7 of the 18 (38.9%) African–American HLA-A alleles. The Australian Caucasians had HLA-A allele frequencies similar to those of Caucasians from Europe and North America (Middleton et al. 2007; http://www.allelefrequencies.net).
HERVK9 and HLA-A haplotypes in homozygous cell lines:
To test the reliability of the HERVK9.HG PCR assay and examine the linkage between the HLA-A and HERVK9 alleles, we first analyzed a DNA reference set of 50 B-lymphoblastoid cell lines of various ethnic origins that were homozygous for different HLA-A alleles. Table 1 lists the individual homozygous cell lines by name, ID, ethnicity, HLA-A alleles and HLA-B alleles, and the HERVK9 insertion and deletion PCR results. Essentially there was a significant (P < 0.05, two-tail binomial probability test) 100% linkage between the HERVK9 insertion and HLA-A3 (seven samples) or HLA-A24 (five samples) and a 100% linkage between the HERVK9 deletion and HLA-A1 (10 samples) or HLA-A2 (14 samples). The sample numbers for the other HERVK9 and HLA-A linkages were too few to assess statistically.
On the basis of discrepancies in the expected HERVK9 PCR results, we found that two of our commercially purchased cell-line DNA samples, PMA–TISI and PMA–SSTO (cell-line nos. 32 and 42 in Table 1), were mislabeled. Cell-line no. 42, originally named SSTO, had the HERVK9 insertion instead of the expected HERVK9 deletion as represented by the SSTO DNA sequence with the GenBank accession no. BX284699. The cell-line SSTO was therefore retyped for HLA alleles and found to have HLA-A31, -B15, and -DRB1*08 instead of the previously reported HLA-A32, -B4402, and -DRB1*0403 alleles that were originally linked with SSTO. We therefore renamed this previously mislabeled cell-line DNA sample as PMA–SSTO, although its retyped alleles suggest that it is similar to the cell-line DNA sample no. 7, named SPL. Similarly, we found that the PMA–TISI DNA sample reference no. 32 (originally named TISI) is composed of the allele combination HLA-A1, -B57, and -DRB1*07 and the HERVK9 deletion instead of the expected HLA-A2402, -B3508, and -DRB1*1103 and the HERVK9 insertion. These two examples highlight the potential value of using the HERVK9 PCR for detecting mislabeled or contaminated HLA DNA samples.
HERVK9 and HLA-A allelic associations in Japanese, African Americans, and Australian Caucasians:
Table 4 provides a summary of the statistical significance (corrected and uncorrected for multiple comparisons) of association between the HERVK9 insertion or deletion with particular HLA-A alleles in the Japanese, African–American, and Australian–Caucasian DNA samples as determined by the Fisher's exact test for association or for equality of proportions of HERVK9 insertion or deletion homozygotes with HLA-A type. The HLA-A3, -A11, -A23, and -A24 alleles were associated significantly (P < 0.05) with the HERVK9 insertion, whereas HLA-A1 and -A2 were associated significantly (P < 0.05) with the HERVK9 deletion in one or the other of the three populations.
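The kind of test summarized in Table 4 can be illustrated with a minimal Python sketch of a two-sided Fisher's exact test on a 2 × 2 table, followed by a Bonferroni correction. It is an illustration, not the S-Plus procedure used in the study. The function names, the table layout (rows: a given HLA-A allele vs. all others; columns: HERVK9 insertion vs. deletion), and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-corrected P-value (pc), capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical counts: allele X with 10 insertions and 0 deletions,
# all other alleles with 0 insertions and 10 deletions.
p = fisher_exact_two_sided(10, 0, 0, 10)
print(p, bonferroni(p, n_comparisons=20))
```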
Homozygous HERVK9 deletions were detected in 17 Japanese represented by four different HLA-A alleles: one HLA-A1, 11 HLA-A2, six HLA-A11, and five HLA-A26. Homozygous HERVK9 insertions were found in 32 Japanese individuals represented by six different HLA-A alleles: one HLA-A2, six HLA-A11, 27 HLA-A24, one HLA-A26, 11 HLA-A31, and four HLA-A33. All 13 individuals homozygous for the HLA-A24 allele and all 51 heterozygous individuals with an HLA-A24 allele had the HERVK9 internal insertion. The African–American and Australian–Caucasian association analysis matched the Japanese trends to a large degree, but also showed that HLA-A23 and -A33 were associated significantly (P < 0.05) with the HERVK9 insertion in African Americans while HLA-A1 was associated significantly (P < 0.05) with the HERVK9 deletion in Australian Caucasians. The HERVK9 and HLA-A allelic associations observed for the three populations (Table 4) are consistent with the results of the homozygous HLA-A cell line study (Table 1).
LD and crossing-over analysis of HERVK9/HLA-A haplotypes:
LD analysis using the Genepop software revealed a highly significant (P < 1.0e−5) linkage between the HLA-A and the HERVK9 alleles in the Japanese and African Americans or Australian Caucasians. The multi-allelic D′ estimation by the Haploview software was 0.87 for Japanese, 0.86 for the African Americans, and 0.96 for the Australian Caucasians. Table 5 shows the haplotypes, haplotype frequency, D′ values, and crossing-over percentage between the two loci in the pairwise LD analysis of the HERVK9 and the HLA-A haplotypes in the Japanese, African Americans, Australian Caucasians, and a combination of all three populations. There were 33 HLA-A/HERVK9 haplotypes for all three populations with 25 haplotypes for the African Americans, 19 for the Australian Caucasians, and 12 for the Japanese. The two highest frequencies for all three populations were haplotype 2 (HLA-A2/HERVK9 deletion) at 0.2294 and haplotype 11 (HLA-A24/HERVK9 insertion) at 0.1648. The highest haplotype frequency was haplotype 2 for African Americans and Australian Caucasians and haplotype 11 for Japanese. The results of the LD analysis (Table 5) support the association results shown in Table 4. Taken together, the HERVK9 deletions in the cell lines (Table 1) and the Japanese, African–American, and Australian–Caucasian samples were strongly haplotypic for HLA-A1, -A2, -A26, and -A68. The homozygote HERVK9 insertions were strongly haplotypic for HLA-A3, -A23, -A24, -A31, and -A33.
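For readers unfamiliar with D′, the following Python sketch shows the standard calculation for a single pair of alleles, for example one HLA-A allele against the HERVK9 deletion. It is the textbook two-allele definition and is not a reproduction of the multi-allelic estimate produced by Haploview. The function name and the frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """|D'| for alleles A and B given the haplotype frequency p_ab
    and the marginal allele frequencies p_a and p_b."""
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # one locus is monomorphic; D' is undefined
    return abs(d / d_max)


# Hypothetical frequencies: allele A (0.3) always found with allele B (0.5).
print(d_prime(p_ab=0.3, p_a=0.3, p_b=0.5))
```

A value of 1 means that at least one of the four possible haplotypes is absent, which is the pattern expected when little or no crossing over has occurred between the two loci.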
The crossing-over analysis of haplotypes revealed a degree of variability in the crossing-over event between the HLA-A and the HERVK9 alleles. The common HLA-A alleles, such as HLA-A1, HLA-A2, and HLA-A24, had low crossing-over percentages (CO% <1.7%) with the HERVK9 alleles, suggesting that haplotypes 1, 2, and 11 are well established or fixed in the three populations. Most CO% were <7%, suggesting a relatively low level of crossing over between the loci, but some HLA-A alleles showed a high CO% such as 44.4% for HLA-A34, 29% for HLA-A11, and 25% for HLA-A66.
The HLA-A2 allele is distributed widely in humans and found in most ethnic groups of Caucasian, Asian, and African origin, suggesting that it is one of the oldest HLA-A alleles (Tanaka et al. 1997; Tu et al. 2007). There was only one HLA-A2 sample that was associated with the Japanese or Australian–Caucasian homozygous HERVK9 insertion rather than with the more common haplotypic association between HLA-A2 and the HERVK9 deletion, which may be the result of a historical recombination event at a genomic site between the HLA-A locus and the HERVK9 locus, which are ∼62.6 kb apart. The HLA-A68 allele in the African Americans and Australian Caucasians is closely related to the HLA-A2 alleles in sequence and phylogenetic analysis (data not shown), but is less strongly associated than HLA-A2 with the HERVK9 deletion, possibly as a result of the statistically smaller sample numbers. Since the HLA-A68 frequency was relatively similar in the African Americans and Australian Caucasians, but absent in the Japanese, this allele might have evolved more rapidly in Africans and Caucasians than in Japanese as a consequence of more frequent migrations and interbreeding events.
HLA-A30 was the second most frequent allele (14%) after HLA-A2 (16.5%) in the African Americans in our study. Some HLA-A30 alleles were linked to a HERVK9 insertion as in the LBF cell line (HLA-A3001-Cw6-B13-DR7-DQ2) and others to a HERVK9 deletion as in the cell line EJ32B (HLA-A3002-Cw5-B18-DR3-DQ2). In this regard, the different associations and/or linkages between HLA-A30 and HERVK9.HG support previous reports that the HLA-A30 allelic group has evolved into at least two major subgroups of two different ancestral haplotypes (Bodmer et al. 1997; Tanaka et al. 1997; Kulski et al. 2001).
The HLA-A11 allele was associated almost equally with the HERVK9 insertion and deletion allele in the Japanese, defining at least two distinct and relatively frequent HLA-A11 haplotypes. In contrast, HLA-A11 was associated only with the HERVK9 insertion in the Australian Caucasians. A similar variability between the HLA-A11 allele and a polymorphic retroelement was found previously between the HLA-A11 alleles and the AluyHG polymorphic marker in the Japanese (Kulski et al. 2001). The associations between HLA-A11 and HERVK9.HG might vary in the Japanese and not in the Australian Caucasians because (1) the original HERVK9.HG insertion that was previously linked to HLA-A11 was deleted from one or more founder individuals, (2) the original linkage between the HLA-A11 and HERVK9.HG deletion or insertion was lost due to recombinational crossing over in one or more founder individuals, or (3) HLA-A11 was originally linked with the HERVK9.HG deletion, but then a new HERVK9 insertion occurred at exactly the same location by recombination or a new retrotransposition event. While recombinational crossing over is the most likely explanation for the association of a HERVK9 insertion or deletion with a particular HLA-A allele, more extensive population and family studies are needed to support this hypothesis using other haplotype markers in addition to the HERVK9 marker. Nevertheless, the examples of the association between the HERVK9.HG deletion and the different HLA alleles, such as the HLA-A2, -A11, -A30, and -A68 haplotypes, demonstrate the potential usefulness of HERVK9.HG as an informative polymorphic or haplotype marker for population studies of human migration and the origins of contemporary populations.
Estimated dates for HERVK9 insertion and deletion based on the sequence divergence of the LTR-MER9:
The DNA nucleotides of the two LTRs flanking a HERV are considered to be identical in sequence at the time of insertion, but then mutate and diverge in sequence with evolutionary time (Hughes and Coffin 2004). Similarly, the solitary LTR, which results from a hybridization of the 5′ and 3′ LTR sequences with the deletion of the internal HERV sequence, will mutate and diverge in sequence with evolutionary time. If the sequence divergence can be determined for the solo, 5′, and 3′ LTR at a polymorphic locus, then evolutionary dates for the HERV insertion and deletion can be calculated on the basis of the assumption that an average nucleotide divergence of 10% for synonymous sites corresponds to 28 MY for human and nonhuman primates (Tristem 2000; Hughes and Coffin 2004).
Table 6 shows the number and percentage of base differences between the solo, 5′, and 3′ LTR of the HERVK9.HG sequences and the calculated evolutionary time for HERVK9 insertion and deletion events. The percentage of sequence differences between the 5′ and 3′ MER9.HG in humans with the HLA-A3 allele and the gorilla and chimpanzee was 6.2% on average, suggesting that the HERVK9.HG integration occurred ∼17.3 MYA, which is well after the split between the human and the Old World monkey lineages ∼28 MYA (Goodman et al. 1998). This evolutionary date for the HERVK9.HG integration and its evolution in the great apes is confirmed by the presence of the ERVK9 sequence in the HG locus of the chimpanzee (Kulski et al. 2005), gorilla (Sanger Institute, NCBI accession no. CU104658), and orangutan (M. Yawata, personal communication), but not in the rhesus macaque (Kulski et al. 2004), which is an Old World monkey (Goodman et al. 1998).
In addition, the percentage of sequence differences between a representative solo MER9 sequence from a person with the HLA-A1 allele and the 5′ and 3′ MER9.HG in a person with the HLA-A3 allele and the gorilla and chimpanzee was 5.4% on average, suggesting that the HERVK9.HG deletion occurred ∼15.1 MYA in a founding ancestor, which, from an evolutionary perspective, is soon after the HERVK9 insertion. The evolutionary date of 15.1 MYA for the HERVK9.HG deletion can be confirmed in future detailed population studies of the Mhc-A and MER9/ERVK9 sequence variations in the nonhuman great apes, chimpanzee, gorilla, and orangutan.
Despite the large sequence divergence between the solo MER9 and the 3′ or 5′ MER9, the divergence among the solo MER9 sequences of six different solo MER9/HLA-A allelic haplotypes is small, averaging 0.53% and reaching at most 1.3%. This divergence appears to have evolved during the past 3.6 MY, well after the emergence of the chimpanzee and the gorilla (Goodman et al. 1998; Takahata 2001). Similarly, there is relatively small sequence divergence (average of 0.56%) between different human 3′ MER9 sequences, which suggests that the solo MER9 and the 3′ MER9 sequences have co-emerged in the modern human population, possibly as a result of population bottlenecks that limited sequence divergence during the period of 3.6 MY. In comparison, the sequence divergence between the human 3′ MER9/HLA-A3 haplotype and the 3′ MER9 sequence of the chimpanzee (Patr) or gorilla (Gogo) was greater at 2.9 and 2.3%, respectively, which corresponds to a divergence time of 6.4–8.1 MY between the human and the chimpanzee or the gorilla.
Hitchhiking effect and concluding remarks:
This study has provided a unique insight into the haplotypic association between a HERV insertion/deletion polymorphism and HLA-A alleles within the MHC genomic region. A complex combination of population genetic and evolutionary factors, including random genetic drift, migration, interbreeding, balancing selection, and the hitchhiking effect, has probably contributed to the difference in the HERVK9 deletion frequency between the Japanese and the African Americans or Australian Caucasians. The strong haplotypic association of the HERVK9.HG polymorphism with particular HLA-A alleles, at loci separated by 62.6 kb, suggests that the HERVK9 deletion frequency is largely dependent on the hitchhiking effect (linkage) of the HLA-A locus, which is under balancing selection. Genetic hitchhiking occurs when alleles at neutral loci are changed in frequency because of a strong association or linkage with alleles at a selected locus (Kojima and Schaffer 1967; Hedrick 1980; Asmussen and Clegg 1981). In this regard, the HLA-A gene is considered to be under dominant balancing or heterozygous selection, and the formation of polymorphisms at the HLA-A locus favors fitness or resistance to infections (Takahata et al. 1992; Black and Hedrick 1997). Consequently, the HERVK9 allelic frequency appears to be strongly dependent on its linkage to particular HLA-A alleles that are under balancing selection. This correlation is supported by our results: the HLA-A alleles associated with the HERVK9 deletion occurred at a total frequency of 35.4% in the Nagano Japanese population and 63.5% in the African Americans or Australian Caucasians, which is about the same as the frequency of the HERVK9 deletion in the respective populations.
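The general principle of hitchhiking can be illustrated with a toy calculation. The sketch below is not the model used in this study and does not represent balancing selection at HLA-A. It assumes a deliberately simplified deterministic haploid two-locus model with directional selection on allele A at the selected locus and a neutral allele B at the linked locus. The function name, the selection coefficient s, the recombination rate r, and the starting haplotype frequencies are all hypothetical.

```python
# Toy sketch of genetic hitchhiking (assumptions: haploid, deterministic,
# directional selection on allele A; neutral alleles B/b at a linked locus).
# All parameter values are hypothetical and are not estimates from the study.

def hitchhike(x_AB, x_Ab, x_aB, x_ab, s, r, generations):
    """Return the frequency of neutral allele B at generation 0, 1, ..., n."""
    freqs_B = [x_AB + x_aB]
    for _ in range(generations):
        # Selection: haplotypes carrying A have relative fitness 1 + s.
        w_bar = 1.0 + s * (x_AB + x_Ab)
        x_AB = x_AB * (1.0 + s) / w_bar
        x_Ab = x_Ab * (1.0 + s) / w_bar
        x_aB = x_aB / w_bar
        x_ab = x_ab / w_bar
        # Recombination: linkage disequilibrium D decays at rate r.
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB -= r * d
        x_Ab += r * d
        x_aB += r * d
        x_ab -= r * d
        freqs_B.append(x_AB + x_aB)
    return freqs_B


if __name__ == "__main__":
    # B starts completely associated with A and never recombines away.
    linked = hitchhike(0.1, 0.0, 0.0, 0.9, s=0.05, r=0.0, generations=200)
    # B starts statistically independent of A (D = 0).
    unlinked = hitchhike(0.03, 0.07, 0.27, 0.63, s=0.05, r=0.5, generations=200)
    print(f"linked:   B {linked[0]:.2f} -> {linked[-1]:.2f}")
    print(f"unlinked: B {unlinked[0]:.2f} -> {unlinked[-1]:.2f}")
```

In this sketch the neutral allele changes in frequency only when it is in linkage disequilibrium with the selected allele; when the two loci start out independent, the neutral allele frequency stays constant even though the selected allele still rises.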
The association of the HERVK9.HG deletion with the HLA-B alleles in Japanese or the cell lines (data not shown) was more random or mixed than the association between HERVK9.HG and HLA-A alleles, probably because of the greater likelihood of recombination breakpoints in the genomic sequence between the HLA-B and HERVK9.HG loci. It is evident from our analysis that while there is a strong hitchhiking effect between HERVK9 and the HLA-A locus, there is no obvious hitchhiking relationship between the frequency of the HERVK9 deletion and the alleles at HLA-B or HLA-DR (data not shown).
The hitchhiking effect at the HLA-A locus might be complicated by the effect of selection on the HLA-G alleles (Tan et al. 2005) and by competing selection forces acting on the HLA-A and HLA-G genes. Since the HERVK9 locus is slightly closer to the HLA-G locus than to the HLA-A locus, the HERVK9 polymorphism might also be used as a haplotypic and evolutionary marker to examine its linkage with HLA-G alleles. Future analyses could be extended to an examination of disease associations and the haplotypic and evolutionary relationships among HERVK9.HG, HLA-A, HLA-G, and other polymorphic retroelements, such as the haplo-specific Alu elements, which are in close vicinity to the HLA-A gene (Kulski and Dunn 2005). HERVK9 PCR genotyping is a simple and economical method that can be easily applied in conjunction with the more difficult and expensive HLA-A or HLA-G typing methods in population, evolutionary, and disease studies.
Communicating editor: P. J. Oefner
- Received April 15, 2008.
- Accepted July 9, 2008.
- Copyright © 2008 by the Genetics Society of America