Corresponding author: Wen-Hsiung Li, Department of Ecology and Evolution, University of Chicago, 1101 E. 57th St., Chicago, IL 60637. E-mail: whli@uchicago.edu
Communicating editor: Y.-X. FU
| ABSTRACT |
|---|
An ~6.6-kb region located upstream from the melanocortin 1 receptor (MC1R) gene and containing its promoter was sequenced in 54 humans (18 Africans, 18 Asians, and 18 Europeans) and in one chimpanzee, gorilla, and orangutan. Seventy-six polymorphic sites were found among the human sequences and the average nucleotide diversity (π) was 0.141%, one of the highest among all studies of nuclear sequence variation in humans. Opposite to the pattern observed in the MC1R coding region, in the present region π is highest in Africans (0.136%) compared to Asians (0.116%) and Europeans (0.122%). The distributions of π, θ, and Fu and Li's F-statistic are nonuniform along the sequence and among continents. The pattern of genetic variation is consistent with a population expansion in Africans. We also suggest a possible phase of population size reduction in non-Africans and purifying selection acting in the middle subregion and parts of the 5' subregion in Africans. We hypothesize diversifying selection acting on some sites in the 5' and 3' subregions or in the MC1R coding region in Asians and Europeans, though we cannot reject the possibility of relaxation of functional constraints in the MC1R gene in Asians and Europeans. The mutation rate in the sequenced region is 1.65 × 10⁻⁹ per site per year. The age of the most recent common ancestor for this region is similar to that for the other long noncoding regions studied to date, providing evidence for ancient gene genealogies. Our population screening and phylogenetic footprinting suggest potentially important sites for the MC1R promoter function.
STUDIES of human genetic variation provide a powerful means for elucidating the genetic, evolutionary, and demographic factors shaping the human genome. Such studies have recently been greatly facilitated by the advent of fast sequencing techniques and the abundance of human genomic sequence data, thanks to the efforts of the Human Genome Project. Recent surveys of human genetic polymorphism at the DNA sequence level can be divided into two major groups. The first group investigated regions of the human genome with no known or predicted genes, i.e., noncoding regions (e.g., ![]()
In the present study we examined genetic variation in a 6.6-kb region (on chromosome 16) that is noncoding but is located immediately upstream from the coding region of a well-studied gene, the melanocortin 1 receptor (MC1R). Our purpose is threefold. First, MC1R is a key regulator of melanin synthesis (MC1R expression increases the eumelanin to phaeomelanin ratio in skin) and is the only gene identified thus far to contribute to normal skin pigmentation variation in humans (![]()
The 6.6-kb region we studied has been partially characterized. It contains a 3.2-kb previously published sequence located upstream from the MC1R coding region (![]()
600-bp sequence immediately upstream from the start codon of MC1R. Binding of a basal transcription factor SP-1 at three sites within the 1200 bp upstream from the start codon was shown by gel shift assay (![]()
5 kb of noncoding sequence located farther upstream was available in GenBank (accession no. AC008145). A 3' untranslated region (UTR) of the KIAA1049 protein gene, expressed in the brain (![]()
We analyzed the 6.6-kb region in 54 humans (18 Africans, 18 Asians, and 18 Europeans) and three outgroups (chimpanzee, gorilla, and orangutan). Additional evolutionary comparisons were made with the mouse MC1R promoter region (![]()
θ = 4Neµ, and the age of the most recent common ancestor (MRCA) of the sequences in a sample estimated from this region different from those of other regions? (6) Can we identify sites with potential regulatory function in the sequenced region from population screening and comparison with the outgroup primate and mouse sequences?
| MATERIALS AND METHODS |
|---|
DNA samples:
DNA used for this study was from 18 Africans (5 Nigerians, 4 South African Bantu speakers, 2 Biaka pygmies, 2 Mbuti pygmies, 1 !Kung, 1 Kenyan, 1 Kikuyu, 1 Zulu, and 1 Ghanaian), 18 Asians (6 Chinese, 5 Indians, 3 Japanese, 2 Vietnamese, and 2 Cambodians), and 18 Europeans (2 French, 2 Germans, 2 Russians, 2 Italians, 2 Swedes, 2 Ukrainians, 1 Finn, 1 Hungarian, 1 Spaniard, 1 Portuguese, 1 Norwegian, and 1 Dutch-Irish). To obtain outgroup sequences we used DNA from the common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus).
PCR and sequencing:
We sequenced a total of 6.6 kb of noncoding segments between the 3'-UTR of the KIAA1049 protein gene and the MC1R gene coding region (Fig 1). A minisatellite of ~1 kb in length and an ~0.3-kb region including parts of the two middle AluS repeats were not sequenced. Both regions proved to be difficult templates. The distribution of repeats within the sequenced region is shown in Fig 1. There are five Alu repeats: four AluS's located next to each other and one AluY.
A 6.6-kb noncoding region was amplified in five parts (sequences of the PCR and sequencing primers are presented in the TABLE A11) by touchdown PCR (![]()
ABI DNA sequence analysis 3.0 was used for lane tracking and base calling. The data were proofread manually and heterozygous sites were detected as double peaks. For each individual, sequences were assembled separately using SeqMan in DNAStar (DNAStar, Madison, WI). Assembled files were carefully checked by eye. Consensus sequences for all individuals were then aligned using MegAlign. Fluorescent traces for each variant site were rechecked in all individuals. Additionally, all singleton, doubleton, and tripleton sites (variants that appear, respectively, only once, twice, or three times in the total sample) were verified by reamplification and resequencing. No errors were found. All sequences were submitted to GenBank under accession nos. AF387914-AF387969.
Statistical analyses:
The more frequent nucleotide at each polymorphic site in the pooled sample of 54 sequences was selected for the human consensus sequence. The human ancestral sequence was inferred from comparison with the outgroup sequences using parsimony. Nucleotide diversity (π) and its standard error (derived from sampling variance) within and between continents were calculated using DnaSP ver. 3.50 (![]()
θ and its standard error (derived from sampling variance assuming no recombination) per site were estimated from S (the total number of polymorphic sites) using DnaSP. The distributions of π and θ along the sequence were computed using the sliding window option of DnaSP ver. 3.50 with the window size of 750 bp and step size of 25 bp. The distribution of KJC, the average number of nucleotide substitutions per site between species (human, chimpanzee, and gorilla) with the Jukes-Cantor correction, along the sequence was calculated with DnaSP using the same sliding window and step size as above. Repeats were identified with RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html). FST was estimated according to Wright. Statistical significance of differences in allele frequencies at individual sites among the three continents was computed using a χ² test with Bonferroni correction for multiple tests.
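The sliding-window divergence calculation described above can be illustrated with a minimal Python sketch. It assumes two already-aligned sequences of equal length with `-` marking gaps; the function names are hypothetical, and the window handling is only an approximation of what DnaSP does (DnaSP's exact treatment of gaps and window boundaries is not reproduced here).

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def sliding_windows(length, window=750, step=25):
    """Yield (start, end) half-open intervals for every full-size window."""
    for start in range(0, length - window + 1, step):
        yield start, start + window


def windowed_kjc(seq_a, seq_b, window=750, step=25):
    """Return (window midpoint, K_JC) pairs for two aligned sequences.

    Alignment columns containing a gap in either sequence are skipped
    (an assumption of this sketch).
    """
    results = []
    for start, end in sliding_windows(len(seq_a), window, step):
        pairs = [(a, b) for a, b in zip(seq_a[start:end], seq_b[start:end])
                 if a != "-" and b != "-"]
        diffs = sum(a != b for a, b in pairs)
        p = diffs / len(pairs) if pairs else 0.0
        results.append(((start + end) // 2, jukes_cantor(p)))
    return results
```

With the 750-bp window and 25-bp step used in the text, a 6600-bp alignment yields 235 full windows in this scheme.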
The HKA test (![]()
The average numbers of nucleotide differences between human and outgroup sequences were calculated using DAMBE (![]()
ν) was calculated according to ν = d/(2t), where d is the number of nucleotide substitutions per nucleotide site between two sequences and t is the divergence time between the two species. The mutation rate per sequence per generation was calculated as µ = νgL, where L is the sequence length (bp) and g is the generation time (human g = 20 yr). WATTERSON'S (1975) and TAJIMA'S (1983) methods were used to estimate θ = 4Neµ, where Ne is the effective population size.
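As a worked illustration of the two rate formulas above, the following Python sketch plugs in figures quoted elsewhere in this article: a human-chimpanzee divergence of 2.03% with a divergence time of 6 million years, and the average rate of 1.65 × 10⁻⁹ per site per year with g = 20 yr. The function names are hypothetical, and the sequence length of 6600 bp is an approximation assumed for the example.

```python
def substitution_rate(d, t_years):
    """Rate per site per year from divergence d (substitutions per site)
    and divergence time t (years): rate = d / (2t)."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return d / (2.0 * t_years)


def per_sequence_rate(rate_per_site_year, generation_years, length_bp):
    """Mutation rate per sequence per generation: mu = rate * g * L."""
    return rate_per_site_year * generation_years * length_bp


# Human-chimpanzee comparison: d = 2.03%, t = 6 million years.
rate_hc = substitution_rate(0.0203, 6e6)

# Average rate from the text, g = 20 yr, L = 6600 bp (assumed length).
mu = per_sequence_rate(1.65e-9, 20, 6600)

print(f"{rate_hc:.2e} substitutions per site per year")
print(f"{mu:.2e} mutations per sequence per generation")
```

The results (about 1.69 × 10⁻⁹ and 2.18 × 10⁻⁴) are close to, though not identical with, the reported 1.68 × 10⁻⁹ and 2.16 × 10⁻⁴, presumably because the inputs used here are rounded.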
The age of the most recent common ancestor of the sequences in a sample was calculated using FU's (1996) and FU and LI's (1996, 1997) methods. The mode, mean, and 95% confidence interval were computed in terms of years.
Potential binding sites of transcription factors in the human consensus and variant sequences were predicted using TRANSFAC (![]()
| RESULTS |
|---|
Pattern of sequence variation:
We sequenced ~6660 bp of the selected region in 54 humans, one chimpanzee, and one gorilla, and 5789 bp in one orangutan. The GC contents of the sequences are ~59%, which is much higher than the genome average of ~42%.
A total of 76 variant sites were found among the 108 human chromosomes (Table 1). This included 72 nucleotide substitution sites (95%) and 4 insertions/deletions (indels; 5%). All 72 nucleotide substitution sites had only two alternative nucleotides: 58 (81%) were transitions and 14 (19%) were transversions. Among the four indels, there were two one-nucleotide indels and two two-nucleotide indels. On average 11 or 12 variant sites were found per 1000 bp within the studied region. Among the 76 variant sites (Table 1), 40 (including two indels) were mutations found only in one sequence in the sample (singletons), 6 (including one indel) were found in two sequences (doubletons), and 30 (including one indel) were found in more than two sequences (others). There was an excess of low frequency variants (singletons and doubletons) compared to the other types of variants (46 vs. 30).
Two estimates of nucleotide variation were calculated (Table 2). The nucleotide diversity (π), i.e., the average pairwise sequence difference between two random sequences in a sample, was 0.141% per site. The average estimate of θ, which is based on the observed number of polymorphic sites in a sample, was 0.211% per site. Under the neutral Wright-Fisher model, these two estimates should be equal. A higher value of θ than π (and consequently a negative Tajima's D) implies an excess of low frequency variants compared to high frequency variants, though the excess was not statistically significant (Table 5).
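The relationship between the two estimates and Tajima's D can be made concrete with a small Python sketch. It uses the standard definitions (average pairwise differences, Watterson's estimator from the number of segregating sites, and Tajima's 1989 normalization) on a toy alignment; it assumes aligned haploid sequences without gaps and is not the DnaSP implementation used in this study.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences between two sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def diversity_summary(seqs):
    """Return S, pi and Watterson's theta (per sequence), and Tajima's D."""
    n = len(seqs)
    s = segregating_sites(seqs)
    k = mean_pairwise_differences(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return {"S": 0, "pi": k, "theta_w": 0.0, "D": float("nan")}
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return {"S": s, "pi": k, "theta_w": theta_w, "D": d}


# Toy alignment (hypothetical data, not from this study).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
summary = diversity_summary(toy)
```

Dividing `pi` and `theta_w` by the alignment length gives per-site values comparable to the percentages in the text. D is negative whenever Watterson's estimate exceeds the pairwise estimate, which is the situation described for the pooled sample.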
Distribution of diversity among continents:
The numbers of variant sites in African, Asian, and European sequences were 58, 32, and 33, respectively. Indians were grouped with East Asians as "Asians" and not with the Europeans on the basis of geography and this grouping is supported by nucleotide diversity (π) between populations calculated from the data. The nucleotide diversity between the Indian and East Asian sequences studied (0.128%) was slightly lower than that between the Indian and European sequences (0.136%). Altogether, the 36 African sequences had 58 variant sites, whereas the 72 non-African sequences had only 42 variant sites (Table 1). There were 34 unique (not present in the other continents) variant sites (including 26 singletons) among the African sequences, while there were only 7 (all singletons) and 9 (7 singletons) unique variant sites among the Asian and European sequences, respectively. Thus, Africans had the largest proportion of variant sites (χ² = 10.63, d.f. = 2, P < 0.005) and the largest proportion of unique variants among variant sites (χ² = 34.14, d.f. = 2, P < 0.001).
In Africans, the number of low frequency variants (Table 1) was only slightly higher than the number of other variants (30 vs. 28), whereas in non-Africans there were almost 50% fewer low frequency variants than the other variants (15 vs. 27), resulting in a positive though nonsignificant Tajima's D (Table 5). The proportion of singletons was higher in Africans (26 singletons out of 58 polymorphic sites) than in non-Africans (14 singletons out of 42 polymorphic sites), but the difference was not statistically significant (χ² = 1.35, d.f. = 1, P = 0.25).
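The singleton comparison above can be checked with a short Python sketch of the Pearson chi-square test on a contingency table. This is an uncorrected test written with the standard library only; whether the original analysis applied any continuity correction is not stated, so the match is approximate. The function names are hypothetical, and the P-value helper covers only 1 and 2 degrees of freedom, where closed forms exist.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table
    given as a list of rows of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability for 1 or 2 degrees of freedom."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# Rows: Africans, non-Africans; columns: singletons, other polymorphic sites.
singletons = [[26, 58 - 26], [14, 42 - 14]]
stat, df = chi_square(singletons)
p_value = chi_square_p_value(stat, df)
```

For this table the statistic is approximately 1.34 with 1 degree of freedom and the P-value is approximately 0.25, in line with the reported values.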
The average pairwise nucleotide diversity (π) was highest in Africans (0.136; 95% C.I. = 0.114-0.158), intermediate in Europeans (0.122; 95% C.I. = 0.104-0.140), and lowest in Asians (0.116; 95% C.I. = 0.094-0.138; Table 2), though none of the differences was statistically significant. The π-value in non-Africans (when Asians and Europeans were considered together) was equal to that in Africans (0.136%). In contrast, the θ-value (Table 2) was almost twice as high in Africans (0.209; 95% C.I. = 0.075-0.343) as in Asians (0.114; 95% C.I. = 0.036-0.192) or Europeans (0.110; 95% C.I. = 0.034-0.186); however, the differences again were not significant.
We examined differences among the continents at each polymorphic site (excluding singletons and doubletons; Table 3). Some variants were restricted to particular continents. Sites 564, 1028, 1051, and 5313 were polymorphic only in Africans, site 4172 was polymorphic only in Europeans, sites 2973 and 3013 were polymorphic only in Asians and Europeans, site 665 was polymorphic only in Africans and Europeans, and site 4206 was polymorphic only in Africans and Asians. However, the frequencies of the less common variant alleles at these sites were usually low (<0.1; the notable exceptions are sites 1051, 3013, and 4206). Also, the allele frequencies at individual sites were different among the continents. In fact, at 10 polymorphic sites the difference in allele frequencies among continents was statistically significant (Table 3). We calculated Wright's FST to measure the amount of differentiation among the continents (Table 3). The average FST for 76 polymorphic sites was 0.057 (it was 0.112 with singletons and doubletons excluded). FST at individual sites ranged from 0.002 to 0.297.
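For a single biallelic site, Wright's FST can be sketched in Python as the proportional reduction in expected heterozygosity within populations relative to the pooled population. This is one common formulation, assumed here with equal weights per population (the three continental samples are the same size); the exact estimator used for Table 3 is not specified beyond "according to Wright", and the allele frequencies in the example are hypothetical.

```python
def wright_fst(frequencies):
    """F_ST = (H_T - H_S) / H_T for one biallelic site.

    `frequencies` holds the frequency of one allele in each population;
    populations are weighted equally (an assumption of this sketch).
    """
    p_bar = sum(frequencies) / len(frequencies)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # site is monomorphic overall
    h_within = sum(2.0 * p * (1.0 - p) for p in frequencies) / len(frequencies)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in Africans, Asians, and Europeans.
example_fst = wright_fst([0.1, 0.3, 0.5])
```

Identical frequencies in all populations give 0, and fixation of alternative alleles in different populations gives 1, so the per-site values reported in the text fall toward the low end of the possible range.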
Distribution of diversity and divergence along the sequence:
As an ~1-kb minisatellite-containing region and an ~0.3-kb region containing parts of Alu repeats were not sequenced, our sequencing resulted in three continuous fragments: 1-2488 bp, 2489-4362 bp, and 4363-6600 bp (Fig 1; we kept the numbering contiguous in spite of two gaps in the sequence). The subdivision of the sequence into the three fragments is based solely on our inability to sequence through a minisatellite (1 kb long) and the 0.3-kb region of Alus. The AluY repeat was located in 433-714 bp, two AluS repeats were located in 3820-4362 bp, and another two AluS repeats were located in 4363-4901 bp. We analyzed Alu-containing regions separately because they might have higher values of nucleotide diversity compared to surrounding regions (![]()
π and θ were compared among the 5' (1-432 bp and 715-2488 bp with AluY excluded), AluY subregion (433-714 bp), middle (2489-3819 bp), Alu-rich (3820-4901 bp), and 3' (4902-6600 bp) subregions (Table 2); the 3' subregion is adjacent to the MC1R coding sequence (Fig 1). A sliding window analysis of π and θ provides a graphical representation of the results (Fig 2A). The average numbers of differences between any two sequences in a sample (π) were about half in the 5' (π = 0.094; 95% C.I. = 0.082-0.106) and middle (π = 0.086; 95% C.I. = 0.070-0.102) subregions compared to the Alu-rich (π = 0.182; 95% C.I. = 0.164-0.200) and 3' (π = 0.179; 95% C.I. = 0.167-0.191) subregions (Table 2). This difference was statistically significant and suggests a smaller number of high frequency variants in the 5' and middle subregions compared to the Alu-rich and 3' subregions. The Alu-rich subregion had values of nucleotide diversity (θ = 0.213; 95% C.I. = 0.057-0.369; see π above) only slightly higher (and not significantly so) than that within the adjacent 3' subregion (θ = 0.169; 95% C.I. = 0.053-0.285). AluY had very high estimates of π (0.386; 95% C.I. = 0.318-0.454) and of θ (0.874; 95% C.I. = 0.248-1.500) compared to the other subregions analyzed (the difference for π is statistically significant, while it is not for θ). This repeat had 13 polymorphic sites within 280 bp, 9 of which were low frequency variants.
The distribution of nucleotide diversity along the sequence was different in Africans compared to Europeans and Asians (Table 2; Fig 2A). In Africans there was an excess of low frequency variants over high frequency variants (D was negative, though not significant) in all five subregions. In Asians and Europeans there was an excess of low frequency variants in the middle and AluY subregions (not significant), but there were more high frequency variants than low frequency ones in the other subregions (not significant; Table 2; Fig 2A).
We examined the spatial distribution of divergence on the basis of a comparison of human and chimpanzee and human and gorilla sequences (Fig 2B). The 5', middle, Alu-rich, and 3' subregions had similar divergence. The AluY region had elevated divergence between human and gorilla, but not between human and chimpanzee.
Tests of departure from neutrality:
To examine whether variation in the sequenced region is compatible with neutral evolution, we used several tests. First, using the Hudson-Kreitman-Aguadé (HKA) test (![]()
Second, Tajima's test and Fu and Li's test with an outgroup were applied to the pooled sample and to each of the three continent samples (Table 5). When sequences from all three continents were pooled, Tajima's test was not significant, while Fu and Li's D and F were highly significant (P < 0.005). All three test statistics were negative for the pooled sample. Fu and Li's D and F were significantly negative for the pooled sample even when calculated with the AluY excluded (D = -3.65, P < 0.02; F = -2.95, P < 0.02), specifying a significantly high proportion of "young" vs. "old" mutations. When each continent was considered separately, Tajima's test was again not significant. However, Fu and Li's F and D were significantly negative in Africans (P < 0.05) and consistently positive (though not significant) in Asians and Europeans. This suggests that the sequenced region does not evolve neutrally in Africans. Furthermore, while Africans had a significantly high proportion of young mutations, Asians and Europeans had a high proportion of old mutations, although this was not significant.
The distribution of Fu and Li's F-statistic along the sequence is intriguing (Fig 3). For the pooled sample from three continents, F is negative in the 5' and middle subregions of the sequence, but is around 0 in the 3' subregion of the sequence (adjacent to the MC1R coding region). The distribution of F-statistic along the sequence in the African sample is similar to that of the pooled sample. In contrast, in both Asian and European samples F is positive or around zero for most of the sequence length, except for the middle subregion in Europeans.
We hypothesize the presence of selection in the middle subregion as it is conserved (compared to the 5' and 3' subregions) and is the only subregion to exhibit a negative Tajima's D value in all three continents (see DISCUSSION). Additionally, Fu and Li's D is significantly negative for the African sequences and when all sequences are analyzed together (Table 6). To distinguish between purifying (or background) selection and directional selection (or hitchhiking) we compare Tajima's test statistic with Fu and Li's D statistic for this subregion (Table 6). Tajima's test statistic is less negative than Fu and Li's D for the pooled sample as well as for the African and European samples. This suggests that purifying but not directional selection is the more likely cause of the pattern observed in the middle subregion (![]()
Mutation rate, parameter θ, effective population size, and age of the most recent common ancestor:
The average numbers of nucleotide substitutions per site were 2.03% between the human and chimpanzee sequences, 2.59% between the human and gorilla sequences, and 5.62% between the human and orangutan sequences. The substitution rates were estimated to be 1.68 × 10⁻⁹, 1.61 × 10⁻⁹, and 2.02 × 10⁻⁹ per nucleotide per year by using a divergence time of 6 million years between human and chimpanzee, 8 million years between human and gorilla (Table 7), and 14 million years between human and orangutan (data not shown). Other divergence times were also considered (Table 7). The divergence times are based on estimates of ![]()
Different methods were used to calculate the population parameter θ. It was estimated to be 8.64 by the average mutation rate per sequence per generation and an effective population size of 10,000 (![]()
estimated by Watterson's method is due to an excess of singletons and doubletons. Watterson's and Tajima's θ-values were used to estimate the effective population size for several divergence times (Table 7). The results are largely in agreement with the commonly accepted estimate of 10,000.
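The relation θ = 4Neµ behind these estimates can be checked with a brief Python sketch. It uses the effective population size of 10,000 and the average mutation rate of 2.16 × 10⁻⁴ per sequence per generation quoted in this article; the function names are hypothetical.

```python
def theta_from_ne(ne, mu):
    """Population parameter theta = 4 * Ne * mu (mu per sequence per generation)."""
    return 4.0 * ne * mu


def ne_from_theta(theta, mu):
    """Effective population size implied by an estimate of theta."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu)


theta_expected = theta_from_ne(10_000, 2.16e-4)   # approximately 8.64
ne_recovered = ne_from_theta(theta_expected, 2.16e-4)
```

Running the calculation in the other direction, with Watterson's or Tajima's estimate of θ in place of `theta_expected`, gives the kind of effective population size estimate summarized in Table 7.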
The age of MRCA was estimated for several values of effective population size for the entire sample, the African sample, and the non-African sample using the average mutation rate of 2.16 × 10⁻⁴ per sequence per generation (Table 8). Assuming the commonly used effective population size of 10,000 (![]()
Polymorphism at transcription factor binding sites and phylogenetic footprinting:
The potential effect of the variation at polymorphic sites on the function of the MC1R promoter was investigated. We examined whether any of the polymorphisms were located within the binding sites of transcription factors specified by ![]()
We also tested whether changes at other sites might be important for MC1R promoter function. Potential transcription binding sites were predicted by comparison with the TRANSFAC database. The variation at 12 variant sites (Table 3) changes the recognition sites of the potential transcription factors. Notably, four of these sites (sites 4485, 5305, 5539, and 6376) had significantly different allele frequencies among the analyzed continents (Table 3).
Comparison of the transcription binding sites predicted by MORO et al. (2000) among human, chimpanzee, gorilla, and orangutan sequences revealed interesting features. The only E-box located within the minimal promoter in human had different copy number in different species. There was only one E-box at this position in human, but three in chimpanzee, and two in gorilla and orangutan (mouse also had two E-boxes at this site). In addition, two of the three experimentally proven SP-1 sites in human were disrupted in orangutan.
From ~800 bp of mouse MC1R promoter available (![]()
| DISCUSSION |
|---|
High polymorphism in the sequenced region:
The region sequenced in this study is more polymorphic than the other three regions (each ~10 kb long) studied with a similar sampling scheme: one on chromosome 1 (![]()
) are expected to increase with sample size. Our study examined a slightly smaller sample size compared to the ones for the two other autosomal regions and still showed a higher polymorphism. The study on chromosome X (![]()
π) in the present region (0.141%) is higher than that in any of the three 10-kb regions (Table 1) and than the average value across 16 loci (0.081%) reported by ![]()
θ-value in the present region (0.211% or 0.175% with Alus excluded) is higher than that in any of the 16 loci surveyed by ![]()
The high polymorphism in the present region may be due to a high mutation rate (see below), a high recombination rate, and the presence of Alu repeats. The studied region is located on the very tip of the long arm of chromosome 16 (16q24.3), and the local recombination rate there is ~3.76 cM/Mb (reported for a marker D16S3037, located about 10 cM from the MC1R gene; ![]()
1.5 cM/Mb; ![]()
1.86 cM/Mb; ![]()
0.16 cM/Mb; ![]()
The presence of Alu repeats contributed to the high level of variation in the sequenced region. The π-value is reduced from 0.141 to 0.120% and the θ-value is reduced from 0.211 to 0.175%, when Alus are excluded from the analysis. High polymorphism at Alus is explained by a prevalence of highly mutable CpG dinucleotides in these repeats, especially in young Alus, such as AluY in the present region (![]()
Nonuniform distribution of variants among continents:
Comparison of the patterns of sequence variation among the three continents in the present region and other regions studied with a similar sampling scheme (Table 1; ![]()
was higher in Africans than in Asians or Europeans. The distribution of singletons, doubletons (low frequency variants), and other (high frequency) variants among the continents was investigated. The region sequenced here follows a pattern similar to that observed in the 10-kb region on 22q11.2 (![]()
In the present region there was a high range of FST values among sites. Some site variants were unique to particular continents (Table 3), but the frequencies of such variants were usually low. Importantly, allele frequencies at several sites were significantly different among continents (Table 3).
Nonneutral evolution and forces shaping the variation:
Our analyses indicate that the present region has not evolved according to the neutral Wright-Fisher model. Highly significant negative F and D values (implying an excess of rare variants) for the pooled sample suggest purifying (or background) selection, directional selection (or selective sweep), or population growth. Pooling of data from different populations may result in a higher level of population subdivision. However, population subdivision tends to reduce rather than increase the proportion of low-frequency variants (![]()
The distributions of π, θ, and Fu and Li's F-statistic (Fig 2A; Fig 3) were not uniform among the continents. This implies either different demographic histories or different selective pressures among the continents. The pattern of distribution of Fu and Li's F-statistic in this 6.6-kb region among three continents is in sharp contrast to the uniform distribution of F in the APOE region among four populations (![]()
The distributions of π, θ, and Fu and Li's F-statistic (Fig 2A; Fig 3) were also not uniform throughout the sequenced region. This observation can be explained by differential selective pressures acting on different parts of the sequence (or their different proximity to genes under selection). In particular, the 5' and middle subregions were significantly less variable than the 3' subregion. This suggests either that the 5' and middle subregions evolve under functional constraints or that the 3' region is under diversifying selection. The distribution of variation is also nonuniform among the three continents. In Africans the 5' and middle subregions may be evolving under purifying selection. This hypothesis is supported by significantly negative Fu and Li's D and F (Table 6) in the middle subregion and by marginally significant Fu and Li's F in the 5' subregion (F = -2.15, 0.05 < P < 0.10). In Asians and Europeans part of the 5' subregion close to the KIAA1049 gene and parts of the middle subregion may also be evolving under purifying or background selection (suggested by negative F values; Fig 3). We speculate that an unidentified part of the MC1R promoter, a silencing element that specifies a tissue-specific expression of the MC1R gene, might be located within the 5' or middle region and this important element may be under functional constraints. On the other hand, low polymorphism in the 5' subregion might be explained by its proximity to the KIAA1049 gene.
Positive (though nonsignificant) F and D values in Asians and Europeans (Fig 3) in the 3' subregion and parts of the 5' subregion as well as a high average nucleotide diversity (π) in the 3' subregion suggest that some sites in these subregions may be evolving under diversifying selection (possibly including the ones involved in the MC1R promoter function in Asia and Europe) and/or that the 3' subregion is linked to a gene under diversifying selection (![]()
Although it is difficult to make firm conclusions because both demographic factors and selection left their signatures in shaping the genetic variation of the same region, the observed pattern of polymorphism is consistent with a population expansion in Africans. We also speculate about (1) a possible phase of population size reduction in non-Africans and (2) possible purifying selection in the 5' and middle subregions. The hypothesis of diversifying selection acting on some sites in the 3' subregion or perhaps on the MC1R coding sequence in Asians and Europeans requires further investigation. The mostly uniform distribution of divergence in human-chimpanzee and human-gorilla comparisons suggests that the evolutionary events shaping this region are recent (i.e., they took place after the human-chimpanzee divergence).
Population parameters and age of the most recent common ancestor:
The mutation rate within the sequenced region estimated from the comparison of human sequences with the chimpanzee and gorilla sequences (1.65 × 10⁻⁹ per site per year) is higher than either of the estimates obtained for the 10-kb regions on chromosome 1 (0.74 × 10⁻⁹; ![]()
supports the commonly used value of 10,000. This is consistent with the estimates from the 10-kb regions on chromosome 1 (![]()
The estimates of the age of MRCA for the entire sample, Africans, and non-Africans were similar to the previous estimates from the 10-kb regions on chromosome 1 (![]()
Sites with potential regulatory function and phylogenetic footprinting:
Studies comparing levels of MC1R expression among humans of different continents are not available, but we may hypothesize different levels of MC1R expression among humans with different skin colors. Our study suggests polymorphic sites that may be important for differential MC1R expression and, as a consequence, pigmentation variation in humans. Promoter assays can be used to determine the role of mutations at these sites on the regulation of MC1R expression. Sites that have different allele frequencies among continents and polymorphisms that change recognition sites of potential transcription factors (Table 3) should be examined first. Examination of high frequency polymorphic sites within the minimal promoter may be fruitful, in view of the fact that most high frequency variant sites are conserved in mouse (i.e., they may be functionally important) and may be subject to diversifying selection according to our neutrality tests. In addition, since SP-1 is a transcription factor that determines the basal level of expression (![]()
Phylogenetic footprinting of the MC1R promoter provides interesting insights. Different numbers of E-boxes within the minimal promoter may be an important regulatory mechanism of MC1R expression. It is known that MC1R is expressed at a level much lower in humans than in mice (~100-fold; ![]()
| ACKNOWLEDGMENTS |
|---|
We are grateful to two anonymous reviewers for valuable suggestions, to Michael Jensen-Seaman for critical reading of the manuscript, to Nathaniel Pearson for assistance, and to Lynn Jorde, Pekka Pamilo, and Laszlo Patthy for DNA samples. This work was supported by National Institutes of Health grants GM55759, GM30998, and HD38287.
Manuscript received December 13, 2000; Accepted for publication April 19, 2001.
| APPENDIX |
|---|
| LITERATURE CITED |
|---|
ABDEL-MALEK, Z., I. SUZUKI, A. TADA, S. IM, and C. AKCALI, 1999 The melanin-1 receptor and human pigmentation. Ann. NY Acad. Sci. 885:117-133
ADACHI, S., E. MORII, D.-K. KIM, H. OGIHARA, and T. JIPPO et al., 2000 Involvement of mi-transcription factor in expression of alpha-melanocyte-stimulating hormone receptor in cultured mast cells of mice. J. Immunol. 164:855-860.
BEGUN, D. J. and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519-520.
BENTLEY, N. J., T. EISEN, and C. R. GODING, 1994 Melanocyte-specific expression of the human tyrosinase promoter: activation by the microphthalmia gene and role of the initiator. Mol. Cell. Biol. 14:7996-8006.
CHARLESWORTH, B., 1994 The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63:213-227.
CHEN, F.-C. and W.-H. LI, 2001 Genomic divergences between human and other hominoids and the effective population size of the common ancestor of human and chimpanzee. Am. J. Hum. Genet. 68:444-456.
DON, R. H., P. T. COX, B. J. WAINWRIGHT, K. BAKER, and J. S. MATTICK, 1991 "Touch-down" PCR to circumvent spurious priming during gene amplification. Nucleic Acids Res. 19:4008.
FU, Y.-X., 1996 Estimating the age of the common ancestor of a DNA sample using the number of segregating sites. Genetics 144:829-838.
FU, Y.-X., 1997 Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147:915-925.
FU, Y.-X. and W.-H. LI, 1993 Statistical tests of neutrality of mutations. Genetics 133:693-709.
FU, Y.-X. and W.-H. LI, 1996 Estimating the age of the common ancestor of men from the ZFY intron. Science 272:1356-1357.
FU, Y.-X. and W.-H. LI, 1997 Estimating the age of the common ancestor of a sample of DNA sequences. Mol. Biol. Evol. 14:195-199.
FULLERTON, S. M., A. G. CLARK, K. M. WEISS, D. A. NICKERSON, and S. L. TAYLOR et al., 2000 Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. Hum. Genet. 67:881-900.
GOODMAN, M., C. A. PORTER, J. CZELUSNIAK, S. L. PAGE, and H. SCHNEIDER et al., 1998 Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9:585-598.
HAMBLIN, M. T. and A. DI RIENZO, 2000 Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669-1679.
HARDING, R. M., E. HEALY, A. J. RAY, N. S. ELLIS, and N. FLANAGAN et al., 2000 Evidence for variable selective pressures at MC1R. Am. J. Hum. Genet. 66:1351-1361.
HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
JOHN, P. and M. RAMSAY, 2000 MC1R gene variation in normally pigmented Southern African individuals. Am. J. Hum. Genet. 67(Suppl. 2):237.
KAESSMANN, H., F. HEISSIG, A. VON HAESELER, and S. PAABO, 1999 DNA sequence variation in a noncoding region of low recombination on the human X chromosome. Nat. Genet. 22:78-81.
KIKUNO, R., T. NAGASE, K. ISHIKAWA, M. HIROSAWA, and N. MIYAJIMA et al., 1999 Prediction of the coding sequences of unidentified human genes. XIV. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro. DNA Res. 6:197-205.
MORO, O., R. IDETA, and O. IFUKU, 1999 Characterization of the promoter region of the human melanocortin-1 receptor (MC1R) gene. Biochem. Biophys. Res. Commun. 262:452-460.
MOUNTJOY, K. G., 1994 The human melanocyte stimulating hormone receptor has evolved to become "super-sensitive" to melanocortin peptides. Mol. Cell. Endocrinol. 102:R7-R11.
NACHMAN, M. W. and S. L. CROWELL, 2000 Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.
NACHMAN, M. W., V. L. BAUER, S. L. CROWELL, and C. F. AQUADRO, 1998 DNA variability and recombination rates at X-linked loci in humans. Genetics 150:1133-1141.
NICKERSON, D. A., S. L. TAYLOR, K. M. WEISS, A. G. CLARK, and R. G. HUTCHINSON et al., 1998 DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19:233-240.
NICKERSON, D. A., S. L. TAYLOR, S. M. FULLERTON, K. M. WEISS, and A. G. CLARK et al., 2000 Sequence diversity and large-scale typing of SNPs in the human apolipoprotein E gene. Genome Res. 10:1532-1545.
PAYSEUR, B. A. and M. W. NACHMAN, 2000 Microsatellite variation and recombination rate in the human genome. Genetics 156:1285-1298.
PRZEWORSKI, M., R. R. HUDSON, and A. DI RIENZO, 2000 Adjusting the focus on human variation. Trends Genet. 16:296-302.