To understand the evolution of human mental activity, we performed population genetic analyses of nucleotide sequences (∼11 kb) from a worldwide sample of 60 chromosomes of the N-acylsphingosine amidohydrolase (ASAH1) gene. ASAH1 hydrolyzes ceramides and regulates neuronal development, and its deficiency often results in mental retardation. In the region (∼4.4 kb) encompassing exons 3 and 4 of this gene, two distinct lineages (V and M) have been segregating in the human population for 2.4 ± 0.4 million years (MY). The persistence of these two lineages is attributed to ancient population structure of humans in Africa. However, all haplotypes belonging to the V lineage exhibit strong linkage disequilibrium, a high frequency (62%), and small nucleotide diversity (π = 0.05%). These features indicate a signature of positive Darwinian selection for the V lineage. Compared with the orthologs in mammals and birds, it is only Val at amino acid site 72 that is found exclusively in the V lineage in humans, suggesting that this Val is a likely target of positive selection. Computer simulation confirms that demographic models of modern humans except for the ancient population structure cannot explain the presence of two distinct lineages, and neutrality is incompatible with the observed small genetic variation of the V lineage at ASAH1. On the basis of the above observations, it is argued that positive selection is possibly operating on ASAH1 in the modern human population.
HOMO sapiens has evolved to adapt to new and diverse environments, showing rapid population expansion since the exodus from Africa, 60–80 thousand years (KY) ago (Watson et al. 1997; Macaulay et al. 2005; Kivisild et al. 2006; Mellars 2006). At the same time, modern humans have acquired specific mental activity (e.g., language, symbols, culture, arts, etc.) (Henshilwood et al. 2002; Mellars 2006; Bouzouggar et al. 2007), a possible driving force for subsequent dispersal around the world (Klein 1999; Mellars 2006). In this process of modern human evolution, it is likely that some genes, especially those related to mental activity, have evolved under natural selection (Nei 1983). Recent studies have reported positively selected genes for mental activity and/or brain development in the human lineage: Abnormal spindle-like microcephaly associated and Microcephalin in relation to brain size (ASPM, Zhang 2003; MCPH1, Evans et al. 2004; Wang and Su 2004, but see Currat et al. 2006; Yu et al. 2007); Dopamine receptor D4 and Monoamine oxidase A in relation to emotional activity (DRD4, Ding et al. 2002; MAOA, Gilad et al. 2002); Forkhead box P2 in relation to language (FOXP2, Enard et al. 2002; Zhang et al. 2002); and Spinocerebellar ataxia type 2 and Pituitary adenylate cyclase-activating polypeptide in relation to neurodegenerative disorders (SCA2, Yu et al. 2005; PACAP, Wang et al. 2005). In addition, many causal genes for several types of mental retardation, possibly related to brain and cognitive development, have recently been reported (Inlow and Restifo 2004; Mervis and Becerra 2007; Schumacher et al. 2007). These genes are further thought to influence mental activity.
Our interest lies in lipid storage diseases (LSDs) such as Gaucher, Tay-Sachs, Farber, and Niemann-Pick diseases, all of which are the result of inherited deficiency of genes whose products are related to sphingolipid metabolism. Deficiency causes intralysosomal accumulation of unmetabolized sphingolipids, ubiquitous components of eukaryotic cell membranes that play important roles in intracellular signaling and membrane structure (Futerman and Hannun 2004; Futerman and Riezman 2005). Sphingolipids regulate neuronal growth rates, differentiation, and death of neurons. This regulation depends on the concentration of sphingolipids in their metabolism pathway (Buccoliero et al. 2002; Buccoliero and Futerman 2003).
Ceramides are at the hub of sphingolipid metabolism and serve as the first point of significant sphingolipid accumulation in the de novo pathway (Hannun and Obeid 2002; Merrill 2002). This pathway involves acid ceramidase (ASAH1) (also known as N-acylsphingosine amidohydrolase, ASAH) (AC, MIM 228000, EC 22.214.171.124) (Rother et al. 1992), which hydrolyzes ceramides into sphingosines and free fatty acids (Gatt 1963). Catalysis of ASAH1 is highly related to neuronal development (Schwarz and Futerman 1997; Ruvolo 2003), and inherited deficiency leads to accumulation of ceramides in various tissues, resulting in Farber disease also known as Farber lipogranulomatosis (Sugita et al. 1972). Farber disease is a rare disorder with an autosomal recessive mode of inheritance. Typical symptoms include painful swelling of joints, hoarseness, and premature death, and depending on the tissues affected by the storage of ceramides, severe nervous system dysfunction (Moser 1995). Several mutations for Farber disease have been reported; a single nucleotide deletion (V96del, Muramatsu et al. 2002) and nine single nonsynonymous mutations (Y36C, V97E, E138V, L182V, T222K, G235R, R254G, N320D, and P362R, Koch et al. 1996; Li et al. 1999; Bär et al. 2001; Muramatsu et al. 2002; Devi et al. 2006). The gene is located on the short arm of chromosome 8 (8p22–p21.3), is ∼28.5 kb long, appears to be a single-copy gene, and encodes 14 exons. Depending on the splicing pattern, ASAH1 is translated into either 395 or 411 amino acids (NP_808592 and NP_004306).
In this article, we applied the long-range haplotype (LRH) test to eight genes associated with the sphingolipid metabolism and found a probable signature of selective sweep in ASAH1. We therefore examined the tempo and mode of ASAH1 evolution in human populations in more detail. Fifteen African and 15 non-African samples were used to analyze genetic variation. The region sequenced is ∼11 kb long and encompasses exons 3–10, where strong linkage disequilibrium (LD) is manifested. The results of LD and polymorphism analyses suggest that a particular group of haplotypes in ASAH1 represents a signature of recent positive Darwinian selection.
MATERIALS AND METHODS
The LRH test:
The LRH test (Sabeti et al. 2002) for eight LSD-associated genes was conducted using the HapMap Project data, which was released in June 2006 (http://www.hapmap.org) (International HapMap Consortium 2005). The eight genes are ASAH1, Glucosidase acid beta (GBA), β-Galactosidase 1 (GLB1), Ganglioside GM2-activator (GM2A), Hexosaminidase A (HEXA), Hexosaminidase B (HEXB), Niemann-Pick disease type C1 (NPC1), and Niemann-Pick disease type C2 (NPC2). For each of Yoruba in Ibadan from Nigeria (YRI) and Northern and Western Europeans in Utah (CEU), the HapMap data of 60 unrelated individuals were analyzed. For Han Chinese from Beijing (CHB) and Japanese from Tokyo (JPT), similar data of 45 and 44 unrelated individuals were used, respectively. To estimate haplotype phases accurately, chromosomes with ≥10% missing genotypes (undetermined genotypes) were excluded. The haplotype phase of these trimmed HapMap data was estimated using the program PHASE v2.1 (Stephens et al. 2001; Stephens and Donnelly 2003). After the phase estimation, haplotypes with a low probability were further excluded. Thus the total number of SNPs and chromosomes used in the LRH test ranges from 83 to 100 and 82 to 120, respectively, as given in supplemental Table 1 at http://www.genetics.org/supplemental/. The extended haplotype homozygosity (EHH) and relative EHH (REHH) in ∼200 kb surrounding specified genomic regions of interest were measured using the software Sweep 1.0 (Sabeti et al. 2002). The significance of the LRH test results was examined with the simulation program ms (Hudson 2002). In the simulation, neutral polymorphism data in a sample of 120 DNA sequences, each of which ∼200 kb in length, were generated without recombination to make the test conservative. One hundred segregating sites were randomly sampled so as to imitate the actual data for the eight LSD-associated genes (supplemental Table 1). One thousand replications were carried out for each of the eight genes. For each gene in each population, the observed and simulated EHH and REHH were compared within the bin that contained haplotypes of the same frequency. The standard deviations of the observed values from the mean in their bin were calculated using the EHH significance calculator option of Sweep 1.0.
DNA samples, PCR, and sequencing:
The 30 human genomic DNA samples used in this study come from 15 Africans (10 Pygmies, 2 African Americans, and 3 Yoruba) and 15 non-Africans (4 Amerinds, 5 Europeans, and 6 Asians). The repository numbers of these samples in the Coriell Cell Repositories are NA10470–10473, 10492–10496, 10469, 10965, 10970, 10975, 11197, 11322, 11324, 11373, 11521, 11587, 13597, 13607, 13617–13618, 13820, 13838, 14537, 14661, 18523, 18853, and 19208.
PCR was used to amplify the part of the ASAH1 gene (∼12.5 kb), ranging from chromosome position 17969623 to 17982155 on chromosome 8 (NCBI build 36.2). The primers were designed using the program Primer3 (Rozen and Skaletsky 2000) and are given in supplemental Table 2 at http://www.genetics.org/supplemental/. PCR was performed with 4 pmol of each primer, 150 ng of human genomic DNA, 0.2 mm dNTPs, 0.7 μl of Elongase enzyme mix (Invitrogen, Carlsbad, CA), and 4 μl of PCR buffer containing 1.9 mm MgCl2 in a total volume of 20 μl. A RoboCycler Gradient 96 (Stratagene, La Jolla, CA) and TGradient (Whatman Biometra, Goettingen, Germany) were used under the following conditions depending on primer pairs: denaturation at 94° for 2 min followed by 40 amplification cycles of 94° for 30 sec, 55°–59° for 30 sec, and 68° for 10 min, and ending with an extension at 68° for 20 min. The amplified products were purified using ExoSAP-IT (United States Biochemical, Cleveland) and sequenced directly. Except for repeated sequences and nucleotides with low quality peaks, the ∼11-kb region was used for subsequent analyses. Sequencing reactions were performed using BigDye Terminator v1.1 and v3.1 cycle sequencing kits (Applied Biosystems, Foster City, CA) and analyzed on an ABI PRISM 377, 3100, and 3730 DNA sequencer (Applied Biosystems). To avoid sequencing errors, the PCR products were read at least twice in both directions. The accession numbers of sequences determined in this study are as follows: AB371370–AB371406.
From the DNA sequence data of the ∼11-kb region of ASAH1, haplotype phases were inferred using the program PHASE v.2.1 (Stephens et al. 2001; Stephens and Donnelly 2003) and fastPHASE 1.0.1 (Scheet and Stephens 2006). All estimated haplotypes were used for further analyses. The nucleotide diversity (π) (Nei and Li 1979) and Tajima's D (Tajima 1989) were computed, and the HKA test (Hudson et al. 1987) was applied using the program DnaSP 4.10 (Rozas et al. 2003). A gene tree for the strong LD region (after excluding two possible recombinants) was constructed and the time to the most recent common ancestor (TMRCA) was estimated using the software Genetree (Griffiths and Tavaré 1995), assuming the effective population size (Ne) of 104 and the generation time (g) of 20 years (Takahata 1993; Klein and Takahata 2002). To estimate the TMRCA in an alternative way, the average p distance between the V and M lineages (pB) was used. The chimpanzee sequence was downloaded from the Ensemble database (ENSPTRT00000037110). The average per-site pairwise distance (d = 0.0174 ± 0.0003) between 60 human and one chimpanzee chromosomes was calculated in the strong LD region of ∼4.4 kb using the program MEGA 3.1 (Kumar et al. 2004). Under the assumption that humans and chimpanzees diverged 5–7 MYA, the between-lineage TMRCA was estimated as (pB/d) × 5–7 MY. The average nucleotide difference between the root haplotype and every individual in the sample was calculated in each of the V and M lineages. The ratio of these differences to d was then used to estimate the within-lineage TMRCA.
Simulations under various demographic models:
We assumed a model of a panmictic population with bottleneck or expansion as well as that of both recent and ancient structured populations. Under the assumption of selective neutrality, the software ms (Hudson 2002) efficiently evaluated the extent of neutral variation such as the π-value. Twenty-two different sets of demographic parameters were examined with 50,000 replications for each (supplemental Figure 1 and supplemental Table 3 at http://www.genetics.org/supplemental/) by commonly specifying the following parameters: the number of chromosomes, 60; the number of segregating sites, 49 (the observed value in the region of strong LD); and the generation time (g), 20 years. The other parameters for models followed the previous studies (Takahata 1995; Marth et al. 2004; Voight et al. 2005; Williamson et al. 2005).
To evaluate the effect of an advantageous mutation on the pattern of polymorphism in comparison with neutral cases, a forward simulation was also carried out (supplemental Figure 2 at http://www.genetics.org/supplemental/). An ancestral sequence of L = 1000 bp length was generated at random. Two hundred copies (2N0 = 200) of the sequences evolved according to a finite-site model of neutral mutations and random sampling, and this process was repeated till equilibrium. Each of the two subpopulations (popS and popN) was composed of these 100 sequences (N0 = 2N1 = 100, where N1 is the size of subpopulation). To trace a particular lineage (N lineage) in a subpopulation, a single mutation for labeling is introduced into a single gene in the popN. It was assumed that neutral mutations scaled by N0 occur at a rate of N0Lμ = 0.4 per generation, where μ is the neutral mutation rate per site per generation, followed by migration (at a rate of N0m = 0.1, 0.01, and 0.001 where m is the migration rate per gene) and random sampling of N0 sequences for the next generation. No recombination was assumed. Repeating this process for an additional 10N0 generations, two subpopulations admix with each other and the two become a single panmictic one. At 0.25N0 generations before this unification, a single advantageous mutation of N0s = 50 was introduced into a non-N lineage in the popS and a new lineage (S lineage) is generated. The time of 0.25N0 generations is long enough for the mutation to fix in the popS. The fixation time of a single advantageous mutation with N0s = 50 within a subpopulation is ∼0.2N0 generations (Takahata 1991). Simulation was terminated when the frequency of the S lineage in the entire population reached ∼0.7. Then the π-values within the S and N lineages and in the entire population were measured in each replication and ∼100 such π-values were collected.
For a neutral case, the population size of N0 was set to be 20, because the simulation of the same scheme takes an enormously long time to collect sufficient data. The simulation was ceased at 0.1N0 generations after the unification of two subpopulations. At any segregating site, the lineage with a nucleotide of its frequency ∼0.7 is defined as the S lineage, while if the frequency is ∼0.3, the lineage with the nucleotide is defined as the N lineage. The π-values within the S or N lineage and in the entire population were calculated in the same way for the case of selection, and ∼1000 π-values within each lineage were collected.
ASAH1 orthologous DNA sequences in mammals and birds were retrieved from the NCBI and Ensembl databases. The accession numbers of the sequences used are as follows: humans (NM_177924), chimpanzees (ENSPTRT00000037110), orangutans (CR859721), rhesus monkeys (ENSMMUT00000021738), mice (NM_019734), rats (NM_053407), cows (ENSBTAT00000014960), dogs (ENSCAFT00000039154), hedgehogs (ENSETET00000012263), opossums (ENSMODT00000024178), and chickens (NM_001006453). These nucleotide sequences were aligned by Clustal X (Thompson et al. 1997) and the alignment was further checked manually. A neighbor-joining (NJ) tree (Saitou and Nei 1987) was constructed by MEGA 3.1 (Kumar et al. 2004). The number of nonsynonymous sites was counted by a modified Nei–Gojobori method (Nei and Gojobori 1986; Ina 1995; Zhang et al. 1998) with the complete-deletion option.
RESULTS AND DISCUSSION
Linkage disequilibrium of LSD-associated genes:
To identify target genes that showed a plausible signature of positive selection, we focused on eight LSD-associated genes whose deficiency clearly results in symptoms of mental retardation. The LRH test (Sabeti et al. 2002) was applied to these genes using the HapMap data of four populations, YRI, CEU, CHB, and JPT (International HapMap Consortium 2005). The test is intended to detect the signature of positive selection, i.e., selective sweep. Positive selection results in such rapid spread of a selected site within a population that recombination does not take place frequently to break down LD at nearby sites. A haplotype containing the selected site becomes dominant while maintaining long-range LD (Sabeti et al. 2006) and resulting in significantly higher EHH and REHH than under neutrality (Aquadro et al. 1994; Sabeti et al. 2002).
The core region of each of the eight LSD-associated genes is determined separately in YRI, CEU, CHB, and JPT. For each gene, the core region turns out to be almost the same in the four populations, or the pattern of LD does not differ among populations. Both the EHH and the REHH of each haplotype in the core region (core haplotype) were measured in either the ∼200-kb region upstream or downstream from the core region and compared with simulation data under a neutral model without recombination (materials and methods). Among these 16 surrounding regions of the eight genes, a particular core haplotype including ASAH1 shows a significantly high EHH and REHH in the ∼200-kb downstream region for all four populations (P < 0.05; Figures 1 and 2, and supplemental Table 5 at http://www.genetics.org/supplemental/). The EHH and REHH associated with the ASAH1 core haplotype are kept high throughout (Figure 2A), whereas no other core regions chosen in the ∼200-kb region exhibit such a tendency. The most parsimonious explanation for the sharing of this pattern across all four populations is that the sweep occurred prior to the radiation of modern humans out of Africa.
Coalescence analysis of ASAH1 sequences:
About an 11-kb region of the ASAH1 gene was sequenced for 15 Africans and 15 non-Africans. The region includes exons 3–10, which are part of the functional domain, and covers the strongest LD block with significantly high EHH and REHH (Figure 2B). There are 106 segregating sites and 37 haplotypes within this region (Figure 3). The small number of haplotypes relative to the large number of segregating sites directly indicates fairly strong LD in this region. Nonetheless, we classified the region into two in terms of LD, a ∼4.4-kb subregion under strong LD (SL) and the remaining ∼6.6-kb subregion under moderately strong LD (ML). With an ancestor sequence being inferred from a chimpanzee sequence, the gene tree constructed for the SL subregion (Figure 4) reveals two distinct lineages. They are distinguished at two nonsynonymous polymorphic sites, M72V and I93V. One lineage possesses Val (derived type) at amino acid site 72 and Ile (ancestral type) at amino acid site 93, whereas the other possesses Met (ancestral type) and Val (derived type) at these sites, respectively. The two lineages are named M and V with respect to this M72V dimorphism and are found in both the African and the non-African samples (supplemental Table 6 at http://www.genetics.org/supplemental/). The TMRCA between the V and M lineages is estimated as 2.4 ± 0.4 MY from the average pairwise nucleotide differences (pB) (materials and methods). This is relatively old compared to the average TMRCA of 0.8 MY, if Ne = 104 and g = 20 years (Takahata 1993; Klein and Takahata 2002). However, there are several reports about genes with TMRCA >2 MY (Harris and Hey 1999; Barreiro et al. 2005; Garrigan et al. 2005a; Stefansson et al. 2005; Garrigan and Hammer 2006; Hayakawa et al. 2006). Except for one gene (Garrigan et al. 2005b), the ancient TMRCA of these genes is attributed to the presence of two distinct lineages in Africa (Satta and Takahata 2004). In particular, Hayakawa et al. (2006) show that the TMRCA at the CMP-N-acetylneuraminic acid hydroxylase (CMAH) locus is 2.9 MY and suggest that this rather ancient TMRCA may result from partially isolated populations in the Pleistocene period in Africa. To compare the TMRCA at ASAH1 with that at other loci, we applied the HKA test to the nucleotide diversity and divergence at 10 loci including CMAH (Hayakawa et al. 2006) and those at ASAH1, but no significant differences are detected in the test. This suggests that the TMRCA at ASAH1 is not exceptional (supplemental Table 7 at http://www.genetics.org/supplemental/).
The π-value (Nei and Li 1979) in the SL subregion of 60 chromosomes is high (0.37 ± 0.02%; Table 1), more than four times higher than the average value in the human genome (0.08%; Sachidanandam et al. 2001). It was examined whether this large π is compatible with demographic models that have been proposed for human evolution so far by computer simulation (see materials and methods, and supplemental Figure 1). The probability of π > 0.37% from simulated polymorphism data was estimated under neutrality. The results of 50,000 replications showed that the probability was <0.03 except for the ancient population-structure model (supplemental Table 4 at http://www.genetics.org/supplemental/). It should be noted that this ancient population-structure model (supplemental Figure 1H) is consistent with π-values at other loci, i.e., more than half of the 10 loci used for the HKA test (data not shown).
The frequency of V and M in the total sample is 0.62 and 0.38, respectively (supplemental Table 6). Under the assumption of Hardy–Weinberg equilibrium, the expected heterozygosity with V and M is 0.47. However, the observed value in the sample is 0.23, which is significantly lower than the expectation (P < 0.005, chi-square test). This heterozygosity deficiency is also observed in both the African and non-African samples. Under overdominance selection, we may also expect an excess of heterozygotes over Hardy–Weinberg equilibrium, although the extent is not necessarily large if mating occurs at random every generation. The deficiency is inconsistent with the possibility of overdominance that might maintain two allelic lineages for a long time (Takahata 1990). Alternatively, this deficiency may suggest that the V and M lineages have been maintained in a partially isolated subpopulation until the exodus from Africa. When migration is limited, it is likely that genes within each subpopulation coalesce to a common ancestor and form a single cluster in a tree (Takahata 1991). Subpopulation-specific lineages tend to be reciprocally monophyletic and independent mutations can accumulate in a lineage-specific manner. This pattern is consistent with the observed tree topology. In this context, it should be noted that the pattern of genetic diversity of ASAH1 and other loci is compatible with the proposal that the human population was once geographically structured and genetically differentiated in Africa (Takahata 1995; Satta and Takahata 2004).
The value of nucleotide diversity (π) within the V lineage (πVSL = 0.05 ± 0.01%; Table 1) is smaller than the overall π-value on chromosome 8 (0.08%) (Sachidanandam et al. 2001) on which ASAH1 is located. More importantly, the πVSL value is significantly smaller than the nucleotide diversity within the M lineage (πMSL = 0.13 ± 0.02%). Furthermore, the number of haplotypes in the V lineage is only 6, yet it is 11 in the M lineage. These hold true in both African and non-African samples (Table 1). We can therefore expect that the TMRCA within the V lineage is younger than that within the M lineage as far as the SL subregion is concerned (Figure 4). Indeed, the TMRCA of the V lineage is estimated as 200 ± 50 KY from Genetree analysis (Griffiths and Tavaré 1995) and 340 ± 80 KY on the basis of the average nucleotide diversity (materials and methods). On the other hand, the TMRCA of the M lineage is 320 ± 70 KY from the Genetree analysis and 680 ± 180 KY from the nucleotide diversity. Compared with the M lineage, the relatively recent origin of the predominant V lineage implies that it has been rapidly increasing in frequency. In accordance with this, Tajima's D value is negative (−0.11) for the V lineage and positive (0.27) for the M lineage, although both are not statistically different from zero (P > 0.1) (Tajima 1989).
To see if the reduced level of polymorphism within the V lineage is restricted to the SL subregion, the π-value is compared with that of the ML subregion defined as moderately strong LD (Figure 5). Possible recombination in the ML subregion does not allow us to make phylogenetic analysis and the concept of lineage per se becomes equivocal. For this reason, we simply defined the V and M lineages in the ML subregion as those that are linked with V and M in the SL subregion. We then calculated the nucleotide diversity as πVML = 0.10 ± 0.02% and πMML = 0.18 ± 0.02%. They are not significantly different from each other (P > 0.05, Z-test), but πVML is significantly larger than πVSL (P < 0.01, Z-test; Table 1).
As mentioned earlier, the frequency of the V lineage in the total sample is higher (0.62) than that of the M lineage (0.38). The predominance of the V lineage is observed in both Africans and non-Africans (supplemental Table 6). The HapMap data also gives similar results: 0.83 in YRI, 0.56 in CEU, and 0.67 in both CHB and JPT. Despite the relatively high frequency of the V lineage irrespective of data and populations, the π-value is smaller and the within-lineage TMRCA is shorter than the corresponding value in the M lineage. All these features are consistently explained by positive Darwinian selection operating for the V lineage. In addition, it should be noted that this small genetic diversity was limited to the SL subregion, suggesting that the target of the selection is located within the subregion.
Lineage-specific amino acid changes:
We have also attempted to identify a target site of positive selection. In the SL subregion of ASAH1, there are 46 segregating sites in introns, one synonymous and two nonsynonymous segregating sites. The synonymous mutation is observed in only one chromosome (V040), whereas two nonsynonymous mutations (M72V and I93V) are “fixed” within each lineage. Regarding M72V, the Val is a derived and human-specific amino acid, because in chimpanzees, orangutans, and rhesus monkeys the amino acid at this site is exclusively occupied by Met. Regarding I93V, although the Val is shared by rhesus monkeys, but the site is occupied by Ile in chimpanzees and orangutans (Figure 6). It is likely that site 93 is subject to recurrent substitution to Val.
The SL subregion contains exon 3 (31 amino acids) and exon 4 (29 amino acids). Although these exons do not encode the active center of ASAH1, mutations in these exons are associated with Faber disease. Three such nonsynonymous mutations are Y36C, V96del, and V97E (Bär et al. 2001; Muramatsu et al. 2002). It thus appears that these exons encode important functional parts of ASAH1. It is interesting to note that the 659-bp region encompassing M72V did not accumulate any mutations within the V lineage (Figure 5). This is again a signature of selective sweep on the region encompassing Val at site 72, suggesting the amino acid might be a likely target of positive selection for the V lineage.
Theoretical considerations of natural selection operating on ASAH1:
The above results show that positive selection is likely to have favored the V lineage commonly sharing Val at amino acid site 72 and showing reduction of genetic diversity in the lineage despite its predominance (the large population size). To examine the relationship between π within a lineage and population size with and without selection, we carried out computer simulations under the ancient population-structure model (materials and methods; supplemental Figure 2). To check the validity of the computer program for the model, we compare simulation results with the theoretical expectation for neutral cases.
To obtain the theoretical formulas for the coalescence time of two neutral genes at a locus, we use Equation 4 for a finite island model in Takahata (1995). The model assumes that the ancestral panmictic population of effective size M has been subdivided into l island populations with effective size N1 for ti generations. Our model is different in that the subpopulations have further admixed to form a panmictic population of effective size N0 for subsequent tm generations to the present. We sample two genes from this panmictic population and derive the formula of the mean coalescence time. Although the formula is complicated for any values of l, M, N1, and N0, it is much simplified for the case of l = 2, M = N0 = 2N1, and this simplified formula is sufficient for our present purpose.
The coalescence can occur in the admixed population during the period of tm generation. This conditional TMRCA, T0, is readily given by(1)where a = . On the other hand, if two genes do not coalesce during tm, the two ancestral gene lineages reside in the same subpopulations with probability P(0) and in two different populations with probability Q(0) in Takahata's designations. Obviously, in the present model with l = 2. Equation 4 in Takahata (1995) then reduces to(2)whereNoting that this coalescence is conditional with probability a and from Equations 1 and 2, we obtain the formula of the unconditional coalescence time T = T0 + aT1:(3)For 4N0m > 1, T approaches to 2N0 generations, whereas for small 4N0m, T becomes large in proportion to the reciprocal of m.
Finally, to define the effective population size and the expected nucleotide diversity π for the present nonequilibrium demographic model, we set Ne = T/2 as in Nei and Takahata (1993) and obtain(4a)(4b)Under neutrality, simulation assumes N0Lμ = 0.4, L = 1000, and N0 = 20. In a wide range of N0m, π in Equations 4a and 4b agrees well with that in simulation (Table 2). The same simulation also demonstrates that the π-value within the predominant lineage (S lineage) is significantly larger than that within the other lineage (N lineage), even though with limited gene flow (4N0m = 0.004–0.4) (Table 2; t-test, P < 1E − 130). Thus neutrality cannot explain the observed reduction of nucleotide diversity in the V lineage in the SL subregion. On the other hand, in the presence of selection, the mean and variance of nucleotide diversity in the S lineage are significantly smaller than those in the N lineage with any tested migration rate (Table 2; F-test, P < 1E − 10; t-test, P < 1E − 6). Selective sweep can thus result in reduced nucleotide diversity within a lineage that carries an advantageous mutation. This pattern is exactly what we observed at ASAH1.
The pattern and level of genetic variability at ASAH1 cannot be explained by demographic causes alone and/or overdominance selection. Rather, they suggest positive Darwinian selection operating on the predominant V lineage of a relatively recent origin. It is likely that the human-specific Val at amino acid site 72 is a possible target for positive Darwinian selection and may have occurred before the most recent common ancestor of all V lineage haplotypes, 200–340 KYA. It is speculated that, when modern humans dispersed from Africa, admixture of the distinct V and M lineages occurred and the V lineage has since spread in the entire population by possible positive selection. Although there is no functional assay of the Val substitution at present, the Val may influence the enzyme activity of ASAH1, thereby affecting catalysis of ceramides and possibly neuronal development as well. Further comprehension of ASAH1 and other genes could shed light on the evolution of mental activity in modern human evolution.
We thank Naoyuki Takahata, Tatsuya Ota, and Andrew G. Clark for their critical reading of the manuscript and for their numerous suggestions and three anonymous reviewers for their helpful comments on an early version of the manuscript. This study was supported in part by an intramural grant from the Hayama Center for Advanced Studies.
- Received October 25, 2007.
- Accepted December 21, 2007.
- Copyright © 2008 by the Genetics Society of America