The size of the human brain tripled over a period of ∼2 million years (MY) that ended 0.2–0.4 MY ago. This evolutionary expansion is believed to be important to the emergence of human language and other high-order cognitive functions, yet its genetic basis remains unknown. An evolutionary analysis of genes controlling brain development may shed light on it. ASPM (abnormal spindle-like microcephaly associated) is one such gene, as nonsense mutations in it lead to primary microcephaly, a human disease characterized by a 70% reduction in brain size. Here I provide evidence suggesting that human ASPM went through an episode of accelerated sequence evolution by positive Darwinian selection after the split of humans and chimpanzees but before the separation of modern non-Africans from Africans. Because positive selection acts on a gene only when the gene function is altered and the organismal fitness is increased, my results suggest that adaptive functional modifications occurred in human ASPM and that it may be a major genetic component underlying the evolution of the human brain.
Among mammals, humans have an exceptionally big brain relative to their body size. For example, in comparison with chimpanzees, the brain weight of humans is 250% greater while the body is only 20% heavier (McHenry 1994). The dramatic evolutionary expansion of the human brain started from an average brain weight of 400–450 g ∼2–2.5 million years (MY) ago and ended with a weight of ∼1350–1450 g ∼0.2–0.4 MY ago (McHenry 1994; Wood and Collard 1999). This process represents one of the most rapid morphological changes in evolution. It is generally believed that the brain expansion set the stage for the emergence of human language and other high-order cognitive functions and that it was caused by adaptive selection (Decan 1992), yet the genetic basis of the expansion remains elusive. A study of human mutations that result in unusually small brains may help identify the genetic modifications that contributed to the human brain expansion. In this regard, primary microcephaly (small head) is of particular interest (Mochida and Walsh 2001; Bond et al. 2002; Kumar et al. 2002). Microcephaly is an autosomal recessive genetic disease with an incidence of 4–40 per million live births in western countries (Mochida and Walsh 2001; Kumar et al. 2002). It is defined as a head circumference >3 standard deviations below the population age-related mean, but with no associated malfunctions other than mild-to-moderate mental retardation (Mochida and Walsh 2001; Kumar et al. 2002). The reduction in head circumference correlates with a markedly reduced brain size. Microcephaly is genetically heterogeneous, associated with mutations in at least five loci (Mochida and Walsh 2001; Kumar et al. 2002), one of which was recently identified and named ASPM (abnormal spindle-like microcephaly associated; Bond et al. 2002).
Four different homozygous mutations in ASPM introducing premature stop codons were found to cosegregate with the disease in four respective families, while none of these mutations were found in 200 normal human chromosomes (Bond et al. 2002). Because the brain size of a typical microcephaly patient (430 g; Mochida and Walsh 2001; Kumar et al. 2002) is comparable with those of early hominids such as the 2.3- to 3.0-MY-old Australopithecus africanus (420 g; McHenry 1994; Wood and Collard 1999), I hypothesize that ASPM may be one of the genetic components underlying the human brain expansion. Signatures of accelerated evolution of ASPM under positive selection during human origins would strongly support my hypothesis, because the action of positive selection indicates a modification in gene function resulting in elevated organismal fitness (Zhang et al. 2002). Below I provide population genetic and molecular evolutionary evidence for the operation of such adaptive selection on ASPM.
MATERIALS AND METHODS
Sequencing of ASPM: The human ASPM gene has 28 exons. All 28 exons were PCR amplified from genomic DNA samples of 14 human (Homo sapiens) individuals of different geographic origins (2 African Pygmies, 3 African Americans, 4 Europeans, 2 Southeast Asians, 1 Chinese, 1 Pacific islander, and 1 South American), using the high-fidelity Taq of Invitrogen (Carlsbad, CA). The PCR products were then purified and sequenced in both directions. Polymorphisms that appear only once (singletons) were confirmed by a second PCR-sequencing experiment. The human DNA samples were purchased from Coriell (Camden, NJ). I also amplified all 28 exons from one chimpanzee (Pan troglodytes) and one orangutan (Pongo pygmaeus) and sequenced the insert DNAs after the PCR products were cloned into the pCR4TOPO vector (Invitrogen). To trace the evolutionary origin of a large insertion/deletion in exon 18, two segments (Figure 1, I and II) of exon 18 were also amplified and directly sequenced from genomic DNAs of hyrax (Procavia capensis), sea lion (Zalophus californianus), seal (Phoca vitulina), wolverine (Gulo gulo), fox (Alopex lagopus), dog (Canis familiaris), bear (Ursus maritimus), cat (Felis catus), pig (Sus scrofa), cow (Bos taurus), whale (Balaena mysticetus), rhesus monkey (Macaca mulatta), owl monkey (Aotus trivirgatus), and hamster (Cricetulus griseus).
Data analysis: The dN/dS ratios (Li 1997; Nei and Kumar 2000) in the human, chimpanzee, and orangutan branches of the ASPM gene tree (Figure 2) were estimated using the maximum-likelihood method (Yang 1997), under the model of unequal codon frequencies (CodonFreq = 3) and unequal rates of transitions and transversions. This model fits the data significantly better than those with fixed codon frequencies (CodonFreq = 2) or equal rates of transitions and transversions. The hypothesis of dN/dS = 1 for a given branch and the hypothesis of equal dN/dS ratios between two branches were tested using the likelihood-ratio test, as well as a nonlikelihood method based on inferred ancestral sequences (Zhang et al. 1997, 1998). The ASPM sequence for the common ancestor of humans and chimpanzees was estimated using the Bayesian method (Yang 1997; Zhang and Nei 1997). Population genetic analysis was conducted using DnaSP 3.99 (Rozas and Rozas 1999). Tajima's (1989), Fu and Li's (1993), and Fay and Wu's (2000) tests were conducted using coalescent simulations under the assumption of no recombination across the gene. This assumption makes the tests more conservative. When recombination is considered, the P values become lower in Tajima's and Fu and Li's tests, but remain virtually the same in Fay and Wu's test. The recombination rate in the ASPM locus was estimated from Kong et al. (2002) to be 1.8 cM/$10^{6}$ nucleotides. Since the coding sequences analyzed here span $62 \times 10^{3}$ nucleotides of the genome, the recombination rate for the sequence is $r = 1.8 \times 10^{-2} \times 0.062 = 1.1 \times 10^{-3}$ recombination/generation. Assuming that the human effective population size is $10^{4}$ (Takahata et al. 1995; Harpending et al. 1998), the population recombination parameter ($4Nr$) equals $4 \times 10^{4} \times 1.1 \times 10^{-3} = 44$. Gene trees of segment I and II sequences from various mammals were reconstructed by the neighbor-joining method (Saitou and Nei 1987) with Kimura's (1980) two-parameter distances.
Use of other distances or other methods (Nei and Kumar 2000) does not change the result. Two thousand bootstrap replications (Felsenstein 1985) were used to test the reliability of the trees. The MEGA2 software (Kumar et al. 2001) was used for the phylogenetic analysis.
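The recombination arithmetic in the data-analysis paragraph can be retraced with a short sketch. The constant and function names below are illustrative and not from the original analysis; the inputs are the values quoted in the text. The text rounds r to two significant figures before multiplying, and the sketch follows that step so that it arrives at the same 4Nr.

```python
# Retrace the population recombination parameter (4Nr) quoted in the text.
# Names are illustrative; numeric inputs are the values given in the text.

CM_PER_MB = 1.8            # recombination rate in cM per 10^6 nucleotides
REGION_BP = 62e3           # genomic span of the sequenced coding region
EFFECTIVE_POP_SIZE = 1e4   # assumed human effective population size


def recombination_per_generation(cm_per_mb: float, region_bp: float) -> float:
    """Expected recombination events per generation across the region.

    One centimorgan corresponds to a 0.01 probability of recombination
    per generation, so cM/Mb * 0.01 * (region length in Mb) gives r.
    """
    return cm_per_mb * 0.01 * (region_bp / 1e6)


def population_recombination_parameter(n_e: float, r: float) -> float:
    """Return 4 * N_e * r."""
    return 4.0 * n_e * r


r = recombination_per_generation(CM_PER_MB, REGION_BP)
# The text rounds r to 1.1e-3 before computing 4Nr; mirror that here.
r_rounded = round(r, 4)
rho = population_recombination_parameter(EFFECTIVE_POP_SIZE, r_rounded)

print(f"r = {r:.3e} per generation (rounded: {r_rounded})")
print(f"4Nr = {rho:.0f}")
```

Using the unrounded r gives a slightly larger 4Nr, which does not change the qualitative conclusion that recombination within the region is appreciable.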
Computer simulation: Computer simulation of ASPM evolution under pure neutrality was conducted following the procedure described in Zhang and Webb (2003). The rate at which an open reading frame (ORF) becomes disrupted is mainly determined by the sequence of the ORF, the rate of point mutations, and the rate of insertion/deletion (indel) mutations. Here I used a point mutation rate of $1 \times 10^{-9}$/site/year, which was estimated from a genomic comparison between humans and chimpanzees (Yi et al. 2002). The relative mutational frequencies among the four nucleotides have only a negligible effect on the simulation result and I assumed that they are equal. I used an indel mutation rate of $1.1 \times 10^{-10}$/site/year, which was also estimated from a genomic comparison between humans and chimpanzees (Britten 2002). I assumed that all indels with sizes that are multiples of three nucleotides do not disrupt an ORF. This simplifies the simulation but does not affect the results, because the majority of indels generated by mutations have small sizes (less than or equal to nine nucleotides; Britten 2002). In the genomic data analyzed above, 19% of the total 1019 indels are of sizes that are multiples of three nucleotides. A simulation was then performed for 20,000 replications with the inferred ASPM coding sequence of the common ancestor of humans and chimpanzees. Under no functional constraints, the substitution rate is identical to the mutation rate and mutations are assumed to be random. An ORF is interrupted when an indel of a size that is not a multiple of three nucleotides or a nonsense point mutation occurs. I thus determined the $t_{1/2}$ for ASPM, or the time required for an intact ORF to be interrupted in half of the simulation replications. Under the above parameters, $t_{1/2} = 0.48$ MY. When point and indel mutation rates are both halved, $t_{1/2} = 0.97$ MY. The probability that ASPM retains its ORF after $T$ MY of neutral evolution is $(1/2)^{T/t_{1/2}}$.
This estimation of the rate of pseudogenization is conservative, because other deleterious events such as insertions of transposable elements and null mutations at promoter regions and splicing sites are not considered here. The computer program for the simulation is from Zhang and Webb (2003).
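As a check on the half-life formula, the sketch below evaluates the ORF retention probability for the two half-lives reported above, with T = 6 MY, the duration used later in the text. It is not the simulation program of Zhang and Webb (2003), only the closed-form step that follows from it; the function name is an assumption made for illustration.

```python
# Closed-form ORF retention probability from a simulated half-life.
# The function name is illustrative; half-lives are those quoted in the text.


def orf_retention_probability(t_my: float, t_half_my: float) -> float:
    """Probability that an ORF is still intact after t_my million years,
    given that half of intact ORFs are disrupted every t_half_my MY."""
    if t_half_my <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (t_my / t_half_my)


# t_half = 0.48 MY under the stated mutation rates, 0.97 MY when both are halved.
p_full_rates = orf_retention_probability(6.0, 0.48)
p_half_rates = orf_retention_probability(6.0, 0.97)

print(f"stated rates: P(intact after 6 MY) = {p_full_rates:.2e}")
print(f"halved rates: P(intact after 6 MY) = {p_half_rates:.3f}")
```

Both values agree, after rounding, with the retention probabilities reported for 6 MY of neutral evolution.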
RESULTS
Elevation of dN/dS in the human ASPM lineage: Human ASPM has 28 coding exons, spanning 62 kb in chromosome 1p31 and encoding a huge protein of 3477 amino acids (Figure 1). I determined the entire coding sequences of ASPM from one human, one chimpanzee, and one orangutan, and compared them in the phylogenetic tree of the three species (Figure 2). The orangutan sequence is used as the outgroup for humans and chimpanzees so that nucleotide substitutions on the human and chimpanzee lineages can be separated. I did not sequence the gorilla because the gorilla sequence may not be appropriate as the outgroup due to incomplete lineage sorting (Satta et al. 2000). Use of orangutan, a slightly more distant outgroup, solves this problem. A commonly used indicator of natural selection at the DNA sequence level is the ratio of the rate of nonsynonymous nucleotide substitution (dN) to that of synonymous substitution (dS). Most functional genes show dN/dS < 1, because a substantial proportion of nonsynonymous mutations are deleterious and are removed by purifying selection, whereas synonymous mutations are more or less neutral and are generally uninfluenced by selection. A gene may occasionally exhibit dN/dS > 1 when a large fraction of nonsynonymous mutations are advantageous and are driven to fixation by positive selection (Li 1997; Nei and Kumar 2000). I estimated the dN/dS ratio for ASPM in each of the three tree branches (Figure 2), using a maximum-likelihood method, and found that dN/dS is lowest in the orangutan branch (0.43), higher in the chimpanzee branch (0.66), and highest in the human branch (1.03). The hypothesis of dN/dS = 1 is rejected for the orangutan branch (P < 0.001, likelihood-ratio test), but not for the other two branches, suggesting that the strength of selection differs among the branches.
Indeed, a test of the difference in dN/dS between the human and orangutan branches gives a marginally significant result (P = 0.064), but the difference between the chimpanzee and orangutan branches is not significant (P = 0.29), nor is the difference between human and chimpanzee branches (P = 0.45). Because the dN/dS ratio between the orangutan and mouse (Mus musculus) is also low (0.29), an increase of dN/dS in humans is more likely than a decrease in orangutans. The mouse sequence (GenBank accession no. AF533752) was not included in the phylogeny-based analysis as it is relatively distantly related to the ape sequences and contains multiple insertions and deletions, which would make the inference less reliable. Similar results are obtained when I first infer the ASPM sequence for the common ancestor of humans and chimpanzees and then estimate the dN/dS ratio by counting the numbers of synonymous and nonsynonymous nucleotide substitutions on each branch. For instance, this approach gives dN/dS = 1.13, 0.84, and 0.52, respectively, for the human, chimpanzee, and orangutan branches.
Complete functional relaxation does not adequately explain the elevation of dN/dS: Two hypotheses may explain the increase in dN/dS to 1.03 during the evolution of human ASPM. First, the functional constraints and purifying selection on ASPM may have been completely relaxed and many deleterious nonsynonymous mutations were fixed by random genetic drift. Alternatively, advantageous nonsynonymous substitutions under positive selection occurred at some sites, while purifying selection acted at some other sites, resulting in an average dN/dS of ∼1. Under the first hypothesis, ASPM has been under pure neutral evolution since the human-chimpanzee separation ∼6–7 MY ago (Brunet et al. 2002). Using rates of single-nucleotide mutations and insertion/deletion mutations estimated from human-chimpanzee genomic comparisons (Britten 2002; Yi et al. 2002), I conducted a computer simulation of neutral evolution of ASPM (see MATERIALS AND METHODS). I found that the probability that ASPM retains its open reading frame after 6 MY of neutral evolution is extremely low ($1.7 \times 10^{-4}$). Even when the above two mutation rates are both halved, the probability is still very small (0.014), suggesting that ASPM must have been under purifying selection. The fact that nonsense mutations in ASPM lead to microcephaly also demonstrates the presence of functional constraints on the gene. Thus, the hypothesis of complete relaxation of functional constraints and lack of purifying selection for the past 6–7 MY of human evolution is inconsistent with the data, and some sites in ASPM must have been subject to purifying selection (dN/dS < 1). This result would imply, although not prove, that some other sites are under positive selection (dN/dS > 1), so that the average dN/dS across the entire protein is ∼1. However, it is difficult to rule out the possibility of an incomplete functional relaxation in human ASPM, which can lead to a dN/dS ratio of ∼1 when the number of substitutions is relatively small.
A population genetic study may help resolve this question.
Signatures of purifying selection from population genetic data: The entire coding sequence of ASPM was determined from 14 human individuals of different geographic origins. A total of 33 single-nucleotide polymorphisms were found (Tables 1 and 2). The derived and ancestral alleles were inferred using the chimpanzee and orangutan sequences as outgroups. Tajima's (1989) and Fu and Li's (1993) tests reveal a slight departure of the data from the Wright-Fisher model of neutrality (D = –1.29, P = 0.081; F = –1.76, P = 0.074; Table 2). But Fay and Wu's (2000) test, which is designed to detect recent selective sweeps, does not show a significant result (H = –2.08, P = 0.21). Thus, the negative D and F likely reflect recent population expansions and/or purifying background selection. A recent study suggested that negative D values may also be found under certain sampling schemes if there is fine-scale population differentiation (Ptak and Przeworski 2002). When the synonymous and nonsynonymous sites were analyzed separately, I detected significant negative D and F values at nonsynonymous sites (P < 0.05; Table 2), but not at synonymous sites. H is not significant at either type of site. These results suggest that the nonsynonymous sites in human ASPM are subject to purifying selection. It should be mentioned that the recombination rate in the ASPM region is ∼1.8 cM/$10^{6}$ nucleotides (Kong et al. 2002), which translates into $1.1 \times 10^{-3}$ recombination/meiosis for the sequences analyzed here. This relatively high recombination rate localizes signatures of selection to a small region surrounding the selected sites. This might in part explain the above differences in the test results between synonymous and nonsynonymous sites.
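To make the sign of Tajima's D concrete, here is a minimal sketch of the statistic computed from derived-allele counts at biallelic sites. The allele counts used in the example are hypothetical and are not the ASPM data in Table 1. The analysis in the text was run in DnaSP, with P values obtained by coalescent simulation, which this sketch does not attempt. The function name is an assumption made for illustration.

```python
import math
from typing import Sequence


def tajimas_d(derived_counts: Sequence[int], n: int) -> float:
    """Tajima's (1989) D from derived-allele counts at segregating sites.

    derived_counts: one entry per biallelic segregating site, giving the
                    number of sampled chromosomes carrying the derived allele.
    n:              number of sampled chromosomes (must be at least 4,
                    because the variance term vanishes for smaller samples).
    """
    s = len(derived_counts)
    if n < 4:
        raise ValueError("need at least 4 chromosomes")
    if s == 0:
        raise ValueError("need at least one segregating site")
    if any(k <= 0 or k >= n for k in derived_counts):
        raise ValueError("every site must be segregating (0 < count < n)")

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Mean pairwise differences (pi) and Watterson's estimator (S / a1).
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)
    theta_w = s / a1

    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical spectra for 28 chromosomes (NOT the ASPM data):
mostly_rare = [1] * 20 + [2] * 5       # excess of low-frequency variants
mostly_intermediate = [14] * 10        # excess of intermediate-frequency variants

d_rare = tajimas_d(mostly_rare, 28)
d_intermediate = tajimas_d(mostly_intermediate, 28)
print(f"D (rare-skewed spectrum) = {d_rare:.2f}")
print(f"D (intermediate-skewed spectrum) = {d_intermediate:.2f}")
```

A spectrum dominated by rare variants yields a negative D, the pattern the text attributes to population expansion and/or purifying selection, whereas an excess of intermediate-frequency variants yields a positive D.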
Population genetic theory predicts that deleterious mutations do not reach high frequencies in populations, while neutral and advantageous mutations do. A comparison between rare and common polymorphisms may detect purifying selection of deleterious mutations (Fay et al. 2001). Fay et al. recommended a frequency of ∼10% for the derived allele as a cutoff between rare and common polymorphisms (Fay et al. 2001, 2002). In the present sample of 28 chromosomes, derived alleles that appear one or two times are regarded as rare polymorphisms, and the rest are common. Because of the limited sample size, a truly rare allele may inadvertently appear more than twice in the sample and a truly common allele may inadvertently be regarded as rare. Using probability theory, I computed that the probability of the former error is <5% for an allele with frequency <3% and the probability of the latter error is <5% for an allele with frequency >20%. Thus, the present classification of rare and common alleles is expected to be relatively accurate. I observed nR = 15 nonsynonymous and sR = 5 synonymous rare polymorphisms and nC = 5 nonsynonymous and sC = 8 synonymous common polymorphisms in the present data (Table 2; Figure 1). The ratio of nC to nR (5/15 = 0.333) is significantly lower than that of sC to sR (8/5 = 1.6; $\chi^2$ = 4.41, P < 0.05; Table 2). Since synonymous mutations are more or less neutral, the observed deficit of common nonsynonymous polymorphisms suggests that purifying selection has prevented the spread of nonsynonymous deleterious mutations. It is estimated by the likelihood method that there are N = 7459 and S = 2972 potentially nonsynonymous and synonymous sites in ASPM, respectively. Thus, for rare polymorphisms, there are nR/N = 15/7459 = $2.01 \times 10^{-3}$ polymorphisms/nonsynonymous site and sR/S = 5/2972 = $1.68 \times 10^{-3}$/synonymous site. Their difference is statistically insignificant ($\chi^2$ = 0.09, P > 0.5).
In contrast, for common polymorphisms, the number is significantly smaller per nonsynonymous site (nC/N = 5/7459 = $0.67 \times 10^{-3}$) than per synonymous site (sC/S = 8/2972 = $2.69 \times 10^{-3}$; $\chi^2$ = 6.98, P < 0.01), confirming that purifying selection has reduced the number of common nonsynonymous polymorphisms. This result also suggests the absence or rareness of advantageous nonsynonymous polymorphisms of ASPM that are currently segregating in humans, as such polymorphisms would predominantly show up as common polymorphisms and render nC/N higher. This is consistent with the above result from Fay and Wu's test. The proportion of nonsynonymous polymorphisms not under purifying selection may be estimated by (nC/N)/(sC/S) = $(0.67 \times 10^{-3})/(2.69 \times 10^{-3})$ = 0.25 or by (nC/sC)/(nR/sR) = (5/8)/(15/5) = 0.21. The two estimates are close to each other and to the dN/dS ratio between the mouse and orangutan (0.29). This indicates that human ASPM is currently under relatively strong purifying selection, and the strength of selection is comparable to or even greater than that in the long-term evolution of mammalian ASPM.
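The counts in this comparison are small enough to retrace by hand. The sketch below assumes a plain Pearson chi-square on 2 × 2 tables without continuity correction, since the text does not state which variant was used; other variants would give somewhat different statistics. It also evaluates the rare/common misclassification probabilities under simple binomial sampling of 28 chromosomes, which is likewise an assumption about how the "probability theory" calculation was done. All function names are illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def prob_count_at_most(k_max: int, n: int, freq: float) -> float:
    """Binomial probability of seeing at most k_max copies among n chromosomes."""
    return sum(
        math.comb(n, k) * freq**k * (1.0 - freq) ** (n - k)
        for k in range(k_max + 1)
    )


# Counts quoted in the text.
n_rare, s_rare = 15, 5        # nonsynonymous / synonymous rare polymorphisms
n_common, s_common = 5, 8     # nonsynonymous / synonymous common polymorphisms
N_SITES, S_SITES = 7459, 2972 # potential nonsynonymous / synonymous sites

# Rare versus common, nonsynonymous versus synonymous.
chi2_rare_vs_common = chi_square_2x2(n_rare, s_rare, n_common, s_common)

# Common polymorphisms per site: polymorphic versus non-polymorphic sites.
chi2_common_per_site = chi_square_2x2(
    n_common, N_SITES - n_common, s_common, S_SITES - s_common
)

# Two estimates of the fraction of nonsynonymous polymorphisms
# that escape purifying selection.
frac_by_sites = (n_common / N_SITES) / (s_common / S_SITES)
frac_by_rare = (n_common / s_common) / (n_rare / s_rare)

# Misclassification under binomial sampling with the "one or two copies" rule.
rare_called_common = 1.0 - prob_count_at_most(2, 28, 0.03)
common_called_rare = prob_count_at_most(2, 28, 0.20)

print(f"chi2 rare vs common     = {chi2_rare_vs_common:.2f}")
print(f"chi2 common per site    = {chi2_common_per_site:.2f}")
print(f"fraction (per site)     = {frac_by_sites:.2f}")
print(f"fraction (rare/common)  = {frac_by_rare:.2f}")
print(f"P(3% allele seen >2x)   = {rare_called_common:.3f}")
print(f"P(20% allele seen <=2x) = {common_called_rare:.3f}")
```

Under these assumptions the two chi-square statistics and both fractions match the values quoted in the text after rounding, and the binomial error probabilities at the boundary frequencies of 3% and 20% come out close to the 5% level.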
Comparison of polymorphism and divergence suggests past positive selection: Because both the synonymous and nonsynonymous common polymorphisms are largely neutral, comparing them with the fixed substitutions between humans and chimpanzees can reveal the signature of selection that has influenced the substitution processes (McDonald and Kreitman 1991; Fay et al. 2001, 2002; Smith and Eyre-Walker 2002). This comparison shows a significant excess of fixed nonsynonymous substitutions ($\chi^2$ = 3.88, P < 0.05, Table 2), suggesting that some nonsynonymous substitutions were fixed by positive selection. Because the expansion of brain size occurred in the human lineage after the human-chimpanzee split, it is more relevant to examine whether the human branch exhibits an excess of nonsynonymous substitutions. For this, the ASPM sequence of the common ancestor of humans and chimpanzees was inferred by the Bayesian method. Because the sequences considered are closely related, this inference is reliable, with the average posterior probability >0.999. Comparing the ancestral sequence with the polymorphic human sequences, I identified 16 nonsynonymous and 6 synonymous mutations that have been fixed in the human lineage (Table 2; Figure 1). Their ratio (16/6 = 2.67) is significantly greater than that for common polymorphisms (nC/sC = 5/8 = 0.63; $\chi^2$ = 4.00, P < 0.05). The number of neutral nonsynonymous substitutions may be estimated from the number of synonymous substitutions multiplied by nC/sC, which yields 6 × (5/8) = 3.75 (Fay et al. 2001, 2002; Smith and Eyre-Walker 2002). The number of nonsynonymous substitutions unexplainable by neutral evolution is 16 – 3.75 ≈ 12; these substitutions may have been fixed by positive selection. It should be noted that a recent population expansion can cause an overestimate of the number of adaptive substitutions when slightly deleterious mutations are present.
However, such overestimation is unlikely in the present case because the current effective population size of humans, even after the recent expansion, is still smaller than the long-term effective population size separating humans and chimpanzees and the effective population size of the common ancestor of humans and chimpanzees (Takahata et al. 1995; Chen and Li 2001; Kaessmann et al. 2001; Eyre-Walker 2002). It is interesting that there is no significant excess of nonsynonymous substitutions for either the chimpanzee or orangutan branches when the common polymorphisms and substitutions are compared (P > 0.05).
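The human-branch comparison of fixed substitutions with common polymorphisms can be retraced in a few lines. As before, a Pearson chi-square without continuity correction is assumed, because the text does not name the exact test variant, and the function names are illustrative rather than taken from the original analysis.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts quoted in the text for the human branch.
fixed_nonsyn, fixed_syn = 16, 6      # substitutions fixed in the human lineage
common_nonsyn, common_syn = 5, 8     # common polymorphisms (nC, sC)

# Are fixed substitutions enriched for nonsynonymous changes
# relative to common polymorphisms?
chi2 = chi_square_2x2(fixed_nonsyn, fixed_syn, common_nonsyn, common_syn)
p = p_value_1df(chi2)

# Neutral expectation: synonymous substitutions scaled by nC/sC.
expected_neutral_nonsyn = fixed_syn * common_nonsyn / common_syn
excess_nonsyn = fixed_nonsyn - expected_neutral_nonsyn

print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
print(f"expected neutral nonsynonymous substitutions = {expected_neutral_nonsyn}")
print(f"excess attributable to positive selection    = {excess_nonsyn}")
```

The excess comes out at 12.25, which the text reports as roughly 12 substitutions.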
IQ repeats and brain size variation: Human ASPM contains multiple calmodulin-binding IQ repeats (Bond et al. 2002). In a comparison of putative orthologous ASPM genes from the human, mouse, fruit fly (Drosophila melanogaster), and nematode (Caenorhabditis elegans), Bond et al. (2002) noticed that organisms with larger brains have more IQ repeats, implying a possible relation of IQ repeats and brain size. In particular, the predominant difference between the human and mouse ASPM genes is a large IQ-repeat-encoding insertion of 867 nucleotides at the end of exon 18. However, my data showed no difference in the number of IQ repeats between human and chimpanzee ASPM sequences. To trace the origin of the large insertion in human ASPM, I amplified and sequenced from several mammals two DNA segments that cover most of the insertion (Figure 1). Segment I is of 212 nucleotides and segment II is of 706 nucleotides. One or both segments were obtained from species belonging to primates, Cetartiodactyla, Carnivora, and Hyracoidea, but not from mouse or hamster (Figure 3). Phylogenetic analyses were conducted to confirm that the obtained sequences are orthologous to the human sequence (Figure 4). While nonamplification of a sequence does not prove its nonexistence, the amplification of the orthologous sequence indicates its presence. From the recently established mammalian phylogeny (Murphy et al. 2001), it can be inferred that the large human insertion was already present in the common ancestor of most placental mammals, but was deleted in mouse and possibly in other rodents (Figure 3). Thus, this IQ-repeat-containing sequence does not explain the brain size variation among many nonrodent mammals.
DISCUSSION
In the above, I provided evidence that advantageous amino acid substitutions unrelated to IQ repeats have been fixed by adaptive selection in human ASPM after the human-chimpanzee split, which strongly suggests that ASPM might be an important genetic component in the evolutionary expansion of the human brain. The episode of positive selection on ASPM appears to have ended some time ago, as there is no evidence for positive selection on ASPM in current human populations; rather, relatively strong purifying selection is detected. Roughly, selective sweeps occurring in the past 0.5N generations may be detected (Fay and Wu 2000), where N is the effective population size of humans and is thought to be ∼10,000 (Takahata et al. 1995; Harpending et al. 1998). That is, the positive selection detected in ASPM occurred some time between 6–7 and 0.1 MY ago (0.5 × 10,000 generations × 20 years/generation). The latter date coincides with the suggested time of migration of modern humans out of Africa (reviewed in Cavalli-Sforza and Feldman 2003). It is also interesting to note that although the precise time when positive selection acted on ASPM is difficult to pinpoint, my estimate is consistent with the current understanding that the human brain expansion took place between 2–2.5 and 0.2–0.4 MY ago (McHenry 1994; Wood and Collard 1999). Furthermore, a selective sweep in human FOXP2, a gene involved in speech and language development, has been detected (Enard et al. 2002; Zhang et al. 2002). This sweep was estimated to have occurred no earlier than 0.1–0.2 MY ago (Enard et al. 2002; Zhang et al. 2002). That is, the adaptive evolution of FOXP2 postdated that of ASPM, consistent with the common belief that a big brain may be a prerequisite for language (Decan 1992).
Studies of ASPM in model organisms can help us understand how it impacts brain size. The mouse Aspm is highly expressed in the embryonic brain, particularly during cerebral cortical neurogenesis (Bond et al. 2002). The fruit fly ortholog asp is involved in organizing and binding together microtubules at the spindle poles and in forming the central mitotic spindle (Gonzalez et al. 1990; Wakefield et al. 2001). Mutations in asp cause dividing neuroblasts to arrest in metaphase, resulting in reduced central nervous system development (Wakefield et al. 2001). The amino acid substitutions in human ASPM are located in exons 3, 18, 20, 21, and 22 (Figure 1), which encode a putative microtubule-binding domain and an IQ calmodulin-binding domain (Bond et al. 2002). These features suggest that the adaptive substitutions in human ASPM might be related to the regulation of mitosis in the nervous system, which can be tested in the future by functional assays of human ASPM as well as a laboratory-reconstructed ASPM protein of the common ancestor of humans and chimpanzees.
I thank David Webb for technical assistance and Douglas Futuyma, Priscilla Tucker, and David Webb for valuable comments on an earlier version of the manuscript. This work was supported by a start-up fund of the University of Michigan and a research grant from the National Institutes of Health (GM67030).
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY367065–87.
Communicating editor: S. Yokoyama
- Received July 7, 2003.
- Accepted August 20, 2003.
- Copyright © 2003 by the Genetics Society of America