Genetics, Vol. 150, 835-861, October 1998, Copyright © 1998
Genetic Variation and Phylogeography of Central Asian and Other House Mice, Including a Major New Mitochondrial Lineage in Yemen
Ellen M. Prager1,a,
Cristián Orrego1,b,c, and
Richard D. Sage2,d,e
a Division of Biochemistry and Molecular Biology, University of California, Berkeley, California 94720-3202,
b Museum of Vertebrate Zoology, University of California, Berkeley, California 94720-3160,
c Conservation Genetics Laboratory, Department of Biology, San Francisco State University, San Francisco, California 94132-1722,
d Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211
e Department of Biological Sciences, University of California, Santa Barbara, California 93106
Corresponding author:
Ellen M. Prager, Department of Biology, San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132-1722. E-mail: emprager@sfsu.edu
Communicating editor: W. F. EANES
 | ABSTRACT |
|---|
The mitochondrial DNA (mtDNA) control region and flanking tRNAs were sequenced from 76 mice collected at 60 localities extending from Egypt through Turkey, Yemen, Iran, Afghanistan, Pakistan, and Nepal to eastern Asia. Segments of the Y chromosome and of a processed p53 pseudogene (ψp53) were amplified from many of these mice and from others collected elsewhere in Eurasia and North Africa. The 251 mtDNA types, including 54 new ones reported here, now identified from commensal house mice (Mus musculus group) by sequencing this segment can be organized into four major lineages: domesticus, musculus, castaneus, and a new lineage found in Yemen. Evolutionary tree analysis suggested the domesticus mtDNAs as the sister group to the other three commensal mtDNA lineages and the Yemeni mtDNAs as the next oldest lineage. Using this tree and the phylogeographic approach, we derived a new model for the origin and radiation of commensal house mice whose main features are an origin in west-central Asia (within the present-day range of M. domesticus) and the sequential spreading of mice first to the southern Arabian Peninsula, thence eastward and northward into south-central Asia, and later from south-central Asia to north-central Asia (and thence into most of northern Eurasia) and to southeastern Asia. Y chromosomes with and without an 18-bp deletion in the Zfy-2 gene were detected among mice from Iran and Afghanistan, while only undeleted Ys were found in Turkey, Yemen, Pakistan, and Nepal. Polymorphism for the presence of a ψp53 was observed in Georgia, Iran, Turkmenistan, Afghanistan, and Pakistan. Sequencing of a 128-bp ψp53 segment from 79 commensal mice revealed 12 variable sites and implicated 14 alleles. The allele that appeared to be phylogenetically ancestral was widespread, and the greatest diversity was observed in Turkey, Afghanistan, Pakistan, and Nepal. Two mice provided evidence for a second ψp53 locus in some commensal populations.
WITHIN the past two decades, a number of important issues about the genetic variation and phylogenetic relationships of members of the house mouse species group have been resolved, and data are accumulating steadily with respect to several remaining fundamental questions about the extent and organization of the variation in wild mice and the relationships, origin, and radiation of the commensal taxa (e.g., see BOURSOT et al. 1993, 1996; SAGE et al. 1993; MORIWAKI et al. 1994; DIN et al. 1996; PRAGER et al. 1996; BOISSINOT and BOURSOT 1997). Thus, it has been demonstrated that the three aboriginal species, Mus spicilegus, M. macedonicus, and M. spretus, each of which occupies limited ranges in Europe, western (W) Asia, and North Africa, lie phylogenetically outside the commensal clade. The preponderance of evidence indicates that M. spretus is an outgroup to all the other house mouse taxa.
The native range of the commensal house mice collectively is all of Eurasia plus North Africa. According to the most commonly used system, they can be divided into three or four taxa that, in a binomial classification, are designated M. domesticus of W Europe, North Africa, and the Middle East; M. musculus of eastern (E) Europe and northern (N) Asia; M. castaneus of southeastern (SE) Asia; and M. bactrianus of south-central (SC) Asia from Iran to N India. (In the trinomial classification system, these taxa would be called M. m. domesticus, M. m. musculus, M. m. castaneus, and M. m. bactrianus.) M. bactrianus is the least well defined and characterized taxon, and it is not known whether it is a cohesive genetic entity. On a broader scale, the genetic constitution of the central populations (from the Indian subcontinent, Afghanistan, and Iran) and their genetic affiliations with the other taxa are just now being elucidated, and it has been suggested (BOURSOT et al. 1993, 1996; DIN et al. 1996) that assignment of a particular taxonomic name to members of the central populations (including those previously called M. bactrianus) be held in abeyance. [Mice from many central populations have been categorized as M. domesticus on the basis of morphological criteria (MARSHALL and SAGE 1981).]
The corollary issues being addressed concern the geographic origin of the commensal clade and the modes and routes of radiation giving rise to the diverse species and populations over their present-day ranges. The geological feature of primary importance in understanding the past and present ranges of house mice is the east-west wall of high mountains that runs through Europe and Asia. This backbone of Eurasia, which in Central Asia encompasses the ranges from the Caucasus to the Himalayas, is the major geographic barrier that keeps M. musculus in northern Eurasia, away from the commensal taxa that inhabit southern (S) Eurasia. The Zagros Mountains, which run N–S through W and S Iran, may well act in the same way and form the major geographic barrier that keeps M. domesticus in the west, away from other SC Asian mice. These mountain massifs act as barriers to mice during both glacial periods (when the higher elevations are colder and even glaciated) and interglacials [when these mountains become forested and, thus, also inhospitable to house mice (SAGE 1981)]. Explaining where and how ancestral house mice got from one side of these barriers to the other is a significant challenge for any hypothesis of commensal mouse origin and radiation.
A consensus is lacking as to whether the commensal house mouse taxa should be regarded as full species or as subspecies or perhaps as semispecies (e.g., see SAGE et al. 1986, 1993; AUFFRAY et al. 1990A; BOURSOT et al. 1993; BONHOMME et al. 1994; PRAGER et al. 1996; references therein). Thus, on the basis of evidence of separate gene pools, notably of M. domesticus and M. musculus in Europe, R. D. Sage and E. M. Prager have denoted them as full species, while other investigators, including P. Boursot, F. Bonhomme, and co-workers (e.g., BOURSOT et al. 1993, 1996; MORIWAKI et al. 1994; DIN et al. 1996), designate them as subspecies in light of appreciable evidence for a continuum of interbreeding populations over much of Eurasia. These contrasting views become more understandable if M. musculus is a ring species (BONHOMME et al. 1994; DIN et al. 1996), with the secondary contact in Europe occurring between the most divergent, longest separated forms. Here we designate the taxa as full species, but recognize that it may ultimately prove appropriate to denote at least some commensal populations as members of subspecies.
Recent investigations have addressed the questions of the genetic make-up of the SC Asian populations and the origin and radiation of house mice by restriction analysis (BOURSOT et al. 1996) and sequencing (BOISSINOT and BOURSOT 1997) of mitochondrial DNA (mtDNA); by electrophoresis of proteins encoded by autosomal loci and restriction analysis of three genes on chromosome 6 (DIN et al. 1996); and by Southern blotting, PCR amplification of a variable length marker and of microsatellites, and sequencing of the Y chromosome (NAGAMINE et al. 1992; BOISSINOT and BOURSOT 1997). The mice studied came from N and S India, several localities in Pakistan, and N and E Iran. The central populations were found to be highly polymorphic for nuclearly encoded proteins and mtDNA in comparison to the populations recognized as M. domesticus, M. musculus, and M. castaneus from around the periphery of the Eurasian land mass. Most of the mtDNAs fell into a diverse group of types that BOURSOT et al. 1996 and BOISSINOT and BOURSOT 1997 call "oriental" (and we call castaneus), while some from Iran were musculus types. Of two categories of Y chromosome, the type found in M. domesticus was detected in the Indian and Pakistani mice, while the Ys in Iran were of the type found in M. musculus and peripheral populations of M. castaneus. These molecular and biochemical data provided the foundation for the hypothesis of the northern part of the Indian subcontinent as the cradle of the commensal species, with centrifugal radiations to the west, north, and east giving rise to the peripheral taxa (BOURSOT et al. 1993, 1996; BONHOMME et al. 1994; DIN et al. 1996). TANOOKA et al. 1995 and OHTSUKA et al. 1996, in turn, carried out limited surveys for the presence or absence of a processed p53 pseudogene (ψp53) on chromosome 17. They observed polymorphism in the Central Asian region, in contrast to the invariable presence (in the homozygous state) of this ψp53 in a broad survey of mice recognized phenotypically and genetically as M. domesticus and its complete absence in a similar survey of those recognized as M. musculus (PRAGER et al. 1997).
In this article, we extend and augment the previously published work in several ways. First, we have filled in genetic "blank spots" on the house mouse map by sampling additional areas, particularly Yemen, Turkey, W and SC Iran, localities throughout Afghanistan, SW as well as N Pakistan, and Nepal. Included are regions, notably Yemen and Nepal, from where anatomical and ecological information is available (e.g., GRUBER 1969; HARRISON 1972; MARSHALL 1981; HARRISON and BATES 1991), but no molecular work has been done.
Second, our mtDNA study is done by sequencing all or much of the control region and flanking tRNAs, which, relative to restriction analysis (BOURSOT et al. 1996) and sequencing the most variable part of the control region (BOISSINOT and BOURSOT 1997), facilitates data analyses involving more distantly related lineages (including those of aboriginal mice), increases resolution, enhances delineation of evolutionary tree structure, and does not require intact high-molecular-weight DNA. In addition to focusing on phylogenetic analyses and biogeographic models, we quantitatively compare the independent duplications of the same tandem repeat.
Third, besides assessing for presence or absence, we carried out a broad survey of sequence variation in a short segment of ψp53. Fourth, to relate the molecular results to morphologically based categories (e.g., see MARSHALL 1981; MARSHALL and SAGE 1981), we provide phenotypic and anatomical information for many of the animals we studied.
Finally, our survey of the geographically most interesting areas was carried out largely using museum skins as the DNA source because of the ready availability of specimens from these remote areas. A special value of using museum study skins is that molecular genotypes can be linked to specimens that have been previously classified by taxonomists on the basis of morphological traits conventionally used to define rodent taxa. In addition, these study skins are in public institutions and, thus, available for future analyses by other investigators. Because the DNA in such skins is present in reduced amounts and is generally broken down into small pieces, we used sets of primers that amplify short segments to screen the genetic variability of house mouse specimens. As one must amplify several fragments to sequence the same mtDNA region normally obtained in one or two fragments from total genomic DNA prepared from frozen tissues, our strategy was to sample one or two individuals per locality over a broad range and to survey dozens rather than hundreds of individuals. The markers, i.e., variable sites, we identified among new mtDNA lineages and at a ψp53 locus should facilitate future surveys of variation in house mice from additional localities.
 | MATERIALS AND METHODS |
|---|
Specimens:
Skin snippets, typically 6 mm² per mouse, from 50 of the animals (Table 1, Figure 1) were sent to us in 1991 and 1992 from the Field Museum of Natural History in Chicago. Using ethanol- and flame-sterilized instruments, we cut similarly sized skin snippets from 18 mice in the collection of the Museum of Vertebrate Zoology (MVZ) at the University of California in Berkeley; the 12 samples from mainland China came to the MVZ from the Academia Sinica in Beijing. The Museum of Zoology at the University of Michigan in Ann Arbor sent us frozen tissues of eight Pakistani mice listed by the Field Museum (Table 1); we snipped and extracted them in the same ways as the skin specimens. The mice had been collected during 1951–1954 in Yemen and Turkey, 1961–1975 in Egypt, Iran, Afghanistan, and Nepal, 1990 in Pakistan, and 1945–1978 in eastern Asia. Genomic DNAs, many of them available from previous projects (PRAGER et al. 1993, 1996, 1997), were used along with the skin and tissue extracts to survey the following: (1) types of Y chromosomes and (2) presence/absence polymorphism and sequence differences at a ψp53 locus. Table 2 provides phenotypic descriptions and measurements of 74 of the commensal mice studied.
Extractions:
With sterilized forceps, we rinsed each snippet of skin or tissue through a series of eight 40-µl drops of water before putting it into 250 or 500 µl of extraction solution in a 2-ml screw-cap (for autoclaving) or 1.5-ml locking microcentrifuge tube. Negative controls consisted of (1) sterilized forceps put through the water droplets and then dipped into the extraction tube and (2) untouched extraction solution. Specimens from all 76 individuals were extracted by adding them to a 5% Chelex (Bio-Rad, Richmond, CA) suspension in water, autoclaving for 5 min, and vortexing vigorously for 15 sec. Working stocks containing some Chelex beads were stored at -20°; these sample tubes were vortexed, and the beads were spun down before each PCR. For each 12.5-µl double-stranded amplification of mtDNA and nuclear loci, 1–2 µl of extract was generally used. Fresh snippets of 13 MVZ skins and of the frozen tissues were extracted by a second procedure that, for several skins, markedly improved our ability to amplify at least mtDNA segment 1 (Figure 2) or additional, longer pieces (e.g., segment 4), and for the Pakistani tissues, facilitated amplification of 0.5–0.7-kb fragments. The samples were first heated at 56° for 2 hr in 250 µl of hair lysis buffer, which contains 10 mM Tris-HCl (pH 8.0), 35 mM dithiothreitol, 0.9% Laureth 10 (Macol LA-12; PPG Industries), and 50 µg/ml proteinase K. The tubes were then spun down, 2.5 µl of 10 mg/ml RNase A was added, and the 56° incubation was continued for 1 hr. After the tubes were vortexed, 225 µl of a 5% Chelex suspension in water was added and incubation at 95° was done for 20 min. After centrifugation, 350 µl of the supernatant (without Chelex beads) was removed, stored, and used as the DNA source for PCR as described above.

Figure 2.
Strategy for amplification and sequencing of 0.8–0.9 kb of the control region and flanking tRNA genes of mouse mtDNAs retrieved from museum skins. Arrows denote primers, bars 1–4 indicate the individual segments amplified, and r1 and r2 represent, respectively, the 5' and 3' tandem repeats. Three-letter abbreviations stand for the tRNA genes; 12S is the small ribosomal RNA gene. Nucleotide positions are numbered throughout this report according to the domesticus type 1 sequence as described previously (PRAGER et al. 1993, 1996). The basic strategy was to amplify segments 1–4 with primer pairs 1 + 2, 3 + 4, 7 + 9, and 10 + 12, and to sequence single-stranded templates generated from both strands with the PCR primers used as sequencing primers, except that primer 11 was substituted for primer 12. Additional primer pairs (e.g., 13 + 14, 15 + 16) and internal sequencing primers (e.g., 13–16) were sometimes used. Apart from occasional length variants, the sizes (between primers) of amplified segments 1–4 are in order 160, 224–301, 242–243, and 194 bp. Primers 5, 6, and 8 were used during amplification in two portions and sequencing of the entire 1.0–1.1-kb region from extracts of frozen tissue according to strategies described previously (PRAGER et al. 1993, 1996). The region between segments 2 and 3 is totally invariant among all reported commensal mouse mtDNA sequences; the 12–18 bp where primers 2 and 3 and primers 9 and 10 overlap are conserved, except for three positions, each of which is variant in one domesticus or musculus mtDNA, and a fourth position that is variant in two castaneus mtDNAs (PRAGER et al. 1996 and references therein; BOISSINOT and BOURSOT 1997; this report). Primers 3, 5, 6, 8, 9, 11, and 12 correspond, respectively, to primers 2–5, 7, 8, and 9B of PRAGER et al. 1993, and primer 4 corresponds to H15720 of PRAGER et al. 1996.
Locations (L, light strand; H, heavy strand; numbers representing positions of the 3' base) and 5'-to-3' sequences of the other primers are as follows: 1, L15320, ATTACTCTGGTCTTGTAAACC; 2, H15481, ATGTACTTGCTTATATGCTT; 7, L15911, GTGGTGTCATGCATTTGGTAT; 10, L16171, TTAACTATCAAACCCTATGT; 13, L15537, GGTCATAAAAYAACYATCAACA; 14, H15612, TCATGRTGTATATCAGTTTAGTYA; 15, L15538, AAGACATACCTRTRTTATCTRACT; 16, H15616, AGAGTTTATGACTGTATGGTGTAT.
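As a rough check on the coordinates listed above, the number of bases lying between two primers can be computed from the positions of their 3' bases, and the number of distinct oligonucleotides in a degenerate primer from its ambiguity codes (R = A or G, Y = C or T). The sketch below is illustrative only: the helper names `inter_primer_size` and `degeneracy` are hypothetical, and the arithmetic assumes no length variation relative to the domesticus type 1 numbering.

```python
# Illustrative sketch; helper names are hypothetical, not part of the original methods.
# Each entry: strand (L = light, H = heavy), position of the 3' base, 5'-to-3' sequence.
PRIMERS = {
    1: ("L", 15320, "ATTACTCTGGTCTTGTAAACC"),
    2: ("H", 15481, "ATGTACTTGCTTATATGCTT"),
    13: ("L", 15537, "GGTCATAAAAYAACYATCAACA"),
    14: ("H", 15612, "TCATGRTGTATATCAGTTTAGTYA"),
    15: ("L", 15538, "AAGACATACCTRTRTTATCTRACT"),
    16: ("H", 15616, "AGAGTTTATGACTGTATGGTGTAT"),
}

# Number of bases each IUPAC code stands for (only the codes that occur above).
IUPAC_CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2}


def inter_primer_size(light, heavy):
    """Bases lying strictly between the 3' ends of a light/heavy primer pair."""
    l_strand, l_pos, _ = PRIMERS[light]
    h_strand, h_pos, _ = PRIMERS[heavy]
    if l_strand != "L" or h_strand != "H":
        raise ValueError("expected a light-strand primer followed by a heavy-strand primer")
    return h_pos - l_pos - 1


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a (possibly degenerate) primer."""
    count = 1
    for base in PRIMERS[primer][2]:
        count *= IUPAC_CHOICES[base]
    return count
```

For primer pair 1 + 2 this gives 160, which agrees with the size stated for segment 1 in the legend.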
PCR amplification and sequencing:
Figure 2 outlines the strategy for obtaining the sequence of the variable parts of the mitochondrial control region plus flanking tRNAs from extracts of museum skins by amplifying with four pairs of primers. Double-stranded products of segment 2 (the most variable region) from most of the skin specimens from the Field Museum were generated in 25-µl volumes using reactant solution 1 (PRAGER et al. 1993), which has 1 mM of each dNTP and 6.7 mM MgCl2, and adding 1.6 µg of T4 gene 32 protein (United States Biochemical Co., Cleveland, OH; LESSA et al. 1992). Amplification was done in a PCR-1000 thermal cycler (Perkin Elmer, Norwalk, CT) for 35–38 cycles of denaturation at 92° for 40 sec, annealing at 60° for 1 min, and extension at 72° for 30 sec. The rest of the double-stranded PCRs were done in 12.5-µl volumes using reactant solution 2 (PRAGER et al. 1997), which has 0.2 mM of each dNTP and MgCl2 at 2.5 mM (primer pairs 1 + 2 and 3 + 4) or 3.5 mM (primers 7 + 9 and 10 + 12); 0–0.8 µg of T4 gene 32 protein or 0.13 µg of Escherichia coli SSB (Pharmacia, Piscataway, NJ) was added for segment 1, and 0.2 µg of the T4 protein was added for segments 2 and 4. PCR in a Perkin Elmer 480 cycler was generally carried out for 36–37 cycles; each cycle consisted of 92° for 50 sec (but 3 min for the first cycle), 60° for 45 sec, and 72° for 20 sec (but 3 min for the last cycle). For segment 3, a hot start [as described by PRAGER et al. 1997] was followed by a "touchdown" procedure: seven precycles, during which an initial annealing temperature of 67° was lowered by 1° for each successive cycle, preceded 36 cycles with annealing at 60°. Amplifications with primer pairs 13 + 14, 13 + 16, 15 + 16, 3 + 16, and 13 + 4 were done for 43–45 cycles, often with a hot start, using reactant solution 2 (2.5 mM MgCl2 for all) and the second cycling protocol given above, except that the annealing temperature was 56° for pairs 15 + 16 and 13 + 4.
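The touchdown schedule used for segment 3 can be written out cycle by cycle. The following is a minimal sketch of the annealing temperatures only; the function name `touchdown_annealing` and its parameters are hypothetical, and the denaturation and extension steps are deliberately left out.

```python
# Illustrative sketch of the annealing-temperature schedule for the touchdown PCR
# described for segment 3. Function and parameter names are hypothetical.

def touchdown_annealing(start=67, step=1, precycles=7, final=60, main_cycles=36):
    """Return the annealing temperature (degrees C) of every cycle, in order.

    The precycles begin at `start` and drop by `step` per cycle; they are
    followed by `main_cycles` cycles at the `final` annealing temperature.
    """
    pre = [start - step * i for i in range(precycles)]
    return pre + [final] * main_cycles


schedule = touchdown_annealing()
for cycle, temp in enumerate(schedule[:8], start=1):
    print(f"cycle {cycle}: anneal at {temp} degrees")
```

With the defaults, the seven precycles run from 67° down to 61°, so the first of the 36 main cycles at 60° continues the same 1° decrement.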
For the eight mice from Pakistan, we not only amplified and sequenced segments 1–4, but also amplified the entire region in Figure 2 in two portions, with primer pairs 1 + 6 and 5 + 12, as done previously for genomic DNAs and purified mtDNAs (PRAGER et al. 1993, 1996), and sequenced unidirectionally using primers 1, 3, 8, 9, and 11. Amplification of these two longer fragments from our extracts was appreciably harder than from isolated genomic DNAs. The 5' portion was amplified from seven individuals with reactant solution 2 (with 2.5 mM MgCl2) and, after a hot start, 45 cycles of 93° for 50 sec (3 min during cycle 1), 60° for 45 sec, and 72° for 20 sec (3 min during cycle 45). The 3' portion was amplified from four mice using reactant solution 1 and the previous protocol (PRAGER et al. 1993), but with 32 cycles rather than 25, and from three other mice using solution 2, but with the high dNTP and MgCl2 concentrations characteristic of solution 1 and, after a hot start, 36 or 43 cycles of 93° for 50 sec (3 min for cycle 1), 64° for 45 sec, and 72° for 1 min (3 min for the last cycle).
Gel purification of the double-stranded products in 5 µl of the reaction was done in 2% (occasionally 3%) NuSieve agarose as described previously (PRAGER et al. 1993); some or all of the band with the amplified fragment was diluted 2- to 40-fold in water. PCR to yield single-stranded templates for sequencing in both directions was done in 25-µl volumes under a variety of conditions (details available from the authors). Segment 3 in Figure 2 proved to be the hardest from which to obtain templates amenable to sequencing, particularly in the direction of excess primer 7, and we did not sequence the segment fully from any individual. Nearly all 50 skin samples from the Field Museum worked well for PCR and sequencing. In contrast, 14 of the 18 from the MVZ (all but those from Korea and Taiwan) were harder to amplify; from eight, we could sequence only segment 1 or segments 1 and 4 (see Table 1).
Double-stranded amplifications of a short segment of the duplicated Zfy-1 and Zfy-2 genes on the Y chromosome (with a hot start and 45 cycles for the museum skin and tissue extracts) and of two short segments of a ψp53 plus one of the functional p53 gene (with 37 cycles or a hot start followed by 42 cycles for the skin and tissue extracts) were done as described by PRAGER et al. 1997. The Y primers, Zfy2DF and Zfy2DR, yield PCR products of 184 and 202 bp and bracket the 139- or 157-bp region extending from the second position of codon 467 through the second position of codon 519, with codons 480–485 deleted in Zfy-2 in one type of Y. ψp53 and p53 primer pair Int5S + Int5R brackets the 89- or 167-bp region extending from the third position of codon 182 to the first position of codon 212, with codons numbered according to the cDNA sequence of the functional gene; the size difference is caused by the 78-bp intron 5 in p53. As the ψp53 and p53 PCR products of 137 and 215 bp are close in size, one can score presence or absence of ψp53 while confirming successful PCR by appearance of the p53 product, and can usually also distinguish between individuals homozygous and heterozygous for ψp53 (PRAGER et al. 1997). Primers Exon 4 and Exon 5 bracket a 128-bp piece of the ψp53 in commensal mice and a 133-bp piece in M. macedonicus and M. spicilegus; these extend from the third position of codon 109 to the third position of codon 153, and the PCR products are 176- or 181-bp long. We tested one or both ψp53 primer pairs on genomic DNA of nine M. spretus (four from Catalunya and two from Puerto Real in Spain plus three from Azrou, Morocco) to confirm the previous inference, based on one Spanish mouse (TANOOKA et al. 1995; OHTSUKA et al. 1996), that this species lacks a ψp53. Gel analysis and purification of PCR products in 3% NuSieve agarose were done as described before (PRAGER et al. 1993, 1997). Single-stranded templates for sequencing were made (details available from the authors) in one direction from the shorter Y chromosome fragment (for sequencing with primer Zfy2DR), in both directions with primers Exon 4 and Exon 5, and in one or both directions with primers Int5S and Int5R.
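The region lengths quoted in this paragraph follow from the codon coordinates, counting inclusively from the first to the last bracketed codon position. The sketch below reproduces that arithmetic; the helper names `nt_index` and `span_length` are hypothetical, and the calculation assumes an uninterrupted coding sequence numbered from codon 1.

```python
# Illustrative sketch; helper names are hypothetical. Lengths are counted
# inclusively between two (codon, position-in-codon) coordinates.

def nt_index(codon, position):
    """1-based nucleotide index of a codon position (1, 2, or 3) in a coding sequence."""
    if position not in (1, 2, 3):
        raise ValueError("position within a codon must be 1, 2, or 3")
    return 3 * (codon - 1) + position


def span_length(start, end):
    """Inclusive length in nucleotides between two (codon, position) pairs."""
    return nt_index(*end) - nt_index(*start) + 1


# Zfy: second position of codon 467 through second position of codon 519.
zfy_undeleted = span_length((467, 2), (519, 2))
zfy_deleted = zfy_undeleted - 3 * (485 - 480 + 1)   # codons 480-485 removed

# Int5S + Int5R: third position of codon 182 to first position of codon 212.
pseudogene_int5 = span_length((182, 3), (212, 1))   # processed pseudogene, no intron
functional_int5 = pseudogene_int5 + 78              # functional gene keeps the 78-bp intron 5

# Exon 4 + Exon 5: third position of codon 109 to third position of codon 153.
exon4_exon5 = span_length((109, 3), (153, 3))

print(zfy_undeleted, zfy_deleted, pseudogene_int5, functional_int5, exon4_exon5)
```

Under this reading the results are 157, 139, 89, 167, and 133 bp, matching the figures in the text. The 133 bp corresponds to the longer Exon 4 + Exon 5 piece, and the 128-bp piece in commensal mice is 5 bp shorter.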
Desalting of templates, which were generally resuspended in 15 µl of water, and dideoxy sequencing were done as described before (PRAGER et al. 1993), except that half volumes were used for the sequencing reactions and usually only wedge gels were required. Segments 1–4 in Figure 2 total 820–835 bp in most of the mtDNAs and 898 bp in those bearing a tandem 76-bp repeat. For mice where museum skins were the starting material, we read for n = 58 an average of 744 bp (range, 385–890 bp), and for the n = 10 worst results, an average of 233 bp (range, 160–354 bp). Starting with the frozen tissues, we read (of totals of 1043 or 1119 bp) an average of 1059 bp (range, 964–1119 bp; n = 6 had no unread sequence). GenBank accession numbers for the 59 new mtDNA sequences we determined are AF074490–AF074548.
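The totals for segments 1–4 can be cross-checked against the per-segment sizes (between primers) given in the Figure 2 legend: 160, 224–301, 242–243, and 194 bp. The short sketch below sums the smallest and largest sizes; the variable names are hypothetical.

```python
# Illustrative cross-check of the segment totals. Sizes are (minimum, maximum)
# in bp between primers, as listed in the Figure 2 legend.
SEGMENT_SIZES = {
    1: (160, 160),
    2: (224, 301),
    3: (242, 243),
    4: (194, 194),
}

shortest_total = sum(low for low, _high in SEGMENT_SIZES.values())
longest_total = sum(high for _low, high in SEGMENT_SIZES.values())

print(f"segments 1-4 total {shortest_total}-{longest_total} bp")
```

The sums are 820 and 898 bp, the lower and upper totals quoted above.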
Y chromosome sequences of the 139-bp segment (average of 126 bp read) were determined to see whether the same 18 bp had been deleted in Ys from diverse areas. The mice assessed were 13 of the 16 with the B type of Y in Table 1 (all but that from locality 34 and two from locality 50) plus one each from Croatia, Moldova, and Ukraine, and two from Germany. The GenBank accession no. for the variant sequence found is AF074549.
An average of 126 bp was read for a 128-bp ψp53 fragment flanked by primers Exon 4 and Exon 5 (n = 79 commensal mice; localities and individuals detailed in Figure 10). Complete 133-bp sequences (which match the functional gene) inferred to come from a separate ψp53 locus were determined from two commensal mice; to obtain this slightly longer sequence from a mouse yielding both bands, with heteroduplex formation and/or trailing of the shorter fragment in the area of the longer one, we subtracted out the bases found in the shorter piece. The mice and localities that yielded each of the five distinct sequence phenotypes (see RESULTS) obtained by sequencing 133 bp (average of 129 bp read; n = 9) from aboriginal mice at the locus, designated ψp53-1, that is shared with most commensal mice are as follows: (1) two M. macedonicus from Gradsko, Macedonia, and one from Turkey (no. 74392), plus a M. spicilegus from Halbturn, Austria; (2) one M. spicilegus from Debeljaca, Serbia, and one from Kishinev, Moldova; (3 and 4) each in one M. spicilegus from Srpska Mitrovica, Serbia; (5) one M. spicilegus from Debeljaca. By sequencing between primers Int5S and Int5R, we defined one 89-bp sequence for this second segment of ψp53-1 (in Georgian mouse 4569 plus one from Bokhorst, Germany) and two 167-bp sequences for the equivalent part of the functional p53 (from the data for two German mice from Burg and Dannau). GenBank accession numbers for the 24 ψp53 and two p53 sequence phenotypes we obtained are AF074551–AF074576.
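For readers who wish to retrieve these records, the accession ranges AF074490 to AF074548 (mtDNA) and AF074551 to AF074576 (pseudogene and p53) can be expanded into individual accession numbers. The sketch below is a hypothetical helper, not a GenBank tool; it assumes that both ends of a range share the same letter prefix and digit width.

```python
# Illustrative sketch; `expand_accessions` is a hypothetical helper.
import re


def expand_accessions(first, last):
    """List every accession from `first` to `last`, inclusive."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))          # keep leading zeros, e.g. 074490
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


mtdna = expand_accessions("AF074490", "AF074548")
p53_family = expand_accessions("AF074551", "AF074576")
print(len(mtdna), len(p53_family))
```

The two ranges contain 59 and 26 accessions, in line with the 59 new mtDNA sequences and the 24 plus two sequence phenotypes reported.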

Figure 4.
Variation at 94 polymorphic sites among 34 types of castaneus and Yemeni mtDNA sequences shown in the format of Figure 3. The + at 15537a indicates a tandem 76-bp repeat of the sequence from 15538 to 15615; the variation in the 3' repeat appears on separate lines designated 3'. Among the 28 castaneus types, 78 sites are variable, and 16 more vary among the 27 additional distinct castaneus mtDNAs identified from the partial sequences in BOISSINOT and BOURSOT 1997 (see Figure 6). The tandem repeats in castaneus types 16–28 are 76-bp long, rather than the 75 bp in musculus types 32–36 (Figure 5A), because all musculus mtDNAs have a 1-bp gap at positions 15570–15572. Among the six Yemeni mtDNAs, 16 sites vary.

Figure 5.
Parsimony trees for 44 musculus mtDNAs (A) and six Yemeni mtDNAs (B). The number of mutations inferred to have occurred along each lineage is indicated. The large solid triangle in A marks the lineage where the 75-bp tandem repeat of the sequence from 15538–15615 arose; small open triangles mark the five lineages with inferred additions of 1–2 bp. Aust, Austria; Bavaria and Bav, Bavarian transect (see PRAGER et al. 1996); Croa, Croatia; Czech, Czech Republic; Dagh, Daghestan; N, northern; NC, north central; S, southern; Slov, Slovakia; SW, southwestern; Turk, Turkmenistan. Heavy horizontal lines in A highlight the terminal lineages leading to the eight new musculus mtDNA types and also the 15 of 23 internal branches present in 100% of the 4128 minimal-length trees that PAUP found for these 44 mtDNAs plus castaneus type 1 used as an outgroup. The musculus tree requires 84 mutations: 66 transitions, 12 transversions, and six length changes (consistency index = 0.73). The eight internal branches not highlighted in A occur in 44–86% of all the minimal-length trees. The musculus tree was rooted as shown in all PAUP analyses done. The variation in the single most parsimonious network derived for the six types of Yemeni mtDNA can be explained by 16 transitions and one transversion (consistency index = 0.94). The root was placed as shown in B on the basis of diverse analyses that included a variety of commensal or commensal plus aboriginal mtDNAs. Among the 17 distinct musculus mtDNAs identified by BOISSINOT and BOURSOT 1997 by sequencing positions 15443–15742, type B92 from Latvia matches our types 7, 9, 10, and 16–19 for this 0.3-kb region, and B94 from Georgia matches type 31.
Their 15 other musculus mtDNAs can be added to the tree in A as follows (see MATERIALS AND METHODS for details), with several of the placements being tentative: types B93 from Latvia and Armenia, B95 from Armenia, and B96, B97, B99, and B101–B103 from Georgia emanating from the same basal node as types 25–28 and 31, with B97 + B99 + B103 and B101 + B102 associated in clades; B130 from Moscow in a clade with type 35; B91 from Georgia and B100 from Daghestan emanating from the same basal node as types 32–36 and 44; B98 from Georgia breaking up the deepest internal branch into two branches, such that among the types depicted, only the clade of 38–40 lies deeper within the musculus tree; the phylogenetically equivalent Iranian types B118 and B129 from Mashhad and B119 from Kakhk in a clade that shares a common lineage with the clade of types 38–40 or (among additional equally parsimonious options) emanation from the same node as suggested for B98.

Figure 6.
Parsimony tree constructed for 28 types of castaneus mtDNAs shown with heavy lines in the format described for Figure 5. Thin lines indicate the placement (see MATERIALS AND METHODS) of and additional branchings generated by the 29 castaneus mtDNAs identified by BOISSINOT and BOURSOT 1997 from the sequences of positions 15443–15742; type B127 matches our types 3 and 4 in that portion of the control region and B136 matches our type 19. Pak, Pakistan; single letters and two-letter combinations of C, central; E, eastern; N, northern; S, southern; W, western. The source localities

Figure 7.
Parsimony tree for 110 domesticus mtDNAs shown in the format described for Figure 5. Solid circles indicate the connection of the left and right halves of the tree. Den, Denmark; Eng, England; Fin, Finland; Ger, Germany; Nor, Norway; Swe, Sweden; Switz, Switzerland; Scotland includes also localities in the Orkney and Shetland Islands. The solid triangle marks the lineage where the 11-bp direct repeat of the sequence at positions 16073–16083 has arisen; open triangles mark 36 lineages with inferred additions or deletions of 1–5 bp. Heavy horizontal lines highlight the 14 new domesticus mtDNA types and the 36 of 54 internal branches that are present in 100% of all minimal-length trees. This tree requires 237 mutations: 171 transitions, 26 transversions, and 40 length changes (consistency index = 0.50). The root was placed as shown from the strict consensus tree of an analysis that included musculus mtDNA types 1, 20, 29, 30, and 38–40; Yemeni types 1, 2, and 6; and castaneus types 1, 9, 12, and 28. Six of the internal branches not highlighted occurred in 65–90% of all minimal-length trees, nine occurred in 21–50%, and three were not evaluated [see MATERIALS AND METHODS and PRAGER et al. 1993, 1996 for further details]. Among the 25 distinct domesticus mtDNAs identified by BOISSINOT and BOURSOT 1997 by sequencing positions 15443–15742, type B66 from Tunisia matches types 86, 87, 89, and 90 for this 0.3-kb region; Tunisian B67 matches type 94; Tunisian B75 matches types 80 and 99; French B82 matches type 76; and French B84 matches types 15, 16, and 59–61.
Twelve of their 20 domesticus sequences distinct from types 1–110 could be assigned (see MATERIALS AND METHODS) to specific sections of our tree with reasonable confidence, as follows: B83 from Italy emanating from the same node as type 17 and the clades of 11 + 81 and 13–16 + 57–61; clades of Tunisian B64 + B65 and B78 + B79 emanating from the same basal node as types 80, 99, and several other lineages; Spanish B80 + B81 in a clade emanating from the same node as types 20 and 21; Tunisian B76 and B77 emanating from the same node as types 18 and 77; Tunisian B72–B74 in a clade with type 100, with B72 + B73 grouped therein. Possible placements for the remaining eight sequences are as follows: clades of Georgian types B85 + B88 and B86 + B87 emanating from the same node as type 110 and the clade of 16 + 70; Tunisian B68, B69, a clade of B70 + B71, and perhaps also B67 (see above) emanating from the same node as type 97 and the clade extending from type 7 to 10.

Figure 8.
Parsimony tree for mtDNAs of commensal house mice. First, this tree schematically summarizes the information in Figures 5–7. Thus, for example, the musculus portion represents the deepest intra-musculus node, plotted at an average of 4.2 events per lineage as in Figure 5A. Second, it adds six lineages that connect the four trees in Figures 5–7 to one another; the 48 events assigned to these lineages (at 47 polymorphic sites) consist of 35 transitions, nine transversions, and four 1–2-bp length changes (open triangles). Assignment of mutations to these six lineages, selection of branching order, and root placement were done by considering 188 commensal mtDNA sequences (i.e., all those in Figure 5 and Figure 7 plus types 1–28 in Figure 6) and those from aboriginal house mice. Parsimony and neighbor-joining (Figure 9) trees, pairwise distances, and estimates of nucleotide variability (Table 3) were taken into account. In analyses of commensal plus aboriginal sequences, arrangement of the deeper commensal lineages and placement of the deepest root were not completely stable to methods of tree construction and choice of representative sequences (e.g., see Figure 9 and its legend). While the musculus, Yemeni, and domesticus mtDNAs each invariably formed a monophyletic clade (bootstrap values of 97–100% in 1000 replications in two parsimony analyses of 13 commensal plus 6 aboriginal sequences, including the set of types in Figure 9), as did all commensal mtDNAs collectively (100% bootstrap values), this was not true of the castaneus mtDNAs. Indeed, in some parsimony analyses, the domesticus mtDNAs were implied to emanate from the same node as castaneus type 13. The branching order and rooting shown here were, therefore, chosen based on intracommensal parsimony trees and distance values.

Figure 9.
Neighbor-joining trees for 19 mtDNAs from commensal and aboriginal house mice. In A, all changes were weighted equally; in B, transversions were weighted fivefold relative to transitions and length changes. The deepest and second-deepest commensal clades are, respectively, domesticus and Yemeni mtDNAs in both trees. However, the castaneus mtDNAs are monophyletic in A but paraphyletic in B. In the analogous analyses for only 14 sequences (with dom 96, mus 38, and cas types 10, 13, and 14 omitted), the branching order of both trees matched A here.

Figure 10.
Variable sites, observed sequence patterns (A), and inferred alleles (B) in a 128-bp segment of a p53 pseudogene among 79 commensal mice from 68 localities, presented in the format of Figures 3 and 4. The 12 variable sites are listed vertically according to codon number and position within the codon; S, R, and Y indicate, respectively, C + G, A + G, and C + T; ?, unsequenced sites. Phenotype 1 and allele 1 at locus p53-1 differ from the functional p53 in having codons 120 and 121 deleted, a T inserted between positions 2 and 3 of codon 122, and a stop signal at codon 143. Locality numbers and, if necessary, identification numbers are included in A for regions in Table 1 where not all mice had the same sequence phenotype. A, Afghanistan; P, Pakistan; N, Nepal; h, heterozygosity for presence/absence of the p53 pseudogene. In B, Chr tabulates the number of chromosomes out of 151. Because phenotypic patterns 4, 9, and 14 are each polymorphic at two or three sites, one cannot infer their allele sequences conclusively in the absence of sequencing multiple clones of PCR products. Thus, 14 should be regarded as the minimum number of distinct alleles, and the sequences of postulated alleles 3 and 14 should be viewed as uncertain. (Assignment of both variant bases in phenotype 4 to allele 3 was arbitrary.) At site 111-2 in phenotypes 17 and 18, the Nepalese mouse and Taiwanese animal 152831 may have A +
Calculations:
We made use of the 139 published Mus mtDNA sequences for this 1-kb region included by PRAGER et al. 1996: domesticus types 1–96, musculus types 1–36, castaneus and macedonicus types 1, spicilegus types 1–3, and spretus types 1 and 2. Because segments 1–4 encompass almost all the known intracommensal mtDNA sequence variation in the whole control region and flanking tRNAs (Figure 2), we assumed for all sequences considered here a length of 1000 bp for computations of nucleotide variability, which was estimated with the same two parameters as before (NACHMAN et al. 1994; PRAGER et al. 1996). This assumed length is very close to the averages read by PRAGER et al. 1996 (and references therein), starting with total genomic DNA or purified mtDNA.
Character-state parsimony trees for mtDNAs were constructed with the PAUP (Phylogenetic Analysis Using Parsimony) version 3.0s program with a heuristic search procedure and equal weighting of all character changes, as described in detail previously (PRAGER et al. 1993, PRAGER et al. 1996). As before, smaller subsets of a given dataset (notably that of 110 domesticus mtDNA sequences) were analyzed with PAUP to examine all most-parsimonious arrangements in various sections of the tree and to root trees and relate major commensal mtDNA lineages to one another (see also legends to Figures 5–8). Neighbor-joining mtDNA trees were constructed with the PHYLIP 3.572c program from matrices of pairwise differences computed after weighting transversions five times as heavily as other changes, as well as from matrices of unweighted differences. M. spretus mtDNAs served as the outgroup to those of all the other taxa (cf. PRAGER et al. 1996).
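The weighting of pairwise differences described above can be illustrated with a minimal Python sketch. It is not the PHYLIP procedure itself: the function names, the toy sequences, and the choice to count each gapped alignment column as one length change are all assumptions made for illustration.

```python
# Illustrative sketch only: weighted pairwise differences between aligned
# sequences. Input is assumed to contain only A, C, G, T and "-" (gap).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Classify one aligned site as match, transition, transversion, or length."""
    if a == b:
        return "match"
    if a == "-" or b == "-":
        return "length"  # assumption: one gapped column = one length change
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def weighted_difference(seq1, seq2, tv_weight=5.0):
    """Sum differences, weighting transversions tv_weight times other changes."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for a, b in zip(seq1.upper(), seq2.upper()):
        kind = classify(a, b)
        if kind == "transversion":
            total += tv_weight
        elif kind != "match":
            total += 1.0
    return total


def distance_matrix(seqs, tv_weight=5.0):
    """Return a dict mapping (name1, name2) to the weighted difference."""
    return {
        (x, y): weighted_difference(seqs[x], seqs[y], tv_weight)
        for x in seqs
        for y in seqs
    }


if __name__ == "__main__":
    toy = {"typeA": "ACGT-A", "typeB": "GCGTTA", "typeC": "CCGTTA"}  # hypothetical data
    print(distance_matrix(toy))                  # transversions weighted fivefold
    print(distance_matrix(toy, tv_weight=1.0))   # unweighted differences
```

Calling `distance_matrix` with `tv_weight=1.0` gives the unweighted matrix, so the same function covers both kinds of input used for the neighbor-joining trees.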
For the reasons discussed by PRAGER et al. 1996, we assumed the likeliest base at missing variable sites for tree construction and computation of pairwise differences; as argued before, the likelihood of an incorrect assignment is often low and the consequences are in most cases expected to be minor. For the sequence types newly defined here, we do not expect the assumptions made for unsequenced sites to have an effect on any substantial inferences, except perhaps with respect to castaneus types 14 and 15. The specific assumptions made beyond those in PRAGER et al. 1996 are as follows: For musculus mtDNA types 37–44: as in type 1 at all missing sites. For domesticus types 97–110: T at position 15912 in types 97–99 and 102–110, C at position 16012 in type 99, as in type 1 at all other missing sites. Among castaneus types 2–28: as in type 1 at all sites missing in types 6, 7, 12, 14, 16, 17, and 19; G at position 15958 in type 23; types 15, 20, 22, and 25 taken as matching, respectively, types 14, 21, 23, and 24. For the mtDNAs in Figure 4, except castaneus 8 (for which we made no assumptions), any additional missing sites beyond the 94 sites in that figure were assumed to match castaneus type 1; macedonicus types 1–3 were assumed to match at all sites missing in any of the three sequences.
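The rule of filling unsequenced sites from a reference type, with a few site-specific exceptions, might be sketched as follows. The function name, the "?" marker for a missing site, the 0-based indexing, and the toy sequences are assumptions for illustration; the positions quoted in the text are genome coordinates and would need to be mapped onto alignment columns.

```python
# Illustrative sketch only: fill missing sites ("?") from a reference type,
# allowing explicit per-site overrides.
def fill_missing(seq, reference, overrides=None, missing="?"):
    """Return seq with each missing site replaced by an override or the reference base.

    overrides maps a 0-based alignment index to the base assumed at that site.
    """
    if len(seq) != len(reference):
        raise ValueError("sequence and reference must have the same aligned length")
    overrides = overrides or {}
    filled = []
    for i, base in enumerate(seq):
        if base != missing:
            filled.append(base)  # sequenced sites are never altered
        else:
            filled.append(overrides.get(i, reference[i]))
    return "".join(filled)


if __name__ == "__main__":
    reference_type = "ACGTAC"   # hypothetical stand-in for "type 1"
    partial = "A?GT?C"          # hypothetical partial sequence
    print(fill_missing(partial, reference_type))
    print(fill_missing(partial, reference_type, overrides={4: "T"}))
```

Keeping the overrides separate from the reference makes each exception explicit, in the same way the text lists every assumption that departs from type 1.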
Analogous to the procedures described for musculus mtDNA types 32–36 (PRAGER et al. 1996), variable positions within the tandem repeats in castaneus types 16–28 were entered into the computer only once for PAUP analyses, and seven events were added by hand after tree construction: at position 15548, T to A in the 5' copy of type 23; at 15550, T to C in the 5' copy of type 24 or in the 3' copy of type 25; at 15554, T to C in the 5' copy along the lineage leading to the clade of types 16–28 and C to T in the 5' copy of type 22; at 15569, T to C in the 5' copy of type 28; at 15581, A to G in the 5' copy of type 27; at 15601, T to C in the 3' copy of type 18.
BOISSINOT and BOURSOT 1997 reported the sequences of mtDNA positions 15443–15742 from 131 commensal mice. Among the 71 mtDNA types they defined, 62 are distinct from the 189 collectively defined by us and NACHMAN et al. 1994. Their segment of 297–374 bp (allowing for length variants and tandem repeats) encompasses segment 2 and part of segment 1 in Figure 2, and includes the most variable part of the control region (PRAGER et al. 1993, PRAGER et al. 1996). Sequencing this 0.3-kb fragment is likely to detect much of the diversity among the mtDNAs examined, but it lacks many of the variable sites that provide structure and define clades in our parsimony trees based on the 1-kb region in Figure 2. We, therefore, added the BOISSINOT and BOURSOT 1997 mtDNAs to the trees in Figure 5A, Figure 6, and Figure 7 by hand after tree construction. Their 29 castaneus mtDNAs could be placed with appreciable confidence, so we show them explicitly in Figure 6; placement of their 17 musculus (Figure 5A) and 25 domesticus (Figure 7) mtDNAs is described in the figure legends. We preface the BOISSINOT and BOURSOT 1997 mtDNA type numbers with the letter B, except within Figure 6.
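Because a short fragment can be identical to more than one full-length type, a partial sequence may match several types at once (as B127 does in the Figure 6 legend). The sketch below illustrates that comparison under simplifying assumptions: all sequences share one alignment with no length variants, the window is given as 0-based slice indices, and the names and sequences are hypothetical.

```python
# Illustrative sketch only: find full-length types identical to a short
# fragment within the window the fragment covers.
def matching_types(fragment, full_types, start, end):
    """Return sorted names of types whose bases in [start, end) equal the fragment."""
    if end - start != len(fragment):
        raise ValueError("window length must equal fragment length")
    return sorted(
        name for name, seq in full_types.items() if seq[start:end] == fragment
    )


if __name__ == "__main__":
    # Hypothetical toy data: two types differ only outside the window.
    full_types = {
        "type3": "AACGTTA",
        "type4": "TACGTTC",
        "type19": "AACCTTA",
    }
    print(matching_types("ACGTT", full_types, 1, 6))
```

In this toy case the fragment cannot distinguish `type3` from `type4`, which differ only at sites outside the window.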
 | RESULTS |
|---|
Mitochondrial DNA sequences:
Among the 76 newly studied mice from 60 localities, we resolved 61 distinct sequences (Table 1, Figure 3 and Figure 4); 57 of them correspond to types of mtDNA not seen in earlier surveys (PRAGER et al. 1993, PRAGER et al. 1996; NACHMAN et al. 1994; BOISSINOT and BOURSOT 1997). The new types are assignable to four previously recognized clades (i.e., domesticus, musculus, castaneus, and macedonicus) and one distinctive new clade (see below). Two types we saw before were dom 28 and mus 24; in addition, each of the partial sequences B127 and B136 matches one or two of our castaneus types (see Figure 6). For eight animals, our fragmentary sequences allowed classification to the musculus and castaneus mtDNA categories, but not designation of specific mtDNA types (Table 1).
Our survey revealed domesticus mtDNAs in 18 mice: all the commensals from Egypt and Turkey plus three from Iran. The results for the Egyptian mice concur with previous mtDNA (e.g., FERRIS et al. 1983; PRAGER et al. 1993) and protein electrophoretic (SAGE 1981) evidence as well as phenotypic classification. They supplement the earlier mtDNA work on specimens from NE Egypt by documenting domesticus mtDNAs in the NW and SE parts of the country. In what appears to be the first mtDNA characterization of Turkish mice, we detected six domesticus mtDNAs from four localities in the country's SE quarter, from sea level on the eastern Mediterranean to the mountains bordering Lake Van. This study also marks the first report of domesticus mtDNA in Iran, which we found at localities 18, 19, and 21 along the western border, the first two in the Zagros Mountains and the third near the Persian Gulf.
Seventeen of the newly surveyed mice had musculus mtDNAs, 13 of them from areas in East Asia known to harbor musculus mtDNAs (YONEKAWA et al. 1988; NAGAMINE et al. 1994B). We found musculus mtDNA in NC Iran, at locality 25 on the Caspian Sea, which is consistent with recent detection of musculus mtDNAs in E Iran (BOISSINOT and BOURSOT 1997). Our study is the first report of musculus mtDNA in Afghanistan, which we found at localities 31–33, extending some 500 km across the northern edge of the country, just north of the great central mountain range.
Seven of the newly surveyed mice had sequences (cas types 1–5; Figure 4, Table 1) very similar to castaneus type 1 known from Thailand. Four of these mice came from Taiwan, SE mainland China, and the Philippines, areas where such castaneus mtDNAs are well known (YONEKAWA et al. 1988; NAGAMINE et al. 1994B; BOURSOT et al. 1996; BOISSINOT and BOURSOT 1997); the M. castaneus animal from the Mariana Islands also had such a cas mtDNA. Types 2 and 3 at localities 38 and 40 on the SW Pakistan coast are the first report of this kind of mtDNA in that country. Transport beyond a natural range by humans via shipping merits consideration. However, as BOURSOT et al. 1996 and BOISSINOT and BOURSOT 1997 have documented such castaneus mtDNAs in SW and NC India (see Figure 6) and the subspecies (castaneus and tytleri) of M. castaneus are known from much of India, finding this kind of mtDNA in SW Pakistan may not be surprising. The pelage of these two Pakistani mice is not characteristic of M. castaneus, but the skull of one of them is (Table 2).
We found a diverse collection of mtDNAs denoted castaneus types 6–28 among 23 mice from Central Asia: Iran, Afghanistan, Pakistan, and Nepal. Among the mice with such mtDNAs are those from localities 35–37, which are in the general area of Kabul, and Pakistani localities 43–45, which are in the general area of some of those in the BOURSOT et al. 1996 and BOISSINOT and BOURSOT 1997 surveys; the remainder represent previously unsampled areas. As reported in a preliminary account (PRAGER et al. 1996), types cas 16–28 have a second, tandem 76-bp copy of a control region segment that is independently duplicated in the musculus 32–36 clade of mtDNAs (cf. Figure 5A). Within a given mtDNA type, the repeats differ by one to seven base substitutions and, by several criteria (see below), are considerably more diverse than those of the musculus mtDNAs. BOISSINOT and BOURSOT 1997 have also documented the independence of the musculus and castaneus duplications.
We use the name castaneus for mtDNA types cas 1–28 [and the 29 phylogenetically related types from BOISSINOT and BOURSOT 1997], even though few of the mice bearing these mtDNAs, especially outside the clade of types 1–5, have been called M. castaneus on phenotypic and morphological grounds (Table 2). There are two reasons to apply one name to all these mtDNAs: first, mice bearing mtDNAs in the shallow clade with types 1–5 are intermixed throughout the Indo-Pakistan area with mice bearing types outside this clade (see Figure 6), which suggests they belong to the same interbreeding population and are connected by considerable amounts of gene flow. Second, these mtDNAs constitute a phylogeographic unit. BOURSOT et al. 1996 and BOISSINOT and BOURSOT 1997 have also recognized the apparent unity of this group, but with the name "oriental." We prefer castaneus because it follows the heretofore used protocol of describing gene lineages with names derived from species names of the mice and was already applied to type 1.
PRAGER et al. 1996 reported both domesticus and musculus mtDNAs in SW Georgia (locality 16), which is consistent with the ORTH et al. 1996 inference of a broad area of secondary contact and remixing of genomes in Transcaucasia. To the countries with different major lineages of commensal mtDNAs we can now add Iran [also from the results of BOISSINOT and BOURSOT 1997] and Afghanistan.
Most remarkable in our present survey are the six mtDNAs of Yemeni mice (Figure 4). They are similar and clearly related to one another (pairwise differences of 2–11 bp) but rather different from all the other kinds of mtDNAs of commensal mice (pairwise differences of 24–47 in Table 3 below). Thus, the Yemeni mtDNAs represent a major new lineage from part of the house mouse range previously unexplored at the molecular level. Relevant to our findings, the mice in the southern portion of the Arabian Peninsula were given a distinct subspecific or racial name, M. m. gentilulus [HARRISON 1972; HARRISON and BATES 1991; M. d. gentilulus in MARSHALL and SAGE 1981], in light of their being so conspicuously smaller that HARRISON 1972 called them pygmy mice. The Yemeni animals are clearly the smallest long-tailed mice we studied (Table 2). Nine mice from eight nearby localities, to the south and east of ours, had similar traits, with averages (and ranges) for total length, tail length, and tail-to-body ratio, respectively, being 134 mm (111–161), 69 mm (63–83), and 1.07 (0.80–1.31), as was true also for mice assigned to this taxon from Oman on the SE tip of the Arabian Peninsula and from Bahrain on the Persian Gulf (HARRISON 1972). The cranial measurements of the M. (m.) gentilulus mice seem even more distinctively small, relative to the mice from the northern Arabian Peninsula and Mesopotamian areas assigned to M. (m. or d.) praetextus, than do their external dimensions (HARRISON 1972).
Table 3.
Quantitative comparisons of sequence differences among the mtDNAs of commensal house mice
Evolutionary trees and diversity of mtDNAs:
Figure 5A presents a rooted parsimony tree relating 44 musculus mtDNAs. The present tree differs from the one for musculus types 1–36 (PRAGER et al. 1996) in two conspicuous ways: first, it has a new basal clade that is made up of Afghan types 38–40. That the deepest lineage stems from Afghanistan and the next-deepest clade is also from Central Asia accords well with a model [e.g., see Figure 4 of BOURSOT et al. 1996] postulating the original homeland of M. musculus and the start of intraspecific divergence in or near this northern fringe of Afghanistan. Our results for nuclear loci (see below) along with their short tails (Table 2) suggest that these mice are authentic M. musculus rather than the products of mtDNA introgression into another species. Second, the average depth of the tree in Figure 5A is ~4.2 events per lineage, 20% deeper than the tree in PRAGER et al. 1996 and close to two-thirds that shown for 110 types of domesticus mtDNAs in Figure 7, contrasted to the earlier relative value of about half inferred for the tree for 36 musculus mtDNAs vs. that for 96 domesticus mtDNAs (PRAGER et al. 1996). If we assume that the deepest split among commensal species occurred 350,000–900,000 years ago (SHE et al. 1990; BOURSOT et al. 1993, BOURSOT et al. 1996) and that this split corresponds to the deepest node among commensal mtDNA lineages (at the base of the tree in Figure 8), the implication is that the musculus mtDNA lineages examined could have shared a common ancestor some 70,000–180,000 years ago.
Figure 5B shows the most parsimonious rooted tree for the six types of mtDNA from Yemen. The eye-catching feature of the Yemeni tree is that, with a depth of ~3.7 events per lineage, it is nearly as deep as the musculus tree in Figure 5A even though it is derived from ~5% of the number of specimens represented in the musculus tree. One implication is that the mitochondrial lineages in a limited part of the Arabian Peninsula might have begun diverging nearly as long ago (perhaps 60,000–160,000 years) as did the lineages for extant musculus mtDNAs over their entire range of northern Eurasia. The nucleotide variability estimates in Table 3 suggest that the mice in Yemen are mitochondrially ~60% as variable as is M. musculus, an inference supported by the relative ranges of pairwise differences (notably 0–1 vs. 0–5 transversions and 0 vs. 0–3 length changes). An expectation, also in light of our evolutionary model (see DISCUSSION), is that sampling from additional localities on the southern Arabian Peninsula (HARRISON and BATES 1991) would reveal more lineages, including deeper ones, in this newly described major branch of the commensal mtDNA tree.
Figure 6 presents a parsimony tree constructed for the 28 castaneus mtDNA sequences in Figure 4 and also shows placement of the BOISSINOT and BOURSOT 1997 castaneus sequences. The tree for 28 sequences has a transition-to-transversion ratio of 4.2, a value lower than those of 5.5 and 6.6, respectively, for the trees in Figure 5A and Figure 7 and indicative of greater sequence divergence. The average depth of the tree in Figure 6 of ~10.6 events per lineage is, respectively, ~2.5 and 1.7 times as deep as those for musculus and domesticus mtDNAs. The implication is that the mtDNA lineages in Figure 6 began diverging from one another some 170,000–460,000 years ago. The values in Table 3 suggest that these mtDNAs exhibit at least as much genetic diversity as do the domesticus mtDNAs.
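The summary statistics compared above are simple ratios, and a short Python sketch shows how they relate to the counts given in the figure legends. The event counts are those quoted in the Figure 7 legend and the depths are those quoted in the text; the function names are illustrative.

```python
# Illustrative sketch only: tree summary ratios from event counts and depths.
def tree_summary(transitions, transversions, length_changes):
    """Return the total number of events and the transition-to-transversion ratio."""
    if transversions == 0:
        raise ValueError("ratio undefined without transversions")
    return {
        "total_events": transitions + transversions + length_changes,
        "ts_tv_ratio": transitions / transversions,
    }


def relative_depth(depth, other_depth):
    """Return how many times deeper one tree is than another."""
    return depth / other_depth


if __name__ == "__main__":
    dom = tree_summary(171, 26, 40)  # counts from the Figure 7 legend
    print(dom["total_events"], round(dom["ts_tv_ratio"], 1))
    # castaneus depth (~10.6) relative to musculus (~4.2) and domesticus (~6.4)
    print(round(relative_depth(10.6, 4.2), 1), round(relative_depth(10.6, 6.4), 1))
```

The domesticus counts sum to the 237 mutations given in the legend, and their ratio rounds to the 6.6 cited for Figure 7.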
Members of the shallow clade of cas 1–5 and related types (Figure 6) are found across the range of mice designated M. bactrianus and M. castaneus, from SW Pakistan through NC India to Taiwan, but the southeastern mice have only this category of mtDNA. One possibility is that ships moved mice with this mtDNA lineage around the area and that this lineage is the dominant one in SW Pakistan and SW India. Another interpretation is that M. castaneus only recently spread into extreme SE Asia. This latter hypothesis invokes filtering out of the mtDNA diversity from the core Indo-Pakistan area as the mice moved through patchy habitats into E India and SE Asia. SAGE and WOLFF 1986 have shown how such repeated colonization events lead to erosion of genetic diversity in peripheral populations. Under the filter hypothesis, we would expect to find only this mtDNA clade in future surveys of mice from the extreme southeastern part of the M. castaneus range. The out-of-India filter model appears favored over the out-of-Pakistan shipping model because members of this shallow clade also occur in NC India.
Figure 7 shows a rooted parsimony tree for 110 domesticus mtDNAs. An important feature is the placement of the easternmost domesticus mtDNAs, i.e., those from Iran, Turkey, and Georgia. Under the earlier hypothesis that the commensal clade arose in the east and M. domesticus originated via westward migration (see Introduction and DISCUSSION), one would predict that the eastern M. domesticus mice would have representatives of all the major mtDNA clades and perhaps some clades not detected in the extensive surveys of the Mediterranean (including North African) and western European animals. Instead, all our Iranian, Turkish, and Georgian mtDNAs [and possibly also the Georgian sequences of BOISSINOT and BOURSOT 1997] are limited to the clade comprising the top left quarter of the tree. In contrast, the deepest lineage in our domesticus tree (type 96) comes from two Greek mice, and mtDNAs from Greek mice are also found in all but one of the other deep clades in this tree. Sampling of the eastern domesticus mtDNAs was limited (n = 11 mice and l = 7 localities from Turkey plus Iran; n = 8 and l = 6 for Georgia), but the Greek sample size was similar (n = 11, l = 6). mtDNAs from Spain (n = 11, l = 7) and Italy (n = 34, l = 18) are also found as members of diverse deep clades. This tree does further support the view (PRAGER et al. 1996 and references therein) that southern Mediterranean domesticus mtDNA lineages are older than northern European ones.
Among the new domesticus mtDNAs from Egypt, types 99–101 fall into the same large clade as do the previously characterized Egyptian types 18 and 22–25, type 97 is a deeper lineage in a clade previously containing mtDNAs from NW Europe and Croatia, type 98 constitutes a relatively deep monotypic branch, and type 28 extends the range of mtDNAs with an 11-bp direct repeat to North Africa. Ten Tunisian mtDNAs belong to the clade containing most of our Egyptian mtDNAs (see legend to Figure 7). These results provide increasing evidence for considerable molecular evolution within NE Africa (see also TUCKER et al. 1989).
The tree in Figure 7 differs structurally from that presented for 96 domesticus mtDNAs (PRAGER et al. 1996) in two notable respects: first, it is shallower, with an average depth of ~6.4 events per lineage rather than 7.3. The start of divergence among all 110 lineages is suggested as some 100,000–280,000 years ago. Second, there has been some rearrangement of the deeper lineages. Specifically, the mtDNAs with C at position 00055 (types 53–56, 68, 69, 91–95, and 102–110) no longer form a monophyletic clade, and they have moved from the lower right of the tree to the upper left. Consequently, the G at position 00055 in types 16 and 70 arises via a C-to-G transversion rather than an A-to-G transition. In addition, all the mtDNAs with A at position 00055 are united in a clade (from type 7 down to type 98 in the figure). We previously chose from among equally parsimonious alternatives a tree structure that accounted for the four different bases at position 00055 with two transitions and one transversion (see also PRAGER et al. 1993), an option that now does not yield minimal-length trees.
Figure 8 provides an overview of the character-state phylogenetic analyses in Figures 5–7 and relates the four major commensal mtDNA lineages to one another. The neighbor-joining trees in Figure 9 exhibit the same branching order of the major lineages and the cohesiveness of the musculus, Yemeni, and domesticus clades (each of which is united by 9–14.5 events on the common lineages in Figure 8). The trees reinforce the view that the Yemeni mtDNAs constitute a distinct branch. In both figures, the domesticus lineage occupies the ancestral position among the commensal mtDNAs, the Yemeni lineage appears as the next oldest, and the castaneus and musculus lineages appear to be the two shallowest. This arrangement and rooting of the four commensal lineages are consistent with the values in Table 3. Leaving out the newly discovered Yemeni lineage, the trees in Figure 8 and Figure 9 have the same branching order and root placement as the trees of BOURSOT et al. 1996 and BOISSINOT and BOURSOT 1997 (see DISCUSSION for details). However, as emphasized in the DISCUSSION, the available data do not permit the arrangement and rooting of the four major lineages in Figure 8 to be inferred with statistical confidence, as is true also of the assessment of the cohesiveness of the castaneus mtDNAs. The deepest internal branches in Figure 8 have only three to four events, and they are similarly short in Figure 9. An obvious possibility is that cladogenesis has been rapid.
Tandem repeats of 75 and 76 bp:
Table 4 quantitatively compares the results of the independent duplications of the same control region segment in castaneus and musculus mtDNAs. By all criteria, the duplication occurred earlier among the castaneus mtDNAs: assuming roughly equal rates of evolution, the tree-based analyses place the duplication point at least twice as long ago for the castaneus lineage, with computed depths of ~6.5 vs. 3.0 events per lineage. About the same number of events occurred in the areas flanking and within the repeats among the castaneus mtDNAs, but none accumulated outside the repeats after the duplication among the m