Hybrid zones provide an excellent opportunity for studying the consequences of genetic changes between closely related taxa. Here we investigate patterns of genetic variability and gene flow at four X-linked loci within and between the two subspecies of European rabbit (Oryctolagus cuniculus cuniculus and O. c. algirus). Two of these genes are located near the centromere and two are located near the telomeres. We observed a deep split in the genealogy of each gene with the root located along the deepest branch in each case, consistent with the evolution of these subspecies in allopatry. The two centromeric loci showed low levels of variability, high levels of linkage disequilibrium, and little introgression between subspecies. In contrast, the two telomeric loci showed high levels of variability, low levels of linkage disequilibrium, and considerable introgression between subspecies. These data are consistent with suppression of recombination near the centromere of the rabbit X chromosome. These observations support a view of speciation where genomic incompatibilities at different loci in the genome create localized differences in levels of gene flow between nascent species.
A key problem in evolutionary genetics concerns the origin of reproductive isolation between incipient species (Coyne and Orr 2004). Two important conclusions come from previous studies of the genetics of reproductive isolation. First, barriers to gene flow often derive from incompatibilities between allelic variants at two or more loci, i.e., epistasis (Bateson 1909; Dobzhansky 1936; Muller 1940, 1942). Empirical support for epistasis comes from a large body of work in Drosophila, beginning with Dobzhansky (1936). More recently, specific genes underlying reproductive isolation have been identified, and all involve epistatic interactions (Malitschek et al. 1995; Ting et al. 1998; Barbash et al. 2003; Presgraves et al. 2003). Second, loci contributing to reproductive isolation tend to be overrepresented on the X chromosome in groups in which males are heterogametic (Coyne and Orr 1989). Evidence for the “large X effect” comes from mapping studies of hybrid sterility and hybrid inviability (e.g., Dobzhansky 1936; Grula and Taylor 1980; True et al. 1996; Presgraves 2003; Presgraves et al. 2003; Tao et al. 2003). Moreover, Haldane's (1922) rule (the sterility or inviability of heterogametic hybrids) seems to be due largely to incompatibilities involving recessive X-linked mutations (Turelli and Orr 1995, 2000). Finally, in a number of cases where sister species hybridize in nature, X-linked loci introgress less than autosomal loci (e.g., Hagen 1990; Sperling and Spence 1991; Tucker et al. 1992).
The genetic basis of reproductive isolation has been studied both with laboratory crosses and in natural hybrid zones, and both approaches have advantages and disadvantages. For example, laboratory crosses make it possible to control the genetic background as well as the environment, and they are repeatable. Hybrid zones offer the advantage of many generations of recombination, making fine-scale mapping more feasible. In hybrid zones, it is possible to identify genes contributing to isolation simply from patterns of gene flow without prior knowledge of the phenotype. Hybrid zones also allow us to study species that cannot be crossed in the laboratory. Finally, hybrid zones provide a picture of the fitness of hybrid genotypes under natural conditions.
The European rabbit (Oryctolagus cuniculus) provides an opportunity to study the genetic basis of reproductive isolation between recently evolved taxa. This species consists of two subspecies, O. cuniculus algirus in the southwestern portion of the Iberian Peninsula and O. cuniculus cuniculus in the northeast of the Iberian Peninsula and France. These two groups diverged in allopatry during the early Pleistocene and have subsequently come into secondary contact in central Iberia, forming a contact zone that runs in a NW–SE direction (Figure 1) (Branco et al. 2000, 2002). The two subspecies are well differentiated with respect to mtDNA (Branco et al. 2000), the Y chromosome (Geraldes et al. 2005), and some allozyme loci (Ferrand and Branco 2006).
Motivated by the large X-effect documented in other species, here we focus on four X-linked loci to understand the nature of reproductive isolation in rabbits. Two of these loci are near the centromere and two are near the telomeres. We address three main questions. First, what are the levels and patterns of genetic variation at genes on the rabbit X chromosome? Second, are patterns of variation and introgression heterogeneous among loci, and if so, do the differences correlate with the physical location of genes on the X chromosome? Third, are the data compatible with a model of divergence without gene flow? We surveyed nucleotide variability at four X-linked loci in a sample of 43 male rabbits representing both subspecies and the area of contact. All four loci showed two divergent lineages. Despite this deep divergence, there is still evidence of gene flow between subspecies. Patterns of gene flow and nucleotide variability were heterogeneous among loci, being low at the centromeric loci and high at the telomeric loci. We hypothesize that the centromeric region of the X chromosome of the European rabbit may be involved in reproductive isolation between these two subspecies.
MATERIALS AND METHODS
Forty-three male European rabbits were sampled (Table 1). The samples were divided into three groups: 20 from the northeastern region of the Iberian Peninsula and from France, corresponding to O. c. cuniculus (NE), 14 from the southwestern region of the Iberian Peninsula, corresponding to O. c. algirus (SW), and 9 from the contact zone (CZ) as defined by mtDNA variation (Branco et al. 2000). The geographic locations of the populations sampled are shown in Figure 1, and collecting localities are given in Table 1. Additionally, one male Lepus granatensis was used as an outgroup.
PCR amplification and sequencing:
Genomic DNA was extracted from blood, muscle, or liver following Sambrook and Russell (2001). Introns of four X-linked loci were PCR amplified; two are located near the centromere and two are near the telomeres (Figure 2). Amplification and sequencing primers are listed in supplemental Table 1 at http://www.genetics.org/supplemental/. For each locus, two pairs of amplification primers were designed. The first was based either on published rabbit sequences (Phka2) (Davidson et al. 1992) or on conserved exonic regions among human, mouse, and rat. Nested primers were then designed specifically for the rabbit on the basis of the first sequences obtained. Amplifications were carried out in 50-μl volumes using Platinum Taq High Fidelity DNA Polymerase (Invitrogen, San Diego) following manufacturer recommendations. Cycling temperatures were as follows: an initial denaturation step at 94° for 1 min and 20 sec followed by 35 cycles of 94° for 20 sec, annealing for 20 sec, and extension at 68° for 4 min. Annealing temperatures for each PCR are specified in supplemental Table 1 at http://www.genetics.org/supplemental/. PCR products were purified using the QIAquick PCR purification kit (QIAGEN, Chatsworth, CA) prior to sequencing. Sequencing was carried out using an ABI 3700 automated sequencer. All sequences have been deposited in GenBank under accession nos. DQ306315–DQ306490.
Sequences were inspected and concatenated using the computer program Sequencher (Gene Codes, Ann Arbor, MI) and then aligned manually using the BioEdit software (Hall 1999). By sequencing the X chromosome in males we were able to recover haplotypes directly. The analyses below were based on single nucleotide polymorphisms in introns only.
Basic population genetic parameters, including the number of segregating sites, number of haplotypes, levels of nucleotide diversity, π (Nei and Li 1979), and the proportion of segregating sites, θ (Watterson 1975), were estimated using the program DnaSP 4.00 (Rozas et al. 2003) for the entire data set and also for the NE, CZ, and SW groups (Figure 1). Phylogenetic relationships among alleles were estimated using the median-joining algorithm (Bandelt et al. 1999) as implemented in Network v126.96.36.199 (http://www.fluxus-technology.com/).
We estimated divergence three ways. First, divergence between all O. cuniculus alleles and L. granatensis was calculated as the average pairwise distance per nucleotide site, Dxy (Nei 1987), and as the number of net nucleotide substitutions per site, Da (Nei 1987). Da is defined as Dxy − 0.5 (Dx + Dy), where Dxy is the average pairwise distance between groups and Dx and Dy are the average pairwise distances within groups. Second, Dxy and Da were calculated between the NE and SW groups of O. cuniculus. Finally, to estimate the divergence time of the two subspecies of O. cuniculus, maximum-likelihood net nucleotide distances between L. granatensis and O. cuniculus, and between the two main lineages found in O. cuniculus (see results), were calculated using PAUP v 4.0 (Swofford 2002). Divergence time between subspecies of O. cuniculus was calculated assuming a divergence time between L. granatensis and O. cuniculus of 11.8 million years (MY) (Matthee et al. 2004).
The population recombination parameter, R (R = 3Nc for X-linked loci, where c is the recombination rate per generation and N is the population size) between adjacent sites (Hudson 1987), the minimum number of recombination events, Rm (Hudson and Kaplan 1985), and the number of pairs of sites showing four gametic types were calculated using DnaSP 4.00 (Rozas et al. 2003). Another estimator of the population recombination parameter, γ (Hey and Wakeley 1997), was calculated using the software SITES. While Hudson's R is based on the variance of the number of base-pair differences between DNA sequences, γ is a maximum-likelihood estimator developed using a coalescent model for a sample of four DNA sequences with recombination. Linkage disequilibrium (LD) between pairs of polymorphic sites present at a frequency of at least 10% was calculated within and between all loci, using the statistics D′ (Lewontin 1964) and r2 (Hill and Robertson 1968) as implemented in DnaSP 4.00 (Rozas et al. 2003).
Tajima's D (Tajima 1989) and Fu and Li's D (Fu and Li 1993) were calculated to test for deviations from a neutral equilibrium frequency distribution using DnaSP 4.00 (Rozas et al. 2003). Ratios of polymorphism within O. cuniculus to divergence between O. cuniculus and L. granatensis were compared with the expectations under a neutral model using the Hudson–Kreitman–Aguadé (HKA) test (Hudson et al. 1987). We performed one four-locus test and six pairwise comparisons between loci using the HKA software (Hey and Kliman 1993).
At each of the four loci we detected a deep split in the genealogy (see results). We asked if the observed pattern of nucleotide polymorphism is compatible with a single panmitic population, as opposed to some form of population subdivision. If two populations have evolved in allopatry, the basal branch of a gene genealogy may be longer than in a panmitic population. Furthermore, mutations arising in an isolated subpopulation are unable to recombine with mutations in a different subpopulation, resulting in higher levels of LD. Wall (2000) suggested two measures based on LD that could be powerful indicators of population subdivision. The first, lb, is the number of congruent sites, defined as the number of mutations that, on a pairwise basis, result in only two haplotypes. The second, gd, is the maximum physical distance between congruent sites. Coalescent simulations of panmixia were performed with the computer program ms (Hudson 2002). For each locus, 50,000 genealogies of 43 individuals were simulated conditioned on the estimated values of θ and γ. Additionally, for each locus, coalescent simulations were performed using two different values of the population recombination parameter (3Nc = 0.0015 and 3Nc = 0.015 per site), chosen to reflect the range of recombination rates known for other mammals (e.g., Dietrich et al. 1994; Kong et al. 2002; Jensen-Seaman et al. 2004). A computer program (Garrigan et al. 2005) was used to calculate lb and gd from the simulated data sets, and the distributions of the two statistics for each set of conditions were plotted against each other. The probability of obtaining the observed values of lb and gd was calculated as the proportion of simulated genealogies for which the values of ld and gd were greater than the observed values.
FST and Nm were calculated using the method of Hudson et al. (1992a) implemented in DnaSP 4.00 (Rozas et al. 2003). Genetic differentiation was also calculated using the test statistic Ks* (Hudson et al. 1992b), and significance was assessed by performing 1000 permutations. To test for significant population structure among populations and among groups of populations, analyses of molecular variance (AMOVA; Excoffier et al. 1992) between the SW and NE groups were performed using ARLEQUIN (Schneider et al. 2000).
One simple model of divergence is an isolation model in which two populations become separated with no subsequent gene exchange. The HKA model (Hudson et al. 1987) takes this form and further assumes that the ancestral species has a population size that is the average of the two descendant species. More recent models relax this assumption. For example, Wakeley and Hey (1997) proposed a model that is similar to the HKA model but includes an additional parameter, θA, the population mutation parameter for the ancestral species. While the HKA test uses only the number of polymorphic sites and divergence, this model also incorporates the total number of polymorphic positions in the two groups (S), the number of polymorphisms exclusive to one group (SxNE and SxSW), the number of shared polymorphisms (Ss), and the number of fixed differences (Sf). We tested the fit of our data to these two models in two different ways. First, we performed pairwise comparisons among all loci, and second, we performed tests with all four loci together. The fit of our data to the Wakeley and Hey model of divergence without gene flow was tested using the program WH (Wang et al. 1997).
These models assume that there has been no gene flow between the two populations since the initial split. In many cases this is an unrealistic assumption. Hey and Nielsen (2004) developed a model of population divergence that allows for genetic drift (increasing population divergence) and gene flow (preventing population divergence) to act together, which they call the isolation with migration model. The computer program IM is an implementation of the Markov chain Monte Carlo method for the analyses of genetic data under this model. We used IM to estimate the effective population size of O. c. cuniculus and of O. c. algirus and to estimate migration rates for each locus between subspecies in each direction. IM assumes that there is no recombination within loci. For Phka2 and Hprt1, we used the largest region showing no evidence of recombination, following Won and Hey (2005). For Phka2, a portion of 687 bp containing 23 polymorphic sites was used, and for Hprt1, a region of 501 bp with 20 polymorphic sites was used. For Smcx, all 20 NE and 14 SW individuals were used since the data are free of recombination. For Msn, we removed three recombinant individuals (Vau1, Rsl4, and Rsl10) from the NE group. We assigned wide prior distributions of the parameters on the basis of preliminary trial runs. We ran the program under Metropolis Coupled Monte Carlo Markov Chains, using 10 chains with linear heating. We used a burn-in period of 1,000,000 steps and recorded results every 40 steps. To test whether the chains were mixing well, we ran the program with different random seed numbers and the results were similar. We ran the program for 25,625,601 steps after the burn-in period and recorded the results of 625,014 steps.
Levels and patterns of variation:
We observed considerable variation at all four genes. Polymorphic sites for each gene are shown in Figure 3, and summaries of variation are given in Table 2. The number of polymorphic sites varied from 56 at Msn to 151 at Phka2, but the number of haplotypes observed at each gene was much more constant (from 23 at Smcx to 29 at Phka2). Levels of polymorphism were high, both in the total sample and within each subspecies. However, nucleotide diversity (π) at the two centromeric loci was considerably lower (0.52% at Smcx and 0.55% at Msn) than at the two telomeric loci (0.70% at Phka2 and 1.26% at Hprt1). This contrast was even more striking in the proportion of segregating sites, where θ at the centromeric loci was roughly half the value seen at the telomeric loci.
We assessed the amount of LD in our sample in several ways, and all were consistent in revealing more recombination (less LD) at the telomeric loci than at the centromeric loci (Table 2), consistent with suppression of recombination near the centromere. The number of pairs of sites showing all four gametic types was zero at Smcx, 77 at Msn, 390 at Phka2, and 180 at Hprt1. Rm, the minimum number of recombination events in the history of the sample (Hudson and Kaplan 1985), was zero at Smcx, intermediate at Msn and Hprt1 (7 and 6, respectively), and highest at Phka2 (17). Similarly, R (Hudson 1987) between adjacent sites was low at the two centromeric loci (Smcx and Msn) and much higher at the telomeric loci (Phka2 and Hprt1). The values for γ (Hey and Wakeley 1997), a maximum-likelihood estimator of the population recombination parameter, are concordant with R in showing more recombination at the telomeric loci than at the centromeric loci. Thus, although there are small differences among estimators, levels of recombination are higher at the telomeric loci than at the centromeric loci. We assessed the significance of LD through pairwise comparisons of polymorphic sites present at a frequency of at least 10% using a Fisher's exact test. From the 8515 pairwise comparisons performed, 2497 were significant in a two-tailed test, and 980 remained significant after a Bonferroni correction for multiple tests. Of these, 197 were between polymorphic sites at Smcx, 219 between sites at Msn, 93 between sites at Phka2, and 130 between sites at Hprt1. Interlocus LD was detected only between Smcx and Msn where there were 341 pairs of sites that showed significant LD.
The distribution of allele frequencies as measured by Tajima's D and Fu and Li's D generally conformed to expectations under a neutral model of molecular evolution (Table 2). For example, in the total sample, Tajima's D was positive (Msn and Hprt1), very close to zero (Smcx), or negative (Phka2), but not significantly different from zero. When the population groups were analyzed separately, Tajima's D was negative (except for Hprt1 in the NE and SW groups), but not significantly so (P > 0.05 for all tests). We also tested a neutral model of molecular evolution by comparing ratios of polymorphism within O. cuniculus (in the total sample) to divergence between O. cuniculus and L. granatensis, using the HKA test. We performed one four-locus comparison as well as six pairwise comparisons among loci, and none of these were significant (P > 0.05 for each).
Divergence and gene flow between subspecies:
FST estimates between O. c. cuniculus and O. c. algirus are shown in Table 3. FST was very high at the centromeric loci, Smcx (0.680) and Msn (0.829), and one order of magnitude lower at the telomeric loci, Phka2 (0.027) and Hprt1 (0.022). The two subspecies were significantly differentiated, using the Ks* test statistic, at all loci but Hprt1 (Table 3). Similarly, the AMOVA analyses revealed that at the two centromeric loci, Smcx and Msn, most of the genetic variation is partitioned among the two subspecies (64 and 84%, respectively) while at the telomeric loci, Phka2 and Hprt1, most of the observed variation was partitioned among populations within each subspecies (93% at both loci), and only a marginal proportion (2%) of the variation segregated between subspecies.
This differentiation can also be seen in the phylogeny of alleles for each gene (Figure 4). At each locus there were two divergent groups of haplotypes, and in each case the root fell along the deep branch separating these two groups. In this analysis, only Smcx was free of homoplasy. The locus with the most homoplasy was Phka2. This homoplasy may be due to recombination or recurrent mutation. Evidence for recurrent mutation comes from the observation that at Hprt1 three different positions have three nucleotides segregating (Figure 3d). Other evidence of recurrent mutation is the fact that the amount of homoplasy is slightly reduced if CpG sites, which are known to be hypermutable, are excluded. For example, for Phka2, the consistency index (CI) increased from 0.778 to 0.803 when CpG sites were removed. However, much of the homoplasy is probably due to recombination, as evidenced by the fact that the CI was 1.0 for Smcx, 0.889 at Msn, but was 0.778 and 0.717 at Phka2 and Hprt1, respectively. Moreover, at Phka2 and Hprt1, 1 and 16 individuals, respectively, were identified as recombinants between the two divergent lineages on the basis of their position on the haplotype network and by visual inspection of the table of polymorphism (Figure 3, a and d).
The degree of introgression between subspecies can also be seen by the concordance (or lack thereof) between geography and phylogeny. For the two centromeric loci (Smcx and Msn), there was good concordance between phylogeny and geography; i.e., the two major lineages correspond well with each subspecies (Figure 4). At Msn we did not detect any introgressed haplotypes and at Smcx we observed only three NE individuals with haplotypes from the lineage that is otherwise restricted to the SW and CZ groups. At the two telomeric genes (Phka2 and Hprt1), in contrast, there seems to be little or no concordance between phylogeny and geography. At all four genes, individuals from the CZ group are scattered throughout the haplotype networks.
The proportion of congruent sites, lb, is greater at the two centromeric loci (representing 30 and 32% of all polymorphic sites at Smcx and at Msn, respectively) than at the telomeric loci (11% at Phka2 and 16% at Hprt1). Similarly, the maximum distance between congruent sites, gd, is greater at Smcx (85% of the total locus length) and at Msn (95%) than at Phka2 (65%) and Hprt1 (16%). We calculated the probability of observing these values of lb and gd using coalescent simulations of 50,000 genealogies of 43 individuals evolving neutrally under panmixia with mutation (θ) and recombination (γ) parameters estimated from the data. Results are shown in Table 4. Under these conditions, the null model was rejected for Msn (P = 0.00214). This test is quite conservative using γ estimated from the data since population subdivision will increase LD and thus underestimate the true value of recombination. Therefore, we also conducted simulations with a population size of 105 and per-site recombination rates of 0.5 × 10−8 and 5 × 10−8, reflecting the range of recombination rates seen in other mammals (e.g., Dietrich et al. 1994; Kong et al. 2002; Jensen-Seaman et al. 2004). At the two centromeric loci, Msn and Smcx, the null model was rejected using either value of recombination. For Phka2, the null hypothesis was rejected only with the higher recombination rate, and Hprt1 was marginally significant (P = 0.066) only for the higher recombination rate (Table 4).
Another way of looking at divergence is to quantify the amount of shared and fixed variation between the two groups (Table 5). The number of shared polymorphisms was low at the centromeric loci (16 and 6% of all polymorphisms at Smcx and Msn, respectively) and high at the telomeric loci (42 and 64% at Phka2 and Hprt1, respectively). Only Msn showed fixed differences between the two groups. These patterns of variation suggest that there has been gene flow between O. c. cuniculus and O. c. algirus at some, but not all, loci. To further test this, we performed an HKA test between NE and SW population groups. A multilocus test between all four loci failed to reject the null model, but in one pairwise comparison (between Phka2 and Msn) the model was rejected (P = 0.035). This result seems to be mainly driven by the fact that divergence at Msn was much higher than expected (observed D = 26.80; expected D = 10.68). The isolation without migration model (Wakeley and Hey, 1997) shares most of the assumptions with the HKA model, but estimates the ancestral population size instead of assuming that it is the average of the population size of the extant populations. A four-locus test using this model also failed to reject the null hypothesis. We also tested the fit of the data using all pairwise comparisons to Msn. Only the comparisons to Msn were performed because in the other comparisons there are no fixed differences and the program is unable to simulate the distribution of the expected values. The comparison between Smcx and Msn failed to reject the null model while the other two comparisons did reject the null model (Phka2/Msn: P(χ2) = 0.032 and P(WH) = 0.030; Msn/Hprt1: P(χ2)= 0.024 and P(WH) = 0.009).
We used IM (Hey and Nielsen 2004) to obtain maximum likelihood estimates (MLE) of the effective population size for each subspecies. We also estimated migration rates for each locus in each direction. The average estimate of the effective population size was ∼882,000 for O. c. algirus and 422,000 for O. c. cuniculus (Table 6). The probability distribution of the ancestral population parameter was flat (not shown), as expected if the ancestral population existed long ago (Won and Hey 2005). Similarly, the probability distribution of t, the time since divergence, was flat, but nonzero (not shown). This suggests that the two subspecies were isolated in the past, but this analysis does not provide a reliable estimate of the time of isolation. Gene flow at the telomeric loci was higher from NE to SW than from SW to NE. For the centromeric loci, introgression of Msn is quite low in both directions, while Smcx shows some unidirectional introgression from SW to NE. Thus it seems that levels and patterns of gene flow are very different between centromeric and telomeric loci.
We estimated divergence time between the two subspecies of O. cuniculus using a phylogenetic approach. Assuming a divergence time of 11.8 MY (Matthee et al. 2004) between O. cuniculus and L. granatensis, divergence time between O. c. cuniculus and O. c. algirus was estimated to be on the order of 2–5 MYA (Table 7 ).
We documented genetic variation at four X-linked loci in natural populations of the European rabbit, O. cuniculus. At each locus, we observed a deep split in the phylogeny with the root lying along the long internal branch. This pattern is consistent with the evolution of each subspecies in allopatry and subsequent secondary contact. Despite this broad similarity among loci, we detected heterogeneity among loci in terms of levels of nucleotide polymorphism, recombination, and introgression between the two subspecies. This heterogeneity corresponds well with the physical location of the loci on the rabbit X chromosome. The two centromeric loci had lower levels of nucleotide polymorphism, higher levels of LD, and reduced introgression in comparison with the two telomeric loci. Although we do not have direct estimates of the frequency of crossing over in rabbits, these observations are consistent with suppression of recombination near the centromere, as has been observed in other species (e.g., Kong et al. 2002).
Levels and patterns of variation:
Across the entire sample (i.e., including both subspecies), the average heterozygosity among all loci (π = 0.76%) was high and roughly one order of magnitude higher than heterozygosity at X-linked loci in humans (π = 0.081%; Hammer et al. 2004) and mice (π = 0.078%; Nachman 1997). Clearly, this high level of nucleotide variability reflects not only nucleotide polymorphism within each subspecies but also the divergence between subspecies. One gene (Msn) showed no introgression between subspecies. Levels of nucleotide polymorphism at this gene were 0.14% for O. c. cuniculus and 0.26% for O. c. algirus, closer to values observed in humans (Hammer et al. 2004) and mice (Nachman 1997).
We also observed variation in levels of polymorphism among loci. Interestingly, the two centromeric loci had lower levels of π and θ than observed at the two telomeric loci, both for the entire data set and for each subspecies considered separately. Within each subspecies, this difference may be explained by different levels of introgression. In other words, Smcx and Msn may be less variable within each subspecies because they contain relatively few introgressed haplotypes, compared to Phka2 and Hprt1. However, we also observe less variation at Smcx and Msn in the total sample. This may be due in part to lower mutation rates at these genes. For example, divergence between Oryctolagus and Lepus is lower at Smcx (Dxy = 1.74%) and Msn (Dxy = 4.08%) than at Phka2 (Dxy = 6.06%) or Hprt1 (Dxy = 4.43%) (Table 2). If recombination is suppressed near the centromere, these differences in mutation rate may reflect an association between mutation and recombination (e.g., Hellmann et al. 2003). It is also possible that reduced variation at Smcx and Msn may be due partly to the effect of either positive or negative selection at linked sites (Maynard-Smith and Haigh 1974; Charlesworth et al. 1993).
Indirect evidence that recombination is suppressed near the centromere comes from our observation of increased LD at Smcx and Msn compared to Phka2 and Hprt1. Patterns of LD are affected by many factors, including selection, mutation, recombination, and changes in population size (e.g., Ardlie et al. 2002). However, in humans, there is good evidence that levels of LD are inversely correlated with recombination rate over much of the genome (e.g., Reich et al. 2001; McVean et al. 2004; Myers et al. 2005). Moreover, in many organisms, recombination is suppressed near the centromeres, particularly in metacentric chromosomes (e.g., Kong et al. 2002). Thus, our observation of increased LD at Smcx and Msn relative to Phka2 and Hprt1 is consistent with, but not proof of, reduced recombination near the rabbit X centromere.
Divergence and gene flow between subspecies:
RFLP surveys of mtDNA polymorphism in the Iberian Peninsula and France have shown that O. cuniculus is composed of two deeply divergent mtDNA lineages that are thought to have diverged ∼2 MYA (Biju-Duval et al. 1991; Branco et al. 2000). A survey of nucleotide variability at Sry also found evidence for the existence of two divergent lineages in the Y chromosome (Geraldes et al. 2005). These two lineages are associated with O. c. algirus and O. c. cuniculus (Branco et al. 2000) and are thought to have evolved in allopatry. Our X chromosome data confirm the existence of two divergent evolutionary units in O. cuniculus, and we show that in general the data reject the evolution of the two lineages under panmixia. The divergence time estimated from these loci is in good agreement with divergence time estimated from mitochondrial genes and places the origin of these two subspecies at the Pliocene/Pleistocene boundary. We observed high levels of population differentiation at the two centromeric loci, but not at the telomeric loci. At the centromeric loci the two divergent lineages correspond well with the described subspecies and are broadly concordant with the patterns of differentiation seen at the Y chromosome and at the mtDNA. The same was not observed at the two telomeric loci, where geography and phylogeny are largely decoupled.
If two populations evolve in allopatry for a sufficiently long time and then come into secondary contact with little or no gene flow, a high percentage of fixed differences and a small number of shared polymorphisms are expected. In our data this is seen only at Msn where 36% of all polymorphisms correspond to fixed differences between groups, and 6% correspond to shared polymorphisms. At all other loci, there are no fixed differences between subspecies and the percentage of shared polymorphisms varies from 16% at Smcx (centromeric) to 64% at Hprt1 (telomeric). This heterogeneity among loci is also reflected in the rejection of an isolation without gene flow model using the HKA test between Phka2 and Msn and the rejection of the null model using the WH test between Phka2 and Msn and between Hprt1 and Msn.
It is noteworthy that the patterns of reduced introgression seen at Smcx and Msn, which may experience reduced recombination, are similar to the patterns seen previously at the mtDNA (Branco et al. 2000) and the Y chromosome (A. Geraldes, unpublished data), genomic regions with no recombination. In contrast, a survey of 14 allozyme loci revealed higher, but variable, levels of introgression (ϕct between subspecies ranged from 0 to 0.46), comparable to the patterns observed at X chromosome loci. Differences among loci in levels of introgression have also been documented in other organisms. For example, genomic regions with suppressed recombination as a result of chromosomal rearrangements introgress less than colinear regions in comparisons between Drosophila pseudoobscura and D. persimilis (Noor et al. 2001; Machado et al. 2002) and between hybridizing sunflowers of the genus Helianthus (Rieseberg et al. 1999). On the basis of such observations, Noor et al. (2001) and Rieseberg (2001) have argued that chromosomal rearrangements may promote speciation, not through underdominance directly as in traditional models (e.g., White 1978), but by suppressing recombination and thereby extending the effects of isolation genes to linked sites. Our finding of low levels of introgression in an area of high LD near the X chromosome centromere of the rabbit is consistent with similar observations in fruit flies and sunflowers. Similarly, in Anopheles mosquitoes, two (of three) areas of reduced introgression map to centromeres (Turner et al. 2005).
Our observations also have some interesting parallels with studies of hybridization in the house mice, Mus musculus and M. domesticus. In the house mouse hybrid zone in Western Europe, the Y chromosome shows reduced introgression (Vanlerberghe et al. 1986; Tucker et al. 1992; Dod et al. 1993), and the X chromosome shows lower levels of introgression than do the autosomes (Tucker et al. 1992; Dod et al. 1993; Munclinger et al. 2002), although there is also considerable variability in levels of introgression among loci on the X chromosome (Payseur et al. 2004). In a similar fashion, we observe some X-linked loci with much reduced introgression in rabbits, providing further support for the importance of the X chromosome in reproductive isolation. Interestingly, the differences among loci in migration estimates (Table 6) may provide some clues to the nature of incompatibilities underlying reproductive isolation. In particular we note that estimates of the number of migrants from NE to SW for the centromeric loci are slightly lower than in the opposite direction. This is in agreement with the expected asymmetric behavior of young Dobzhansky–Muller interactions (Orr 1995) and suggests that incompatibilities may derive from interactions between the cuniculus X chromosome and an algirus genetic background.
One ultimate goal of speciation studies is to determine the identity of genes involved in reproductive isolation between nascent species. With the completion of the sequence of the rabbit genome expected in the next few years, it may soon be possible to identify candidate genes for reproductive isolation in this species. The results presented here suggest that some of these genes may lie near the centromere of the X chromosome.
We thank R. Villafuerte for help in collecting rabbit samples; J. Good and C. Pinho for valuable discussions; D. Garrigan, T. Salcedo, and M. Dean for help with ms simulations; and J. Hey for help with IM. We also thank two anonymous reviewers for suggestions on a previous version of this manuscript. This work was supported by Fundação para a Ciência e a Tecnologia (SFRH/BD/4621/2001) Ph.D. grant to A.G. and Research Project POCTI/BSE/40280/2001 and by a National Science Foundation grant to M.W.N.
- Received November 30, 2005.
- Accepted March 20, 2006.
- Copyright © 2006 by the Genetics Society of America