Multilocus Analysis of Variation and Speciation in the Closely Related Species Arabidopsis halleri and A. lyrata
Sebastián E. Ramos-Onsins (1,2,a), Barbara E. Stranger (1,3,b), Thomas Mitchell-Olds (b), and Montserrat Aguadé (a)
a Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, 08028 Barcelona, Spain
b Department of Genetics and Evolution, Max Planck Institute of Chemical Ecology, 07745 Jena, Germany
Corresponding author: Barbara E. Stranger, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. E-mail: bstrange{at}porthos.bio.ub.es
Communicating editor: O. SAVOLAINEN
| ABSTRACT |
|---|
Nucleotide variation in eight effectively unlinked genes was surveyed in species-wide samples of the closely related outbreeding species Arabidopsis halleri and A. lyrata ssp. petraea and in three of these genes in A. lyrata ssp. lyrata and A. thaliana. Significant genetic differentiation was observed more frequently in A. l. petraea than in A. halleri. Average estimates of nucleotide variation were highest in A. l. petraea and lowest in A. l. lyrata, reflecting differences among species in effective population size. The low level of variation in A. l. lyrata is concordant with a bottleneck effect associated with its origin. The A. halleri/A. l. petraea speciation process was studied, considering the orthologous sequences of an outgroup species (A. thaliana). The high number of ancestral mutations relative to exclusive polymorphisms detected in A. halleri and A. l. petraea, the significant results of the multilocus Fay and Wu H tests, and haplotype sharing between the species indicate introgression subsequent to speciation. Average among-population variation in A. halleri and A. l. petraea was
approximately 1.5- and 3-fold higher than that in the inbreeder A. thaliana. The detected reduction of variation in A. thaliana is less than that expected from differences in mating system alone, and therefore also less than that expected from selective processes related to differences in the effective recombination rate, but it could be explained by differences in population structure.
Multilocus analyses of nucleotide variation in closely related species not only reveal variation that has accumulated independently in each species since their divergence but also may detect variation originally segregating in their common ancestor that has not yet been lost. Indeed, early in the process of speciation, newly formed species share polymorphisms due to recent divergence from a common ancestor.
Speciation models have been developed to describe the expected behavior of genetic polymorphism within and between species following divergence.
Arabidopsis halleri and A. lyrata are closely related outcrossing species whose divergence from the inbreeder A. thaliana is estimated to have occurred approximately 5 MYA.
This work also aims to quantify species-wide levels of variation in the studied outbreeding Arabidopsis species and to establish, through comparison with A. thaliana, the effect of mating system and population structure on intraspecific variation. In mutation-drift equilibrium, differing levels of variation might indicate differences in either the effective population size or the neutral mutation rate. They might also reflect differences in mating system, which influence effective population size.
| MATERIALS AND METHODS |
|---|
Species sampling:
Table 1 describes the geographic origin of the individuals of each species used in this study and the genes sequenced for each individual. DNAs of most A. halleri and A. l. petraea individuals were kindly provided by Pierre Saumitou-Laprade and Helmi Kuittinen, respectively. In other cases, seeds were obtained from T. Mitchell-Olds' personal collection (some A. halleri and A. l. petraea individuals and all A. l. lyrata individuals) and from the Nottingham Arabidopsis Stock Centre (A. thaliana ecotypes Col-0, Mrk-0, Kas-1, Lip-0, Ler, and Ll-0; Nottingham, United Kingdom). Seeds were planted and grown in growth chambers under standard 16-hr light and 8-hr dark conditions. Leaves were harvested from individual plants prior to flowering, and genomic DNA was extracted as described previously.
PCR amplification and sequencing:
The genes studied code for enzymes in the phenylpropanoid pathway (CAD, cinnamoyl alcohol dehydrogenase; CHI, chalcone isomerase; CHS, chalcone synthase; DFR, dihydroflavonol reductase; F3H, flavanone-3-hydroxylase; and FAH-1, ferulate-5-hydroxylase), which may influence resistance to UV-radiation, and enzymes involved in glucosinolate biosynthesis (GS, glucosyltransferase; and MAM-L, methylthioalkylmalate-like synthase), and thus in plant defense against insects. The PCR amplification primers used, amplification conditions, and description of the fragment amplified for each of the eight genes studied are as described in ![]()
400-bp intervals were designed.
In A. thaliana, sequences (one per ecotype) were obtained on both strands by direct sequencing of PCR products. In the other species, two sequences were obtained per individual. For the CHS, GS, and MAM-L loci, the PCR products were run on agarose gels, the appropriate-sized bands were excised, and the DNA was extracted from the gel slices using the QIAquick gel extraction kit (QIAGEN, Valencia, CA). The purified fragments were subsequently cloned into a pCRII vector using the TOPO-TA cloning system (Invitrogen, Carlsbad, CA). After transformation of TOP10F bacteria, plasmid DNA was extracted from individual transformed colonies using a plasmid DNA isolation kit (QIAGEN). The M13 universal and M13 reverse primers, as well as internal locus-specific primers, were used for sequencing. Both strands of 4 to 18 plasmids per locus per plant were sequenced using the Big Dye Terminator 2.0 kit (Applied Biosystems, Foster City, CA) and run on an ABI 3700 sequencer (Applied Biosystems). For the CAD, CHI, DFR, F3H, and FAH1 loci, sequences were obtained by direct sequencing of PCR products after their purification with QIAquick columns (QIAGEN); in all lines with two or more polymorphisms, the cloning and sequencing strategy indicated above was used.
Data analyses:
Contigs were assembled using Sequencher (Gene Codes, Ann Arbor, MI), SeqMan (DNASTAR, Madison, WI), and the SeqEd version 1.0.3 program (Applied Biosystems). Multiple sequences for each gene were aligned using either a combination of methods implemented in Megalign (DNASTAR) or ClustalW.
The DnaSP version 3.98 program was used in the analyses.
Nucleotide variation was estimated as nucleotide diversity, $\pi$. Among-location diversity ($\pi_B$), which refers to the average number of pairwise differences between geographic locations, was used as a measure of species-wide diversity. Within-location diversity ($\pi_w$) was also estimated. Note that our $\pi_B$ does not refer to net variation between locations, i.e., to $\pi_T - \pi_w$.
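The diversity measures described in this paragraph can be illustrated with a minimal Python sketch that computes per-site average pairwise differences over all sequences, within locations, and between locations. The function names, the toy alignment, and the location labels are hypothetical. The sketch assumes aligned, gap-free sequences of equal length and pools all within-location pairs; it is not the DnaSP implementation.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Proportion of aligned sites that differ between two sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_or_none(values):
    """Arithmetic mean, or None when there are no pairs to average."""
    values = list(values)
    return sum(values) / len(values) if values else None


def diversity(seqs, locations):
    """Return (total, within-location, between-location) diversity per site.

    Between-location diversity averages over pairs of sequences sampled
    from different locations; it is not the net value (total - within).
    """
    pairs = list(combinations(range(len(seqs)), 2))
    total = mean_or_none(pairwise_diff(seqs[i], seqs[j]) for i, j in pairs)
    within = mean_or_none(
        pairwise_diff(seqs[i], seqs[j])
        for i, j in pairs if locations[i] == locations[j]
    )
    between = mean_or_none(
        pairwise_diff(seqs[i], seqs[j])
        for i, j in pairs if locations[i] != locations[j]
    )
    return total, within, between


# Hypothetical toy data: four sequences from two locations.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TCGAACGT"]
locs = ["pop1", "pop1", "pop2", "pop2"]
pi_total, pi_within, pi_between = diversity(seqs, locs)
```

In this toy case the between-location value exceeds the within-location value, which is the pattern expected when locations are genetically differentiated.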
Several tests were used to detect possible departure from the predictions of an equilibrium neutral model: tests based on either the frequency spectrum of polymorphisms or the haplotype distribution.
For the Fay and Wu H test (![]()
) rather than a fixed number of mutations. Analyses were performed with total and with silent positions, using both an unbiased and a biased transition/transversion ratio (ratio values equal to 0.5 and 2.0, respectively). Time of divergence to the outgroup was estimated taking into account the ratio of divergence to nucleotide diversity ($K/\pi$). Therefore, the time to isolation (Td) of the outgroup sequence was calculated as $T_d = ((K/\pi) - 1)/2$.
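As a worked illustration of the time-to-isolation expression, the following Python sketch evaluates ((K/diversity) - 1)/2 for one pair of values. It assumes that the denominator is nucleotide diversity, which is an interpretation of the symbol in the source expression. The function name and the input values are hypothetical and are not results from this study.

```python
def time_to_isolation(divergence, diversity):
    """Evaluate Td = ((K / diversity) - 1) / 2.

    divergence: K, substitutions per site between ingroup and outgroup.
    diversity: within-species nucleotide diversity per site (assumed
    to be the quantity in the denominator of the source expression).
    """
    if diversity <= 0:
        raise ValueError("diversity must be positive")
    return ((divergence / diversity) - 1.0) / 2.0


# Hypothetical inputs chosen only to show the arithmetic.
k_example = 0.16
diversity_example = 0.02
td_example = time_to_isolation(k_example, diversity_example)
```

With these illustrative inputs the ratio K/diversity is 8, so the expression evaluates to 3.5.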
The nonparametric sign test was also applied across loci.
Sampling strategy for neutrality tests:
The null hypothesis of most tests of neutrality assumes panmixia in addition to mutation-drift equilibrium. Deviations from panmixia, e.g., population subdivision, can thus lead to rejection of the null hypothesis even if neutrality holds. Since sequences in the present study were obtained by sampling within and between locations throughout each species' distribution area, we tried to minimize the effect of population subdivision in the neutrality tests. If no genetic differentiation was detected for a particular locus and species, the sample was considered to be panmictic, and all sequences were used in neutrality tests. Otherwise, a single randomly chosen sequence per location (constructed sample) was used, and all (or a subset of all) combinations were tested. This mixed sampling strategy uses all the information available whenever there is no evidence for population subdivision and minimizes the effect of subdivision on neutrality tests if subdivision is present. The use of a single sequence per deme conforms to the assumptions of a standard equilibrium model, where $\theta = 4N_e\mu$ and $N_e$ is rescaled as a function of population structure.
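The constructed-sample idea (one sequence per location, over all combinations) can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the function names, the location labels, and the sequence identifiers are hypothetical.

```python
from itertools import product
from math import prod


def constructed_samples(seqs_by_location):
    """Yield every sample made of exactly one sequence per location."""
    locations = sorted(seqs_by_location)
    for combo in product(*(seqs_by_location[loc] for loc in locations)):
        yield dict(zip(locations, combo))


def count_constructed_samples(seqs_by_location):
    """Number of constructed samples: product of sequences per location."""
    return prod(len(seqs) for seqs in seqs_by_location.values())


# Hypothetical locus with two, two, and four sequences at three locations.
toy_locus = {
    "loc_A": ["a1", "a2"],
    "loc_B": ["b1", "b2"],
    "loc_C": ["c1", "c2", "c3", "c4"],
}
samples = list(constructed_samples(toy_locus))
n_samples = count_constructed_samples(toy_locus)
```

Because the count is a product over locations (and, for multilocus tests, over loci), it grows quickly, which is why only a subset of combinations may be testable in practice.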
Multilocus tests of neutrality:
Neutrality tests using multilocus data from a single species were performed following Hey (as detailed in the HKA program) for TAJIMA's (1989) DT, FU's (1996) Fs, FU and LI's (1993) DF and FF, and FAY and WU's (2000) H tests. When, as in this example, loci are either not linked or loosely linked, then the history of each locus can be considered to be independent and forces affecting one particular locus should not affect other loci. Under strict neutrality and complete independence between loci, the effective population size (Ne) would be the only population parameter shared by all loci. The multilocus test statistic (m[DT], m[Fs], m[DF], m[FF], and m[H]) was obtained, in each case, by summing across loci the corresponding test statistic values for each locus and dividing by the total number of loci, which gives them all the same weight. The empirical distribution of the multilocus test statistic under neutrality was obtained from coalescent-based simulations. The simulations were run independently for each locus (with the corresponding sample size) and were conditioned on the number of segregating sites. For each replicate, the multilocus test statistic (e.g., m[DT]) was obtained by averaging the corresponding test statistic (e.g., DT) across loci.
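The averaging step and the comparison against a simulated null distribution can be sketched as below. This is a minimal Python illustration under strong assumptions: the per-locus null draw is a hypothetical stand-in (standard normal values), not a coalescent simulation conditioned on sample size and segregating sites, and the observed values are invented for the example.

```python
import random


def multilocus_mean(per_locus_values):
    """Average a per-locus test statistic across loci (equal weights)."""
    return sum(per_locus_values) / len(per_locus_values)


def simulate_null_means(n_loci, n_replicates, draw_statistic, seed=0):
    """Build a null distribution of the multilocus mean.

    draw_statistic(rng) is a HYPOTHETICAL placeholder for one per-locus
    neutral simulation; a real analysis would call a coalescent simulator
    with each locus's sample size and number of segregating sites.
    """
    rng = random.Random(seed)
    return [
        multilocus_mean([draw_statistic(rng) for _ in range(n_loci)])
        for _ in range(n_replicates)
    ]


def lower_tail_p(observed, null_values):
    """Fraction of null replicates at or below the observed value."""
    return sum(v <= observed for v in null_values) / len(null_values)


# Hypothetical per-locus values of one statistic for eight loci.
observed_per_locus = [-1.1, -0.4, -2.0, -0.7, 0.3, -1.5, -0.9, -0.2]
observed_mean = multilocus_mean(observed_per_locus)

# Placeholder null: standard normal draws, NOT a coalescent simulation.
null_means = simulate_null_means(8, 2000, lambda rng: rng.gauss(0.0, 1.0))
p_value = lower_tail_p(observed_mean, null_means)
```

Averaging across loci in each replicate, exactly as for the observed data, is what makes the simulated and observed multilocus values comparable.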
Inferring ancestral variation:
The multilocus sequence comparison of two closely related species allows detection of four classes of mutations (excluding sites with multiple hits, Smhits; Fig 1A), which can be grouped into three different categories.
As depicted in Fig 1B, availability of the homologous sequences from an outgroup species allows partitioning of fixed differences into the two lineages of closely related species (Sf1, Sf2) and also detection of two new classes of mutations (Sf1x2, Sf2x1). These latter classes correspond to mutations that are fixed in one species but still segregate as polymorphisms in the other species. Unlike biallelic variation, multiple hits are grouped into a single category because of the difficulty of unambiguously establishing the direction of mutations. These two new classes of mutations and the shared polymorphisms class constitute, therefore, the third category of mutations, hereafter called ancestral mutations. The numbers of mutations in each class were obtained using the calc_mut program (available from the authors).
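The class definitions can be made concrete with a short Python sketch that assigns one aligned site to a class, given the alleles observed in each species and the outgroup state. This is an illustrative reading of the classes and is not the calc_mut program: the function name, the input format, and the two extra labels `invariant` and `outgroup_diff` are hypothetical, and Sf1x2 is assumed to mean "fixed in species 1, still segregating in species 2".

```python
def classify_site(sp1_alleles, sp2_alleles, outgroup_allele):
    """Assign one aligned site to a mutation class.

    sp1_alleles, sp2_alleles: iterables of the bases seen in each species.
    outgroup_allele: the single base in the outgroup (taken as ancestral).
    """
    a1, a2 = set(sp1_alleles), set(sp2_alleles)
    anc = {outgroup_allele}
    # More than two states overall: direction cannot be assigned safely.
    if len(a1 | a2 | anc) > 2:
        return "Smhits"
    poly1, poly2 = len(a1) > 1, len(a2) > 1
    if poly1 and poly2:
        return "Ss"  # shared polymorphism
    if poly1:
        # Species 2 is fixed: ancestral state -> exclusive polymorphism,
        # derived state -> fixed in 2 but still segregating in 1.
        return "Sx1" if a2 == anc else "Sf2x1"
    if poly2:
        return "Sx2" if a1 == anc else "Sf1x2"
    if a1 == a2:
        # Hypothetical labels for sites outside the classes in the text.
        return "invariant" if a1 == anc else "outgroup_diff"
    # Fixed difference, assigned to the lineage carrying the derived state.
    return "Sf1" if a2 == anc else "Sf2"


# Hypothetical sites: (species 1 alleles, species 2 alleles, outgroup).
example_sites = [("AAGA", "AAAA", "A"), ("GGGG", "AGGA", "A"), ("AG", "GA", "A")]
example_classes = [classify_site(*site) for site in example_sites]
```

Under this reading, the ancestral category is the union of Ss, Sf1x2, and Sf2x1.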
The hypergeometric distribution was used to test whether recurrent mutation alone could explain the observed number of ancestral mutations.
Population parameters for two extant closely related species were estimated from the observed distribution of variation within and between species (Sx1, Sx2, Sf, Ss) under the simple isolation model of speciation (here named WH). Under this model, the population mutation parameter of the ancestral ($\theta_A$) and each descendant ($\theta_1$ and $\theta_2$) species, as well as the time from speciation (Tsp, measured in Ne generations of one of the descendant species), can be estimated.
| RESULTS |
|---|
Levels of DNA sequence polymorphism and divergence from A. thaliana:
In both A. halleri and A. l. petraea, sequences of eight independent genes (Table 2) were obtained from one to two individuals from each of three to five populations (Table 1 and Table 3). A subset of these genes was sequenced in samples from A. l. lyrata and A. thaliana (Table 1). Two sequences per individual were obtained for each locus, except in A. thaliana (see MATERIALS AND METHODS). Samples consisted of individuals collected across each species' distribution area. In general, each location was represented by one individual, although for A. halleri two individuals from the same location were sequenced in some cases (Table 1). For each species, the length of the region analyzed for each gene varied between approximately 500 and 2000 nucleotides; these numbers are, however, lower when all species are considered (Table 2). Figures S1-S8 (at http://www.genetics.org/supplemental/) summarize intra- and interspecific variation.
For each gene and species, nucleotide diversity estimates for total and silent variation are presented in Table 3 and Table 4, while estimates of synonymous and nonsynonymous variation are given in Table 5. In A. halleri, A. l. petraea, and A. l. lyrata, estimates of variation within and among locations were obtained in addition to estimates based on all sequences (Table 3). Comparison of the averages of among-population variation between the three outcrossing species yielded similar results to those obtained using all sequences: (i) variation was higher in A. l. petraea than in A. halleri, and (ii) variation was lowest in A. l. lyrata (Table 3). Levels of polymorphism varied across loci, with DFR and GS being the overall most variable genes. Variation was lower at nonsynonymous sites than at synonymous sites both within and among species. In the three taxa, within-population variation was generally a small fraction of total variation. Indeed, significant genetic differentiation between locations was detected in four out of eight genes in A. halleri, in seven out of eight in A. l. petraea, and in all three in A. l. lyrata (Snn, Table 3).
Table 5 shows the variation at synonymous and nonsynonymous sites, as well as the synonymous/nonsynonymous ratio for polymorphism and divergence. Average levels of divergence (number of substitutions per site) between these species and A. thaliana were approximately 0.16 for synonymous and 0.015 for nonsynonymous sites, roughly the same order of magnitude as previously reported estimates.
We compared nucleotide variation in six genes (CHI, CHS, F3H, FAH1, GS, and MAM-L) between A. thaliana and the outcrossing species. Average among-population variation was 1.5 to 3 times lower in the inbreeding species A. thaliana than in the outcrossing species A. halleri and A. l. petraea. Thus, we clearly observe reduced variation in the inbreeder, a reduction that is concordant at least in sign with the difference in mating system. We obtained analytical results that could also explain the observed levels of variation in species with different population structure and mating system (Appendix).
Detecting ancestral variation in A. halleri and A. lyrata:
Table 6 presents the distribution of variable sites in the A. halleri/A. l. petraea and the A. l. lyrata/A. l. petraea pairs, using A. thaliana as the outgroup. In the A. halleri/A. l. petraea comparison, most genes harbored variants in the two new classes of putatively ancestral polymorphisms (Sf1x2, Sf2x1), while only three loci exhibited shared polymorphisms (Ss). The two newly proposed classes of mutations make an important contribution to the total number of ancestral mutations: 38 out of 71 mutations in the A. halleri/A. l. petraea comparison (Table 6A) and 31 out of 36 in the A. l. lyrata/A. l. petraea comparison (Table 6B). The possible difference in population size of these subspecies is also reflected in the number of segregating sites in each new category. Indeed, the total number of mutations fixed in the A. halleri branch that still segregate in A. l. petraea (24) is higher than those fixed in A. l. petraea that still segregate in A. halleri (14). This is consistent with A. halleri having a lower effective population size, since the coalescence time is shorter for A. halleri than for A. l. petraea. The same effect can be detected in the A. l. lyrata/A. l. petraea comparison (28 vs. 3; Table 6B).
The hypergeometric distribution (![]()
In the A. l. lyrata/A. l. petraea comparison, the presence of ancestral mutations in the classes Sf1x2 and Ss (Table 6B) could not be explained by a recurrent mutation process. Indeed, significant or nearly significant departures from the null hypothesis were detected in most cases (Table 7B). The presence of only a single silent Sf2x1 mutation precluded performing the corresponding test.
Neighbor-joining trees (![]()
1000 bp) to sequences present in A. halleri. Although these two species shared polymorphisms and had no fixed differences between them at the DFR locus, the distribution of polymorphic sites at DFR did not reveal interspecific haplotype sharing (Figure S4). There was no clearly identifiable region in either DFR or any of the other loci with ancestral mutations that we might have attributed to introgression. However, the heterogeneous patterns observed in GS and the other loci do not preclude the possibility of other, undetected introgressed variants.
Table 8 presents the population parameter estimates from the WH isolation model of speciation. In the A. halleri/A. l. petraea comparison, variation in the ancestral species ($\theta_A$) is estimated to be 105-fold higher than that in extant populations, which would imply a 105-fold difference in effective population size between the ancestor and the descendant species (assuming mutation-drift equilibrium and homogeneous mutation rates). Therefore, the estimated time to speciation was rather low compared to estimates from nucleotide divergence (results not shown). The biological implausibility of the parameters estimated from the WH model points to some violation of the neutrality and/or complete isolation assumptions of the model in the A. halleri/A. l. petraea comparison.
The low number of segregating sites in A. l. lyrata probably affects the estimation of population and speciation parameters in the A. l. lyrata/A. l. petraea comparison (Table 8), and these estimates should therefore be interpreted with caution. Although estimates differed between the two constructed combinations, they suggest a lower effective population size for A. l. lyrata.
Neutrality tests:
The implausible parameter estimates produced by the isolation model in the A. halleri/A. l. petraea species comparison may be caused by the violation of assumptions of the model. Natural selection, nonequilibrium demographic processes, or introgression may explain these results. We used neutrality tests for individual loci, as well as several multilocus tests, to test for deviations from a neutral equilibrium model. For individual loci, most tests were nonsignificant, as summarized by the average values of the test statistics over all combinations (Table 9). The only exception was the highly significant Fay and Wu's H statistic for CHS in A. halleri. Indeed, 13 of the 15 polymorphisms in this gene were due to ancestral variants present in a single highly divergent sequence (Figure S2). Also, pairwise MK (McDonald-Kreitman) tests were performed for all loci and species pairs. A marginally significant deviation was detected for CHS in the A. halleri/A. l. petraea comparison (P < 0.049) and for CHS in the A. halleri/A. thaliana comparison (P < 0.047). However, no significant deviation from neutral expectation was detected after Bonferroni correction (results not shown).
Despite the general lack of significance observed, in some tests the deviations from equilibrium neutral patterns were in the same direction (for each particular sample combination and species) in a majority of the genes (Table 9). That was the case for the H statistic, which was negative in seven out of eight genes in A. halleri and in six out of eight in A. l. petraea, and also for the Tajima's D and Fu and Li's D and F in A. l. lyrata. These trends prompted us to perform multilocus tests.
Polymorphism in A. halleri and in A. l. petraea was compared to divergence of each species relative to A. thaliana with the multilocus HKA test. The large number of combinations of constructed samples for all loci ($16^2 \times 32^2 = 2.62 \times 10^5$ for A. halleri and $32^6 \times 16 = 1.72 \times 10^{10}$ for A. l. petraea) precluded performing all possible tests. After performing multilocus analyses for several combinations, it was clear that test results in both A. halleri and A. l. petraea were consistent for most data subsamples (results not shown). Therefore, only results for one combination per species are shown (Fig 2). In the A. halleri/A. thaliana comparison, the HKA test rejected the null hypothesis of proportionality between polymorphism and divergence (P < 0.002), while the test statistic value was close to significance in the A. l. petraea/A. thaliana comparison (P < 0.073). In both cases, the largest contributions to the overall test statistic were due to DFR, GS, and MAM-L (Fig 2). Significant values can be explained by a larger variance in the polymorphism/divergence ratio than that expected under a neutral equilibrium model. This large variance might be attributable to selection on a subsample of the loci or to violation of assumptions of the model. The multilocus HKA test was not significant in the A. l. lyrata/A. thaliana comparison.
We employed multilocus analyses incorporating across-loci averages for each statistic and nonparametric sign tests (see MATERIALS AND METHODS). As above, test results were similar for most combinations. In A. halleri, most single and multilocus tests gave nonsignificant results. Indeed, only the multilocus m[H] test (Fig 3A) and the nonparametric sign test for the H statistic (eight H values under the median; P = 0.0039) yielded significant results. A similar result was obtained in this species when all tests were performed for the combination that yielded the most conservative value for the m[H] statistic: only the tests using the H statistic were significant. The m[H] test was also significant when the CHS gene, which was highly significant by itself, was excluded. In A. l. petraea, only the nonparametric sign test for the H statistic was significant (seven of the eight H values under the median; P = 0.035). The m[H] value was also under the median, indicating a similar, but nonsignificant, deviation (Fig 3A). Equivalent results were obtained for Fay and Wu's H test considering recurrent variation (see MATERIALS AND METHODS), except for the sign test in the species A. l. petraea, where half of possible combinations were nonsignificant. The significant results of the tests for A. halleri and A. l. petraea indicate deviations from the neutral equilibrium model at a multilocus level rather than for single loci. Significant values of the Fay and Wu statistic have been generally explained by positive directional selection, but they can also be the result of recent gene flow.
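The sign-test probabilities reported above (P = 0.0039 for eight of eight values and P = 0.035 for seven of eight values under the median) are consistent with a one-sided binomial calculation in which each locus falls below the median with probability 0.5. The Python sketch below reproduces that arithmetic; the one-sided binomial form is an inference from the reported numbers rather than a documented description of the authors' procedure, and the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(n_below, n_total, p=0.5):
    """P(X >= n_below) for X ~ Binomial(n_total, p)."""
    return sum(
        comb(n_total, k) * p**k * (1 - p) ** (n_total - k)
        for k in range(n_below, n_total + 1)
    )


p_eight_of_eight = sign_test_one_sided(8, 8)  # 1/256
p_seven_of_eight = sign_test_one_sided(7, 8)  # 9/256
```

Rounded, these values are 0.0039 and 0.035, matching the probabilities given in the text.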
For A. lyrata ssp. lyrata, both data subsamples gave similar results, although m[DF] and m[FF] were significant in only one of the two possible combinations (P = 0.048 and P = 0.035). Fig 3B shows the individual and average multilocus tests for Tajima's D and Fu and Li's D statistics. The same negative pattern was found for all loci, which may indicate population expansion or other demographic processes.
| DISCUSSION |
|---|
Relationship between A. halleri and A. lyrata:
Comparative analysis of nucleotide variation at eight loci in A. halleri and A. l. petraea and at three loci in A. l. petraea and A. l. lyrata can shed light on the process of speciation at different timescales. In general, species recently originated from a common ancestor should present few fixed differences, due to the long time required for mutations to accumulate and become fixed. Genetic variants shared by these species may represent polymorphisms present in their common ancestor that have been maintained either by chance or by some form of balancing selection. However, they also may result from introgression between species or from recurrent mutations.
Our comparative analysis of variation in the two subspecies of A. lyrata (A. l. petraea and A. l. lyrata) is based on only three genes, with very low levels of variation in A. l. lyrata. The observed data can generally be explained by a simple isolation model. The parameters estimated using the WH model indicated a significant reduction of variability in A. l. lyrata, suggesting a bottleneck during the colonization of the New World from Eurasia.
In the A. halleri/A. l. petraea comparison, recurrent mutation was insufficient to explain the observed number of putatively ancestral mutations, indicating that most of them are in fact the result of individual mutation events. The high number of mutations observed in the ancestral class could thus be due either to their persistence in isolated taxa since speciation or to an ancient divergence with some degree of subsequent gene flow between species. Results from the WH isolation model were biologically implausible, which suggests violation of some assumptions in this model, quite possibly neutrality or isolation of species. Also, gene trees of some loci showed that alleles from the same species were not clustering together. Specifically, the GS locus shared haplotypes between these two species, which may be explained by introgression.
By combining single and multilocus tests of neutrality we can distinguish between processes affecting individual genes and forces influencing the entire genome. The multilocus m[H] test showed a significant or nearly significant deviation from the simple neutral model in A. halleri and A. l. petraea, respectively, and significant values for the sign test in both species. Although multilocus tests indicate deviation from neutral equilibrium expectations for the complete set of loci analyzed, the negative value of the H statistic for each locus clearly shows an excess of recent mutations at high frequency at every locus. This generalized excess suggests a multilocus effect that could be the result of demographic factors, independent selective sweeps, or introgression. The sign of the test statistic, the sampling strategy used in the tests (see MATERIALS AND METHODS), and the lack of correlation between geographic and genetic distance (results of Mantel tests not shown) exclude population subdivision as the main cause of the detected deviation. It is also highly unlikely that the multilocus deviation was due to positive selection since it would require independent selective sweeps at a majority of the loci studied. Population reduction could also explain a negative tendency of the Fay and Wu H values across the genome (results not shown). If this were the case, this process should be also observed in positive Tajima's D and Fu and Li's D and F values, which is not observed (Table 9). Therefore, we also exclude that hypothesis. Finally, recurrent mutation also cannot explain the observed data, as we included that process in coalescent simulations (see RESULTS). Thus, the most parsimonious explanation of the significant results of the multilocus H test in the closely related species A. halleri and A. l. petraea is reacquisition of ancestral mutations by introgression.
Selection could also play an important role in the maintenance (or loss) of variation added by introgression. Indeed, the pattern of variation detected in A. halleri and A. l. petraea for some loci could be better explained by introgression (see above), even though haplotype sharing was not generally observed. This observation, together with the detection in HKA tests of the same genes showing similar deviations from neutral expectations in both species (Fig 2), might be explained more easily by selection sweeping (or maintaining) the additional introgressed variation in some loci and not in others than by the stochasticity associated with genetic drift.
A hybridization event is biologically plausible, as interspecific crosses between A. halleri (pollen) and A. l. petraea have resulted in interfertile F1 individuals.
The existence of gene flow between species alters the pattern of both within- and between-species variation. Indeed, in a simple scenario with two species 1 and 2 that possess mutations only in classes Sx1, Sx2, Sf1, and Sf2 (Fig 1), asymmetrical gene flow from species 1 into species 2 would result in some mutations switching to another class. Some mutations would switch from class Sx1 to either class Ss or Sx2, some from class Sf1 to class Sf1x2, and some from class Sf2 to Sx2. There would be, therefore, a decrease of exclusive variation in species 1 (where fixed and polymorphic mutations would switch to the ancestral mutation class), and an increase in species 2 (by addition of polymorphic mutations). Therefore, one asymmetrical gene flow event between two species with similar effective population sizes (and thus levels of variation) would result in increased variability in the recipient species, a reduction in the number of exclusive polymorphisms in the donor species, and reduction of the estimated time since isolation. On the other hand, reciprocal gene flow would result in more ancestral mutations and fewer fixed mutations in both species, which would increase the apparent level of variation in the ancestral population and reduce the estimated time since isolation. This pattern is consistent with the results of the WH model for the A. halleri/A. l. petraea comparison, supporting, therefore, the hypothesis of introgression occurring after the split of these species. The inclusion of outgroup sequences in our study has dramatically increased the number of ancestral mutations detected (Table 6) and therefore the information regarding the ancestral history of the two species. 
Speciation models would greatly benefit both from the addition of an outgroup species and from considering migration subsequent to species divergence since this would improve the estimates of isolation time and levels of variation in the ancestral and extant species and may improve the power to detect introgression.
Levels of nucleotide variability in A. halleri and A. lyrata:
Genetic differentiation between geographic locations can be suggestive of (although not equivalent to) population subdivision (Table 3, Snn tests). In our study, significant genetic differentiation between populations is observed more frequently in A. l. petraea than in A. halleri, indicating that the former species (with a discontinuous distribution extending throughout North and Central Europe) has a higher level of population structure than A. halleri (distributed across Central Europe).
Several factors, including migration rate between demes and number of demes, might influence a species' effective population size when there is population substructure and might consequently affect the species-wide level of variation.
The low level of variability detected in A. l. lyrata, relative to A. l. petraea, might be explained by a bottleneck accompanying the relatively recent colonization event from the Old World.
Considerations regarding the levels of variation in the outbreeding species A. halleri and A. lyrata relative to the inbreeding species A. thaliana:
Comparison of within-species levels of variation in outbreeding and inbreeding species may reveal the effect of mating system on intraspecific variation. If the assumptions of a Wright-Fisher model hold for the two species being compared (except for mating system in the selfing species), a twofold reduction in the effective population size is expected under complete selfing, and consequently the level of variation should be lower in the inbreeding than in the outbreeding species.
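The twofold reduction expected under complete selfing can be checked with a short sketch. It uses the standard textbook relations F = s/(2 - s) for the equilibrium inbreeding coefficient at selfing rate s and N_e = N/(1 + F) for the effective size; these are general results, not expressions taken from this article, and the function names are hypothetical.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) for selfing rate s."""
    return selfing_rate / (2.0 - selfing_rate)


def effective_size(n, selfing_rate):
    """Effective population size N_e = N / (1 + F) under partial selfing."""
    return n / (1.0 + inbreeding_coefficient(selfing_rate))


if __name__ == "__main__":
    n = 1000  # hypothetical census size
    for s in (0.0, 0.5, 0.9, 1.0):
        print(f"s = {s:.1f}  N_e = {effective_size(n, s):.1f}")
```

With complete selfing (s = 1) the effective size is half that of a random-mating population of the same size, so neutral variation is expected to be halved when all else is equal.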
We also developed a simple analytical expression of the expected ratio of species-wide variation in inbreeding vs. outbreeding species under the same hypothetical population structure in both species (Appendix). In the absence of population extinction (Appendix, Equation A3), the expected ratio is approximated by the expression (M/2 + 1)/(M + 1), where M is the population migration parameter 4Nm. A twofold reduction would thus be expected under high migration, with a milder reduction under low migration. On the other hand, the effective rate of recombination is greatly reduced in inbreeding species.
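The behavior of the expected ratio at the two migration extremes can be verified numerically. The sketch below evaluates the approximation quoted above, reading the migration term in the numerator as M/2. That reading is an assumption, chosen because it is the one consistent with a twofold reduction under high migration. The function name and the grid of M values are hypothetical.

```python
def expected_ratio(M):
    """Expected inbreeder/outbreeder ratio of species-wide variation.

    Evaluates (M/2 + 1) / (M + 1), with M = 4Nm, for the case without
    population extinction. Reading the numerator term as M/2 is an
    assumption of this sketch.
    """
    return (0.5 * M + 1.0) / (M + 1.0)


if __name__ == "__main__":
    for M in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(f"M = {M:6.1f}  ratio = {expected_ratio(M):.3f}")
```

The ratio equals 1 when M = 0 (no reduction) and approaches 1/2 as M grows, matching the statement that the reduction is twofold under high migration and milder under low migration.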
We have not detected, however, the very drastic reduction of variation in A. thaliana that would be expected from differences in mating system, the effect of gene flow on the outbreeders' levels of variation, and the increase in hitchhiking and background selection effects resulting from effectively reduced recombination in the inbreeder. Indeed, A. thaliana has a worldwide distribution and exhibits a metapopulation structure.
| FOOTNOTES |
|---|
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AJ619855–AJ619939 and AJ582819–AJ582910.
1 These authors contributed equally to this article.
2 Present address: Department of Genetics and Evolution, Max Planck Institute of Chemical Ecology, 07745 Jena, Germany.
3 Present address: Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, 08028 Barcelona, Spain.
| ACKNOWLEDGMENTS |
|---|
We thank J. Wakeley and W. Stephan for analytical suggestions, M. J. Clauss for helpful comments on the manuscript, H. Kuittinen and P. Saumitou-Laprade for providing several A. halleri and A. l. petraea DNA samples, and J. Kroymann for MAM-L amplification primers and unpublished A. thaliana MAM-L sequences. S.R.-O. and M.A. also thank G. Blasco for her excellent technical assistance and Serveis Científico-Tècnics, Universitat de Barcelona for sequencing facilities. B.S. and T.M.-O. thank D. Schnabelrauch and A. Figuth for outstanding sequencing services and M. Voigt and J. Fritsche for excellent technical assistance. This work was supported by the Bundesministerium für Bildung und Forschung, the Max-Planck Gesellschaft, European Unio


2 test statistic caused by polymorphism levels at each locus; solid symbols represent contributions caused by divergence. Points above the line indicate deviation in the direction of an excess of polymorphism or an excess of divergence relative to the neutral expectation. Likewise, points below the line indicate deviation in the direction of too little polymorphism or too little divergence relative to expectation. (A) A. halleri; (B) A. l. petraea; (C) A. l. lyrata.