Admixture between genetically divergent populations facilitates genomic studies of the mechanisms involved in adaptation, reproductive isolation, and speciation, including mapping of the loci involved in these phenomena. Little is known about how pre- and postzygotic barriers will affect the prospects of “admixture mapping” in wild species. We have studied 93 mapped genetic markers (microsatellites, indels, and sequence polymorphisms, ∼60,000 data points) to address this topic in hybrid zones of Populus alba and P. tremula, two widespread, ecologically important forest trees. Using genotype and linkage information and recently developed analytical tools we show that (1) reproductive isolation between these species is much stronger than previously assumed but this cannot prevent the introgression of neutral or advantageous alleles, (2) unexpected genotypic gaps exist between recombinant hybrids and their parental taxa, (3) these conspicuous genotypic patterns are due to assortative mating and strong postzygotic barriers, rather than recent population history. We discuss possible evolutionary trajectories of hybrid lineages between these species and outline strategies for admixture mapping in hybrid zones between highly divergent populations. Datasets such as this one are still rare in studies of natural hybrid zones but should soon become more common as high throughput genotyping and resequencing become feasible in nonmodel species.
ADMIXTURE or hybrid zones between genetically divergent populations are increasingly being explored for their use in studies of adaptation, reproductive isolation, and speciation (Rieseberg et al. 1999; Martinsen et al. 2001; Wu 2001; Vines et al. 2003; Payseur et al. 2004; reviewed by Coyne and Orr 2004), especially for their potential in identifying recombinants for gene mapping (otherwise known as “admixture mapping”; Chakraborty and Weiss 1988; Briscoe et al. 1994; Rieseberg et al. 1999; Reich et al. 2005; Slate 2005; Zhu et al. 2005; Lexer et al. 2007; Nolte et al. 2009). In many taxa of animals and plants, recombinants are created by admixture between divergent populations or species in hybrid zones or ecotones (Buerkle and Lexer 2008; Gompert and Buerkle 2009). The growing interest of evolutionary geneticists in admixture has its roots in both basic evolutionary genetics and breeding.
With respect to evolutionary genetics, admixed populations have been viewed as important resources for studying the genetics of adaptation and speciation, since the discovery that by fitting geographical clines of allele frequencies across hybrid zones, the strength of intrinsic and extrinsic (ecological) barriers to gene flow can be estimated (Barton and Hewitt 1985; Barton and Gale 1993). More recently, the genomics era has taken these concepts to a new level by providing genetic or physical genome maps for many species so that clines or introgression patterns of individual loci can be compared to their genomic background (see below; Falush et al. 2003; Gompert and Buerkle 2009). Thus, hybrid zones permit the identification and study of quantitative trait loci (QTL), genes, or other genetic elements involved in reproductive isolation and speciation in situ, directly in natural populations, if sufficient genetic recombination has occurred (Rieseberg and Buerkle 2002). In applied genetics, studies of hybrid zones yield information on the genomic architecture of barriers to introgression, which is of great interest to breeders concerned with the establishment of pedigrees for tree selection and domestication (Stettler et al. 1996).
Most animal or plant hybrid zones studied to date involve hybridization between parental populations that are much more divergent than the admixed human populations that have been used successfully for gene mapping in human medical genetics (e.g., Reich et al. 2005; Zhu et al. 2005). Little experience exists with interpreting genomic patterns of ancestry and admixture in such highly divergent, nonhuman populations. Early genomic work on hybrid zones, based on dominant genetic markers, suggested the feasibility of mapping genome regions involved in reproductive isolation and speciation (Rieseberg et al. 1999; Rogers et al. 2001), but these studies did not allow tests for selection on genotypes at single loci in different genomic backgrounds. This became possible only recently due to the development of novel analytical tools suited to large numbers of codominant markers, especially linkage models of Bayesian admixture analysis (Falush et al. 2003, 2007) and methods to fit “genomic clines” of codominant marker genotypes across complete genomic admixture gradients (Lexer et al. 2007; Gompert and Buerkle 2009; Nolte et al. 2009; Teeter et al. 2010). Great advances also have been made in interpreting single-locus estimates of genetic divergence between populations and species (Beaumont 2005; Foll and Gaggiotti 2008; Excoffier et al. 2009a). Here, we bring these approaches together to yield novel insights into genomic patterns of reproductive isolation and mating in hybrid zones of two widespread and important members of the “model tree” genus Populus. Our goal was to infer patterns of reproductive isolation and the likely evolutionary trajectories of hybrid populations and to develop strategies for genetic mapping in admixed populations.
Populus alba (white poplar) and P. tremula (European aspen) are ecologically divergent (floodplain vs. upland habitat) hybridizing tree species related to P. trichocarpa, the first completely sequenced forest tree (Tuskan et al. 2006). The two species are highly differentiated for neutral DNA-based markers (Lexer et al. 2007) and numerous phenotypic and ecological traits (Lexer et al. 2009). Mosaic hybrid zones between these species often form in riparian habitats (Lexer et al. 2005; hybrids sometimes referred to as P. × canescens) and have been proposed as potential “mapping populations” for identifying QTL and genes of interest in evolutionary biology (Lexer et al. 2007; Buerkle and Lexer 2008) and breeding (Fossati et al. 2004; Lexer et al. 2004). Previous studies of these hybrid zones were conducted with a relatively small number of genetic markers and without making use of linkage information; the genomic composition of hybrid zones between these species has never been studied with a genomewide panel of codominant markers with known linkage relationships. Specifically, we address the following questions in this contribution:
(1) What does an analysis of admixture and differentiation based on a genome-wide panel of mapped markers tell us about patterns of reproductive isolation and mating in hybrid zones of European Populus species? (2) What are the likely roles of pre- and postzygotic barriers vs. recent, localized historical factors in generating the observed genomic patterns? (3) What are the practical implications for admixture mapping in hybrid zones between highly divergent populations? We showcase where the genetic peculiarities of hybrid zones will limit their use for gene mapping and where they suggest new approaches that were perhaps not foreseen by geneticists with a focus on human medical applications.
MATERIALS AND METHODS
Sampling of admixture zones:
Three interspecific “mosaic” hybrid zones and adjacent parental populations of P. alba and P. tremula were sampled. These are defined by the three river drainage systems in which they are situated: Danube (Northeastern Austria), Ticino (Northwestern Italy), and Tisza (Northeastern Hungary) (Table 1). The three hybrid or admixture zones were previously discovered with the help of genetic markers (Bartha 1991; Fossati et al. 2004; Lexer et al. 2005). The main focus of this article is on the Danube hybrid zone; the Ticino and Tisza populations were used as additional “replicates” to check whether genotypic patterns observed for the Danube could be generalized to other localities.
In each zone, widely spaced trees were sampled with a minimum distance of 50 m to avoid sampling of asexually derived ramets, based on results of a previous analysis of fine-scale spatial structure (van Loo et al. 2008). Sampling was carried out without regard to morphology, and emphasis was placed on broad geographic coverage within each river valley (Table 1); sampling transects is not feasible in mosaic hybrid zones and was not required for the statistical analyses employed in this study (Gompert and Buerkle 2009). Parental reference populations were sampled adjacent to each hybrid zone in lowland forest (P. alba) and upland habitat (P. tremula), with widely spaced individuals as described by Lexer et al. (2005).
Within-genome sampling and laboratory analyses:
Ninety-three polymerase chain reaction (PCR) or DNA sequence-based molecular genetic markers were typed in the Danube hybrid zone and its parental reference populations. These included 83 microsatellites, eight sequence-based markers, and two insertion–deletion polymorphisms (supporting information, Table S1). Seventy-six of the markers were developed from genomic libraries or directly from the P. trichocarpa genome assembly v.1 (http://www.ornl.gov/sci/ipgc; van der Schoot et al. 2000; Smulders et al. 2001; Tuskan et al. 2004), and 17 were developed from expressed sequence tags (ESTs) (Joseph and Lexer 2008; De Carvalho et al. 2010). The markers were distributed across all 19 chromosomes of the Populus genome with an average of 5 ± 1 (SE) markers per linkage group (Table S1). Different sets of markers were used at various stages of the work, depending on the requirements of each statistical method (see below).
To facilitate marker genotyping, genomic DNA was extracted from silica-dried leaves with the Dneasy plant mini kit (QIAGEN). Subsequently, all microsatellites and indel polymorphisms were PCR amplified using the protocols described previously by Lexer et al. (2005) and precisely sized using an Applied Biosystems (ABI) 3100 Genetic Analyzer and associated fluorescent dyes and software. All sequence-based polymorphisms were examined by PCR amplification and direct sequencing, using a Biomek NX S8 Laboratory Automated Workstation (Beckman Coulter) and an ABI 3730 Genetic Analyzer with accompanying chemistry, following precisely the protocols of Joseph and Lexer (2008).
Since European hybrid zones of P. alba and P. tremula were characterized for their genetic diversity and structure elsewhere (Fossati et al. 2004; Lexer et al. 2005), descriptive population genetic analyses were kept to a minimum here: populations were characterized for their gene diversity equivalent to expected (HE) heterozygosity, observed (HO) heterozygosity (for fully codominant markers), inbreeding coefficient (FIS) and allelic richness corrected for sample size by rarefaction, and the following measures of genetic divergence: FST, Hedrick's (2005) G′ST, and the allele frequency differential (δ). These calculations were carried out using the FSTAT software (Goudet 1995) or manually on the basis of the output of this program.
Bayesian admixture analysis with linkage model:
Bayesian admixture analysis of the focal hybrid zone (Danube/Austria) was carried out with Structure 2.2, using a linkage model (Falush et al. 2003) that uses genotype and linkage information for each marker to estimate admixture parameters. Admixture analysis can reveal information regarding patterns of reproductive isolation (RI) and mating in recent generations. For example, mating events involving F1's may be rare or absent (Milne et al. 2003), or F1's may mate frequently with one or both parental species (Minder et al. 2007). Alternatively, mating may happen preferentially among F1's or other early hybrid classes, potentially leading to partial or complete reproductive isolation between hybrids and their parents (Rieseberg et al. 2003; Jiggins et al. 2008). These aspects are of great interest in both speciation genetics and admixture mapping, and each scenario leaves a characteristic signature in admixture coefficients obtained by a linkage model.
Since 12 of the 80 microsatellites used at this step were scored as band present/absent (Table S1; dominant markers), we used the Structure module of Falush et al. (2007) that is able to resolve incomplete or ambiguous marker information. The analysis was carried out for a K = 2 gene pool model based on previous results for this hybrid zone (Lexer et al. 2005; 2007) using a burn-in of 50,000 and 100,000 subsequent iterations, assuming unphased genotypic data. Then, the site-by-site output of Structure 2.2 was used to estimate the probabilities of interspecific heterozygosity or homozygosity for each locus in each individual. With unphased data, Structure estimates two probabilities informative regarding heterozygosity (ss2 and ss3; assignment probabilities of each allele at an unphased locus in a diploid individual), and two probabilities informative regarding homozygosity (ss1 and ss4; probabilities of assignment of both alleles of a locus to one or the other parental species). The following linear combinations of these probabilities were used to plot and explore hetero- and homozygosity in the Danube hybrid zone: (ss2 + ss3)−(ss1 + ss4), which quantifies evidence for interspecific heterozygosity for each locus and individual, with values ranging from −1 to +1; and ss1-ss4, which summarizes the evidence for interspecific homozgosity for alleles from P. tremula (−1) vs. P. alba (+1).
Effect of null alleles:
Allele nonamplification is a potential source of error when interpreting homo- and heterozygosity of PCR-based markers. Commonly used methods for estimating null allele frequencies are not easily applicable to hybrid zones. Thus, we assessed the role of null alleles by comparing interspecific heterozygosities from Structure to the actual marker heterozygosities in the raw data. Interspecific heterozygosity as examined by Structure refers to the ancestry of alleles in each individual, not to actual marker heterozygosities. For example, an individual may present a heterozygous marker genotype at a focal locus but may nevertheless be classified as an interspecific homozygote by Structure, because both alleles of that locus are derived from the same parental species. This scenario should be encountered fairly frequently for highly variable microsatellites. It can be used to rule out null alleles as the cause of interspecific homozygosity for the individuals in question.
Genome scan for interspecific divergence:
Admixture analysis can only reveal the effects of very recent gene flow across hybrid zones because admixture proportions Q from the donor species will decrease by approximately one-half with each new backcross generation. Thus, gene flow further back in the past is better captured by estimating divergence between hybridizing populations. A genome scan for interspecific divergence was carried out for the two parental populations of P. alba and P. tremula sampled adjacent to the focal (Danube) hybrid zone, using all 93 marker loci (Table S1). Interspecific divergence was estimated using FST and Hedrick's (2005) G′ST. Wright's FST offers the advantage that plenty of experience is available for it (including potentials and pitfalls; Jost 2008) and that it can be estimated for different kinds of markers within an analysis-of-variance framework (Weir and Cockerham 1984). Hedrick's (2005) G′ST, on the other hand, is useful because it takes within-population heterozygosity into account. Thus, a comparative analysis of FST and G′ST for the same set of loci should be informative regarding the amount of interspecific differentiation at each locus considering its within-population diversity.
To complete the interspecific divergence analysis, a standard neutrality test was carried out that compares FST to neutral coalescent simulations, using the frequentist approach of Beaumont and Nichols (1996), also discussed by Beaumont and Balding (2004). This test is often used to detect candidate loci for divergent selection, i.e., loci that are more divergent between two populations or species than expected under neutrality (Beaumont 2005). Many aspects of population demography can affect the outcome of this type of test, e.g., hierarchical population structure (Excoffier et al. 2009a), thus highly divergent “outlier” loci from FST-based tests need not necessarily indicate the action of selection (Foll and Gaggiotti 2008; Excoffier et al. 2009a). Nevertheless, the test allowed us to compare the pattern of divergence between our study species to published work on other hybridizing taxa. It was carried out with two rounds of simulations, one to obtain “neutral” FST and a second round to identify outliers. This was done for two datasets, one including all 93 loci genotyped for the Danube hybrid zone, including microsatellites, indels, and sequence polymorphisms and the other one comprising microsatellites only. Since the distributions were very similar in shape, only the results of the full analysis are discussed below.
Heterozygosity vs. hybrid index in observed and simulated data:
Our interest in speciation genetics and admixture mapping led us to explore patterns of reproductive isolation (RI) and mating in the Danube hybrid zone more deeply. This was achieved by examining interspecific heterozygosity vs. genomic admixture for each individual, using the 68 fully codominant microsatellites typed for the Danube hybrid zone (Table S3). We were also interested in checking the generality of our results, and thus we analyzed two additional replicate hybrid zones, Ticino (Italy) and Tisza (Hungary), for 18 widely spaced, fully codominant microsatellites (Table S3). Our rationale was as follows: if patterns of RI and mating are primarily determined by ecological differences among localities or by local history (e.g., human influence), then we may expect genotypic patterns to vary across localities. Conversely, if RI and mating are determined primarily by intrinsic organismal features shared by all populations, then we would expect genotypic patterns to be similar across localities.
For all three zones, microsatellite alleles were first combined into two allelic classes (converted into biallelic markers) with frequency differentials between species equal to the observed differential based on all alleles, following Gompert and Buerkle (2009). These biallelic data were then used to estimate interclass heterozygosity, which corresponds to a measure of interspecific heterozygosity, similar to but simpler than that obtained by Structure (above). This was plotted against genomic admixture estimated by a maximum likelihood hybrid index, all calculations carried out with the R package Introgress (Gompert and Buerkle 2010). This two-dimensional representation of genomic composition for each individual (heterozygosity vs. hybrid index) was then compared to simulated data for parental genotypes and F1's. Our rationale was to check whether observed, genetically intermediate hybrid genotypes in nature were more compatible with the simulated F1's (maximum heterozygosity with intermediate hybrid index) or with recombinant hybrids of subsequent generations (reduced interspecific heterozygosity compared to simulated F1's). In simulations, 500 individuals were sampled from the allele frequency distributions for each locus for each genotypic class (parentals and F1's). Note that this analysis follows similar principles to the method by Anderson and Thompson (2002) implemented in their NewHybrids software. Compared to their approach, ours makes fewer assumptions, because we do not attempt to assign hybrids to artificial categories such as F1, F2, backcross (BC), or “parental-like.”
Tests for epistatic interactions:
The presence of genetic discontinuities between recombinant hybrids and their parental species (steep clines despite fertile F1's) can be explained by several competing hypotheses, some of which can be addressed by available data for these species. A plausible hypothesis for which no data are currently available involves epistasis, i.e., negative genomic interactions among loci (Gavrilets 1997). Thus we used a recently developed method (Teeter et al. 2010) to test for epistatic interactions in the Danube hybrid zone, the focal population with the best genomic coverage. To keep the number of pairwise tests manageable, the analysis was restricted to those 29 fully codominant microsatellites with the greatest differentiation between the parental species (δ ≥ 0.8).
In a first step, single locus genomic clines of genotype frequencies at each locus were fitted against genomic admixture gradients (i.e., against the genetic background) using multinomial logistic regression, as in Lexer et al. (2007) but using the methodological improvements by Gompert and Buerkle (2009). Comparisons of observed clines against simulated neutral expectations allow the detection of under- or overrepresentation of interspecific homo- and heterozygotes, respectively (Gompert and Buerkle 2009). However, those were not at the center of attention here. Instead, two-locus interactions were explored by adding a second predictor locus to each regression model, and the two models for each focal locus (with or without the predictor) were compared using the Aikake information criterion (AIC = −2 L + 2p, where L is the log-likelihood of the model and p is the number of parameters in the model). Since smaller AICs represent higher likelihoods and better fits, ΔAIC was calculated as AICbasic model − AICwith predictor, so that large positive values of ΔAIC indicate loci for which the model fit is greatly improved by including the predictor locus, consistent with epistatic interactions. Since differences in AIC are used for interpretation of the results, rather than formal statistical tests of null hypotheses, there is no adjustment for multiple tests. The results can be interpreted on the basis of the magnitudes of ΔAIC for each pairwise locus combination. Particular pairwise interactions may be studied further in the future, on the basis of candidate gene information, the magnitude of ΔAIC, or other analyses.
Use of different types of genetic markers in statistical analyses:
All 93 marker loci were used for estimating interspecific genomic divergence (FST and G′ST) between P. alba and P. tremula in the focal hybrid zone (Danube/Austria; Table S1), and a subset of 80 microsatellites was used for Bayesian admixture analysis in Structure 2.2 (Table S2). These 80 loci comprised all those for which information on genomic location was regarded adequate for analysis on the basis of a linkage model. The 80 loci included 12 microsatellites for which allele calling was difficult and that were thus scored conservatively via the presence/absence of a well-defined focal band. The 68 remaining, easily scorable and fully codominant microsatellites were used for analyzing heterozygosity vs. hybrid index and for epistasis tests in the Danube population. Out of these, 18 widely spaced, codominant markers were used to assess the generality of the results using the replicate localities Ticino (Italy) and Tisza (Hungary) (Table S3).
Genetic diversity in admixture zones:
All three hybrid zones exhibited increased diversity compared to the parental species, measured in the form of expected (HE) and observed (HO) heterozygosity and allelic richness (Table 1). Inbreeding coefficients (FIS) were significantly positive, consistent with previously observed Wahlund effects arising from wide local sampling of the parental reference populations (De Carvalho et al. 2010) and widespread departures from random mating due to assortative mating in hybrid zones (van Loo et al. 2008).
Patterns of genomic ancestry inferred by linkage-based Bayesian analysis:
Bayesian-based admixture analysis based on 80 mapped microsatellites indicated an admixed ancestry for 31 (16%) individuals (0.05 < Q < 0.95) out of 192 sampled from the Danube hybrid zone of P. alba and P. tremula. Locus-by-locus analysis of these intermediate genotypes revealed “genomic mosaics” consisting of heterozygous loci and loci homozygous for alleles derived from one or another of the parental species (Figure 1, A and B), as expected for advanced recombinant hybrids derived from crosses of F1's. Pronounced gaps in the genotype distribution were uncovered between these recombinant hybrids and the two parental species (Figure 1, A and B; genotype data in File S1). The results are unlikely to be systematically biased by null alleles, as on average 44% of “interspecific homozygotes” classified by Structure were indeed true heterozygous marker genotypes in the raw data. Thus, classification as interspecific homozygotes at most loci was due to allelic ancestry rather than null alleles mimicking homozygosity.
Genome scan for species divergence:
Between-locus variation for genetic divergence (FST) between the parental populations of the Danube hybrid zone was great (0.369 ± 0.289 SD), and the same was true for G′ST, which takes within-population variation into account (0.632 ± 0.354 SD) (Figure 2). At the 99% level, 33 loci (35%) were more divergent than expected under a neutral model following Beaumont and nichols (1996). This number is not readily interpretable in terms of divergent natural selection maintaining the species barrier, but it facilitates comparisons to recent studies of other groups of hybridizing species. The large proportion of loci with low interspecific divergence (low FST or G′ST; lower left corner of Figure 2) indicates extensive allele sharing, which is most plausibly attributed to past gene flow between these ecologically divergent, parapatric species (discussed below).
Genomic composition of replicate hybrid zones:
Analysis of interspecific heterozygosity vs. hybrid index in the focal (Danube/Austria) and two additional replicate hybrid zones of the same two species (Ticino/Italy and Tisza/Hungary) revealed a similar genomic composition of all three hybrid populations (Figure 3; genotype data in File S2 and File S3). In all three localities, most hybrids had intermediate hybrid index (horizontal axis) with interspecific heterozygosities (vertical axis) that fell below the 95% confidence intervals of simulated F1's (Figure 3). The increased variance in the observed and simulated data for the Ticino and Tisza populations stems from the lower number of loci genotyped for these localities (N = 18 loci) compared to the Danube (N = 68 codominantly scored loci in this analysis; see materials and methods). The reduced interspecific heterozygosity in hybrids indicates their recombinant nature, beyond the F1 generation, as also visible in the linkage-based Structure analysis for the Danube population (above; Figure 1) and from initial inspection of the raw data.
Conspicuous gaps in the genotype distributions between hybrids and parental species were present in all three hybrid zones (Figure 3), which was not expected from previous studies of these species and hybrids with smaller genomic coverage and less powerful analytical tools (discussed below). In each locality, many advanced generation introgressants had a hybrid index that was distinguishable from the simulated “pure” parental populations (hybrid index further away from 0.0 or 1.0 than expected of parental genotypes in the simulations). This is consistent with the locus-to-locus variation in marker ancestry seen in the Structure graph (Figure 1) and with the high proportion of weakly differentiated loci detected by FST and G′ST (Figure 2).
Tests for epistatic interactions:
Tests for epistatic interactions were informative regarding the possible origins of the genomic patterns seen in hybrid zones of P. alba and P. tremula (Figures 1 and 3). Among 29 loci with allele frequency differentials >0.8 in the Danube hybrid zone, a high proportion showed evidence for two-locus interactions (Figure 4). This is visible from the great number of two-locus comparisons for which an epistasis model for predicting the genotypes at a focal locus was strongly preferred over a simple one-locus model (see materials and methods for the details to these models). The results indicate that epistasis needs to be considered explicitly when discussing the origin of genotypic discontinuities between these two ecologically divergent forest trees despite the presence of fertile F1's. In this study we were interested primarily in interactions between loci, not in individual genomic clines. Nevertheless, basic cline parameters for all loci are shown in Table 2 and Table S3.
Despite a longstanding interest of evolutionary geneticists in using hybrid zones as natural laboratories for studying the evolutionary process (Barton and Hewitt 1985; Barton and Gale 1993; Rieseberg et al. 1999; Martinsen et al. 2001; Wu 2001; Vines et al. 2003; Coyne and Orr 2004; Payseur et al. 2004; Minder et al. 2007; Buerkle and Lexer 2008), there is little experience in the interpretation of genomic patterns of admixture and ancestry in terms of reproductive isolation and speciation (but see Payseur et al. 2004; Lexer et al. 2007; Nolte et al. 2009; Teeter et al. 2010). The results of our study allow us to address this topic and to comment on the utility of gene mapping in hybrid zones of wild species in animals and plants.
Genomic patterns of ancestry and admixture:
Analysis of a genome-wide panel of mapped markers (Figures 1–4⇑⇑⇑) yields a different picture of patterns of reproductive isolation and mating in hybrid zones of P. alba and P. tremula compared to previous work based on much smaller numbers of codominant markers and less powerful analytical tools (Rajora and Dancik 1992; Fossati et al. 2004; Lexer et al. 2005). Early phenotype- and isozyme-based studies suggested that hybrid zones between these species consisted primarily of backcrosses to one parental species, namely P. alba (Bartha 1991; Rajora and Dancik 1992). Previous DNA-based studies (Lexer et al. 2005, 2007) did not allow rejection of this hypothesis, most likely due to the large error associated with individual-based analysis of a small number of markers. Here, linkage-, FST-, and genomic cline-based analyses of a genome-wide panel of mapped markers reveal the presence and positioning (in genotypic space) of unexpectedly strong reproductive barriers between these species and hybrids and suggest hypotheses on their origin and future fate.
The Bayesian analysis of genotypic ancestry (using Structure) indicates that early generation hybrids detected in the Danube hybrid zone of P. alba and P. tremula (Table 1) are genetically intermediate, recombinant hybrid genotypes (a genomic mosaic, Via and West 2008) separated from their parental species by sharp genomic discontinuities (Figure 1), equivalent to steep genomic clines sensu Gompert and Buerkle (2009) and similar to steep clines seen in simulations of hybrid zones with strong postzygotic barriers (Gavrilets 1997). Note that most trees from the Danube hybrid zone were P. alba-like (very low admixture coefficients Q; see genotypic data in File S1); only genetically intermediate individuals are shown in Figure 1, ordered by their Qs. Blocks of LD along the chromosomes are clearly visible in these genetically intermediate hybrids on the basis of the similarity of colors (probability of homo- or heterozygosity) along the horizontal axes (Figure 1, A and B). Also, occasional introgression of particular genome segments or loci across the barrier in either direction can be inferred from Figure 1. These observations are mirrored by characteristic patterns of genomic differentiation visible in our FST- and G′ST-based genome scan (Figure 2).
The genomic scan for pairwise species differentiation (Figure 2) is informative regarding gene flow in the more distant past—on the order of dozens or hundreds of generations (Whitlock 1992). Our pairwise genome scan reveals many loci with weak interspecific differentiation (FST or G′ST; Figure 2), despite the fact that the proportion of highly differentiated loci (FST greater than expected in neutral coalescent simulations) was much greater in our study (35%) than in studies of a different pair of outcrossing forest trees (12% in Quercus; Scotti-Saintagne et al. 2004) and a pair of self-incompatible annual plants (5% in Helianthus; Yatabe et al. 2007). The numerous weakly differentiated loci in our genome scan (Figure 2; lower portion of graph) indicate occasional but persistent introgression over long time scales, despite strong barriers to gene exchange. This makes sense: we know that even strong barriers to gene flow cannot prevent the introgression of neutral or advantageous alleles (Pialek and Barton 1997; Faure et al. 2008), and emerging phylogeographic data for these species suggest repeated contact upon colonization from divergent lineages (Fussi et al. 2010).
Ancestral polymorphism is an unlikely explanation for allele sharing at the weakly differentiated loci seen in Figure 2. These species diverged at least several million years ago (Stettler et al. 1996), and P. alba and P. tremula are not even sister species in most molecular phylogenetic analyses of the genus (Hamzeh and Dayanandan 2004). Accordingly, allele frequency differentials between these taxa often reach values of 0.90 or higher (Table 2; Table S3). Also, intermediate genotypes in hybrid zones often carry blocks of neighboring markers from one or the other parental species (Figure 1), which is consistent with interspecific recombination but not ancestral polymorphism.
We did not wish to interpret the highly differentiated outlier loci in our genome scan as being under divergent selection (see Foll and Gaggiotti 2008 and Excoffier et al. 2009a for a discussion of these issues). Rather, our primary interest was in comparing the overall distribution of interspecific genomic differentiation to expectations from comparable pairs of sympatric, hybridizing taxa. Nevertheless, it is noteworthy that most EST-derived markers used in our study are on the diagonal in our FST/G′ST plot (Figure 2), in contrast to numerous other markers developed from the Populus genome sequence or genomic libraries (Figure 2; Table S1). This indicates that the EST-derived polymorphisms have attained maximum differentiation for their within-species heterozygosity (Hedrick 2005), which is consistent with their placement outside the 99% confidence interval of the neutral distribution of FST.
Repeatability of genomic patterns and mechanisms of reproductive isolation:
Genomic discontinuities in hybrid zones of the type seen in Figure 1 are highly informative regarding the genetics of species boundaries and can be explained by several competing hypotheses: (1) assortative mating, an important prezygotic barrier in both animals and plants (Coyne and Orr 2004), easily mediated by differences in flowering time in the case of plants; (2) strong postzygotic barriers affecting hybrid generations beyond the F1, i.e., nuclear genetic incompatibilities caused by Dobzhansky–Bateson–Muller (DBM)-type epistatic selection (Gavrilets 1997); and (3) localized factors affecting only particular populations, e.g., natural or human-mediated disturbance in a particular locality may have triggered a recent “wave” of hybridization, mimicking a steep cline. Our data allow us to address each of these hypotheses. Additional alternative hypotheses include postmating barriers involving uniparentally inherited genetic factors, but these will result in strongly asymmetric intrinsic barriers (Turelli and Moyle 2007) for which there is no evidence in these species to date (Lexer et al. 2005).
To rule out recent local disturbance (hypothesis 3), we studied two replicate hybrid zones (Tisza and Ticino; Table 1) for 18 widely spaced codominant microsatellites. Our comparative analysis of interspecific heterozygosity vs. hybrid genomic composition in these populations (Figure 3) indicates similar patterns of reproductive isolation in all three localities: intermediate hybrids have reduced heterozygosity compared to simulated F1's, in agreement with their recombinant nature, backcrosses are infrequent (although possible in both directions), and many parental-like genotypes exhibit hybrid indices (horizontal axes) that suggest low levels of introgression in the distant past. These observations are consistent with those from Structure-based ancestry analysis (Figure 1) and the FST-based genome scan (Figure 2). More importantly, the repeatability of the results strongly contradicts the third hypothesis: these three river drainages have very different postglacial histories (timing of recolonization and onset of human disturbance); thus it is unlikely that recent events would have affected all three of them in identical ways.
So, are genomic discontinuities in hybrid zones of these hybridizing forest trees caused by prezygotic isolation (assortative mating) or by postzygotic barriers? The two parental species differ in their peak timing of flowering by up to several weeks (Fischer et al. 2005) and appreciable genetic variation for phenological traits exists even at the within-species level (Hall et al. 2007). Thus assortative mating due to phenological differences is plausible. A study of fine-scale spatial genetic structure (SGS) of the Danube hybrid zone is consistent with this hypothesis: SGS was much stronger in hybrids compared to parental-like genotypes, even after correcting for differences in the propensity of hybrids and parentals to form clones (Van Loo et al. 2008). Also, hybrids tended to form larger clones (more ramets extending over larger areas) than the parental species, which can easily lead to preferential mating between adjacent hybrid genotypes with similar phenology, i.e., to discontinuities in spatial and temporal patterns of flowering. A molecular marker-based study of progeny arrays from natural hybrids is currently underway to test the amount of assortative mating in these populations.
With respect to postzygotic barriers, our logistic regression-based analysis of the Danube hybrid zone indicates that epistasis is widespread (Figure 4). This is consistent with theoretical predictions under DBM-type incompatibilities among nuclear genes: strong epistatic selection will keep fit recombinants at low frequency, effectively maintaining the barrier and leading to steep clines despite the presence of fertile F1's (Gavrilets 1997), which would result in genotypic patterns such as those seen in Figures 1 and 3. It is thus likely that genomic patterns of admixture in hybrid zones of P. alba and P. tremula—in particular the conspicuous genomic discontinuities between hybrids and their parental species—are due to both assortative mating and DBM incompatibilities. Note that the presence of isolated adaptive peaks (e.g., strong ecological divergence between hybridizing species) may affect the dynamics and outcome of epistatic selection, compared to hybrid zones maintained by a simple balance of selection and migration (Gavrilets 1997).
Potential evolutionary trajectories of hybrid zones:
Several possible trajectories have been proposed for the evolution of hybrid populations between diploid taxa, including swamping of the species with the smaller effective population size (Ne) by gene flow from the more abundant species (Wolf et al. 2001), massive introgression from the earlier colonizing (“native”) into the later colonizing species (Excoffier et al. 2009b), homoploid hybrid speciation (Rieseberg et al. 2003), and the transfer of adaptive traits across “porous” species boundaries (Kim and Rieseberg 1999; Barton 2001; Martin et al. 2006; Whitney et al. 2006). The first two of these (genetic swamping and massive introgression of genes from the local species) appear unlikely here: P. alba is the species with the smaller effective population size Ne (Lexer et al. 2005) and phylogeographic data indicate that it recolonized Central Europe after P. tremula (Fussi et al. 2010); yet integrity of its gene pool appears to be protected by surprisingly strong reproductive barriers (Figure 1; Figure 3; Figure 4) and possibly by sufficient gene flow at the within-species level (Nem > 3.0; Lexer et al. 2005); intraspecific gene flow in the later colonizer is expected to prevent or slow down introgression and “surfing” of genes originating from the local species (Excoffier et al. 2009b).
The third trajectory, hybrid speciation, appears already more likely, since our data suggest the build-up of partial RI between intermediate hybrids and their parents (Figure 1; Figure 3; Lexer et al. 2009). The great ecological divergence between the hybridizing species (flood plain vs. upland habitats) is also in line with expectations for conditions prevailing at the onset of hybrid speciation (Buerkle et al. 2000; Nolte and Tautz 2010). Nevertheless, opportunities for interspecific gene flow still exist (Figures 1 and 3), and interspecific heterozygosities in hybrids are higher than expected, based on models of recombinational speciation (Buerkle et al. 2000); note that drift in isolated hybrid lineages tends to reduce interspecific heterozygosity. Hybrid speciation in European Populus spp. would require range shifts leading to stronger spatial separation of P. × canescens hybrids from both parental species or the presence of multiple genes with strong positive effects on hybrid fitness (“overdominance”; Buerkle et al. 2000). The latter pattern was found for 11 out of 68 loci (16%) in our regression-based genomic cline analysis of the Danube population, compared to 19% of loci with indications of underdominance (Table 2; underrepresentation of heterozygotes). On the basis of the data, P. × canescens may be on a trajectory to hybrid speciation, but genetic data on the depth of coalescences of gene sequences are required to test this further.
We regard the fourth scenario, adaptive introgression, as the most likely evolutionary consequence of hybridization between these species: our Structure-based ancestry analysis (Figure 1), genome scan of divergence between parental species (Figure 2), and two-dimensional analysis of heterozygosity vs. hybrid genomic composition (Figure 3) all indicate low but persistent (or recurrent) introgression into both parental species, consistent with the notion of the “porous genome” in speciation genetics (Wu 2001; Scotti-Saintagne et al. 2004; Minder et al. 2007). This suggests that the transfer of traits across porous species boundaries (Kim and Rieseberg 1999, Martin et al. 2006; Whitney et al. 2006) is possible in these ecologically important forest trees, and this is corroborated by the results of recent morphometric analyses of many phenotypic characters in these species and hybrids (Lexer et al. 2009). A combination of common garden experiments and genetic mapping are currently underway to test whether introgressed morphological and chemical traits are adaptive in the recipient species.
Admixture mapping in highly divergent populations:
Admixture mapping in hybrid zones has been suggested as a means of mapping loci involved in reproductive isolation and adaptive differentiation in the wild (Rieseberg and Buerkle 2002; Buerkle and Lexer 2008). The methodological approach is analogous to the mapping of disease gene loci in admixed human populations (Reich et al. 2005; Zhu et al. 2005). Our data allow us to comment on the prospects for genetic mapping in hybrid zones stemming from admixture between highly divergent lineages.
First, it is clear from Figure 3 that coarse-scale mapping of isolating factors in highly divergent populations with strong reproductive barriers (as often encountered in evolutionary genetics) will require the screening of multiple localities for recombinant, early-generation hybrids and subsequent pooling of these individuals to form a QTL mapping population of sufficient size. The same principle was used in Homo sapiens (Reich et al. 2005; Zhu et al. 2005), so this should not be a major obstacle in organisms with weak or absent population structure such as highly outcrossing, wind-pollinated forest trees. The linear models used for gene mapping in admixed populations are readily extended to accommodate the use of multiple populations and environments (Buerkle and Lexer 2008). Sample sizes of several hundred individuals will be required for admixture mapping in early recombinant hybrid generations, as in QTL mapping experiments with early generation intercrosses.
Perhaps of even greater relevance is the potential created by ultra-high throughput genotyping and resequencing: the unstoppable movement of neutral or advantageous alleles across porous genomes (this study; Barton 2001; Martin et al. 2006; Yatabe et al. 2007; Faure et al. 2008) indicates a great potential for the use of advanced introgressed populations (rather than hybrid zones with recent admixture) for high-resolution mapping using next generation sequencing approaches (Hohenlohe et al. 2010; Turner et al. 2010). Divergent, introgressed alleles from related species should readily be identified in highly advanced introgressants using the genomic coverage afforded by these methods, and appropriate neutrality tests (Nielsen 2005) facilitate the detection of fitness-related genes among those loci that do move across the barrier. This should allow the identification of functionally important heterospecific alleles in the process of spread through positive selection or neutral introgressed genes in the process of surfing on the colonization wave of rapidly expanding populations (Excoffier et al. 2009b). The marker coverage required at this step will depend on how much time (in terms of generations) has passed since the onset of admixture, and population size requirements will depend primarily on the desired threshold frequency for the detection of introgressed variants.
Clearly, admixture mapping projects in wild species are confronted with challenges not seen in admixed human populations used for gene mapping, but they also bring novel opportunities to unravel the genetics of reproductive isolation and to identify fitness-related genes that function well in divergent genomic backgrounds. Much remains to be learned about the evolutionary consequences of the origin and breakdown of reproductive barriers from genomic studies of admixture in wild species of animals and plants.
We thank Hans Herz, Wilfried Nebenführ, Stefano Gomarasca, István Asztalos, and other colleagues for help during field work; Robyn Cowan and Ludwika Sygnarski for help in the lab; Zach Gompert for help with data analysis; and Loren Rieseberg, Roger Butlin, and Sergey Gavrilets for helpful discussions. C.L.'s research on the evolutionary genomics of species barriers in Populus is supported by grant no. NE/E016731/1 of the U.K. Natural Environment Research Council and grant no. 31003A_127059 of the Swiss National Science Foundation.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.118828/DC1.
Communicating editor: M. Kirst
- Received May 13, 2010.
- Accepted July 21, 2010.
- Copyright © 2010 by the Genetics Society of America