Genealogy-based methods were used to estimate migration of the fungal virus Cryphonectria hypovirus 1 between vegetative compatibility types of the host fungus, Cryphonectria parasitica, as a means of estimating horizontal transmission within two host populations. Vegetative incompatibility is a self/non-self recognition system that inhibits virus transmission under laboratory conditions but its effect on transmission in nature has not been clearly demonstrated. Recombination within and among different loci in the virus genome restricted the genealogical analyses to haplotypes with common mutation and recombinational histories. The existence of recombination necessitated that we also use genealogical approaches that can take advantage of both the mutation and recombinational histories of the sample. Virus migration between populations was significantly restricted. In contrast, estimates of migration between vegetative compatibility types were relatively high within populations despite previous evidence that transmission in the laboratory was restricted. The discordance between laboratory estimates and migration estimates from natural populations highlights the challenges in estimating pathogen transmission rates. Genealogical analyses inferred migration patterns throughout the entire coalescent history of one viral region in natural populations and not just recent patterns of migration or laboratory transmission. This application of genealogical analyses provides markedly stronger inferences on overall transmission rates than laboratory estimates do.
GENEALOGY-BASED methods have been shown to enhance precision over FST-based estimators for estimating migration because they consider the temporal relationships of individuals in a population sample in addition to their frequency and spatial distributions (Hudsonet al. 1992b; Templeton 1998). This temporal component is important for detecting and separating migration from other population diversifying forces, such as recombination (Hudsonet al. 1992b) and isolation (Nielsen and Wakeley 2001). Coalescent-based simulations of genealogical relationships further enhance the precision of migration estimates by allowing for unequal population sizes and asymmetries in migration rates in the reconstruction of ancestral migration patterns (Beerli and Felsenstein 1999, 2001; Bahlo and Griffiths 2000; Nielsen and Wakeley 2001). Because most coalescent-based approaches assume neutrality and no recombination they are most powerful when used in conjunction with other genealogical methods that can (i) test the neutral mutation hypothesis and (ii) identify potential recombination events in the history of the sample (Carbone and Kohn 2001a). Although these analytical methods have proven to be powerful for making inferences on migration and effective population sizes on simulated data sets (Beerli and Felsenstein 1999; Nielsen and Wakeley 2001; Vitalis and Couvet 2001; Wakeley and Aliacar 2001), relatively few empirical examples have been tested (Congdonet al. 2000; Beerli and Felsenstein 2001; Carbone and Kohn 2001a; Heathet al. 2002; Kinnisonet al. 2002).
One potential application of migration analyses is to track the movement of specific genotypes within and between populations. A specific case in point is the estimation of transmission of pathogens between host individuals (horizontal transmission). Horizontal transmission is a pivotal parameter affecting the evolution of virulence (Bull 1994; Levin 1996) and the potential for pathogens to invade host populations (Anderson and May 1986). Information is available about horizontal transmission of RNA viruses in some host systems, but relatively little detail is known about horizontal virus transmission in filamentous fungi. Horizontal transmission of viruses in filamentous fungi occurs when the cells of different individuals fuse, allowing viruses to move from the cytoplasm of one individual to the other. However, horizontal transmission is restricted by vegetative or heterokaryon incompatibility (reviewed in Cortesiet al. 2001), a genetically controlled self/non-self recognition system that results in programmed cell death between incompatible genotypes (Biellaet al. 2002; Glass and Kaneko 2003). Cell death associated with vegetative incompatibility often, but not always, reduces horizontal transmission of viruses, plasmids, or other cytoplasmic elements (e.g., Caten 1972; Hartlet al. 1975; Griffithset al. 1990; Cortesiet al. 2001).
In the chestnut blight fungus, Cryphonectria parasitica, vegetative incompatibility has been hypothesized to be a major barrier to horizontal transmission and, therefore, a significant constraint on biological control of chestnut blight with viruses (Anagnostakiset al. 1986; Milgroom 1995; Liuet al. 2000). Vegetative incompatibility is controlled by alleles at six or more vegetative incompatibility (vic) loci in C. parasitica; multilocus vic genotypes are referred to as vegetative compatibility (vc) types (Cortesi and Milgroom 1998). Detailed laboratory studies have shown that each vic allele has a different quantitative effect reducing the probability of virus transmission (Cortesiet al. 2001). Although it is possible to estimate the average transmission between vic genotypes in the laboratory, these estimates do not necessarily reflect actual transmission under natural conditions. In nature, transmission may be affected by the number of contacts between individuals, length of time individuals are in contact, spatial heterogeneity of genotypes, etc. Therefore, estimating migration of viruses using genealogical methods offers an alternative means of estimating horizontal transmission between vic genotypes independent of laboratory estimates.
The primary objective in this research was to estimate migration of viruses between groups of vegetatively incompatible individuals (i.e., vic genotypes) in two populations of the host fungus, C. parasitica. A gene genealogical approach to estimating migration requires that we test assumptions concerning recombination and population subdivision. Therefore, our subsidiary objectives were to test for recombination in the virus genome and estimate virus migration between populations.
MATERIALS AND METHODS
Cryphonectria hypovirus 1 (CHV-1) is a double-stranded RNA (dsRNA) virus in the family Hypoviridae that infects the chestnut blight fungus, C. parasitica (Hillmanet al. 2000). Although classified as a dsRNA virus, the 12.7-kb RNA genome of CHV-1 (Figure 1) is more closely related to positive-sense RNA viruses and shares their essential replication properties (Kooninet al. 1991; Fahimaet al. 1993; Hillmanet al. 2000). Interest in this virus derives from its potential as a biological control agent for chestnut blight (Van Alfen et al. 1975; Nuss 1992). CHV-1 has been found extensively in Europe (Allemannet al. 1999); it has also been found in Japan and China (Peeveret al. 1998; Liuet al. 2003) and, more recently, in Korea (J. Lim,B. Cha and M. G. Milgroom, unpublished observations). CHV-1 was probably introduced into Europe with C. parasitica, which was introduced into North America and Europe from eastern Asia in the early 1900s (Milgroomet al. 1996).
Populations sampled: We sampled CHV-1 from two populations of C. parasitica: Teano from Caserta Province in southern Italy and Bergamo from Bergamo Province in northern Italy. Populations of C. parasitica in Italy are dominated by a few vic genotypes (or vc types; Cortesiet al. 1996; Milgroom and Cortesi 1999), making it possible to obtain multiple isolates of CHV-1 from the same vc types. In Teano (n = 194), 96% of the fungal isolates were in vc types EU-10 (19%) and EU-12 (77%; Liuet al. 1996; Milgroom and Cortesi 1999). In Bergamo (n = 158), 68% of the fungal isolates were in vc types EU-1 (33%) and EU-2 (35%; Cortesiet al. 1996). In contrast to fungal populations, we had no prior knowledge of diversity or population structure of CHV-1.
Virus sampling: To sample CHV-1, we screened C. parasitica isolates from Teano for dsRNA using a method described previously (Morriset al. 1983; Liuet al. 2003). C. parasitica isolates from Bergamo were screened for CHV-1 using an immunoblot method (Peeveret al. 2000). For both samples, the presence of CHV-1 was confirmed by dsRNA isolation in cellulose column chromatography (Morris and Dodds 1979; Peeveret al. 1997). In addition to CHV-1 sampled from vc types EU-1 and EU-2 in Bergamo, we included one isolate from each of vc types EU-6, EU-15, and EU-17, which were isolated for a different study (Peeveret al. 2000).
From each CHV-1 isolate, we obtained nucleotide sequences from three different regions of the genome (Figure 1): 471 bp from the 5′-noncoding region and 57 bp from the 5′ end of open reading frame (ORF) A (referred to as NC); 283 bp from the 3′ end of ORF A and the contiguous 481 bp from the 5′ end of ORF B (ORF A/B); and a 1096-bp segment approximately in the middle of ORF B. cDNA synthesis by reverse transcription was obtained using random hexamer primers as described (Hillmanet al. 1992; Chunget al. 1994) and the GeneAmp RNA PCR kit (Perkin-Elmer-Cetus, Norwalk, CT). First-strand cDNA was synthesized from dsRNA primed with random hexamer primers. Oligonucleotide primers for amplifying three regions of CHV-1 by PCR (Table 1) anneal to conserved sequences and were designed to amplify a fragment from CHV-1 and the related hypovirus CHV-2 (Hillmanet al. 2000). PCR was catalyzed with ExTaq polymerase (Mirus, Madison, WI) in the presence of 0.2 mm of each dNTP and 0.15 μm of the primers. cDNA template was denatured at 94° for 2 min. Thirty-three cycles of denaturation at 94° for 30 sec, annealing at 55° for 1.5 min, and polymerization at 72° for 1.5 min were conducted. Sequencing of both strands was done by the Cornell University Sequencing Facility or at Rutgers University.
Virus cDNA sequence analysis: The following steps in the analysis are outlined in the flowchart in Figure 2. The implementation of several programs in the flowchart was facilitated using a Java program (SNAP Workbench) that manages and coordinates a series of other programs (Price and Carbone 2003). SNAP Workbench was developed in the Carbone laboratory and is distributed over the Internet, http://www.cals.ncsu.edu/plantpath/faculty/carbone/home.html; additional programs in the SNAP (Suite of Nucleotide Analysis Programs) series are also accessible from this URL. An important functionality of our workbench software is the implementation of protocols consisting of a defined set of programs, assumptions, and parameter settings for following a particular path in the flowchart (e.g., see paths 1 and 2 in Figure 2). SNAP Workbench provided a single graphical user interface for performing the analyses described below.
To begin the analysis, viral cDNA sequences corresponding to the coding strand of the dsRNA were aligned using CLUS-TAL W version 1.7 (Thompsonet al. 1994). Multiple DNA sequence alignments were generated for the three sequenced regions; alignments were accessioned in GenBank (AY125727–AY125813). Sequence data from multiple loci were combined into one alignment file using SNAP Combine (Aylor and Carbone 2003). Using SNAP Map (Aylor and Carbone 2003) and SITES version 1.1 (Hey and Wakeley 1997) we collapsed sequences into unique haplotypes, recoding insertion or deletions (indels) and categorizing base substitutions as phylogenetically informative or uninformative, transitions vs. transversions, and replacement vs. synonymous amino acid changes in coding regions. A phylogenetic analysis with unweighted parsimony was performed using PAUP* 4.0 (Swofford 1998); if heuristic searches yielded two or more equally parsimonious trees, we inferred a strict consensus tree.
We generated site compatibility matrices for all subsets of haplotypes that share a common ancestor in the strict consensus tree using SNAP Clade (Markwordtet al. 2003). Compatibility methods examine the overall support or conflict among variable sites in a DNA sequence alignment and are useful for visual identification of partitions or blocks undergoing reticulate evolution (Jakobsenet al. 1997). In this article, we adapt compatibility methods for identifying reticulation events among groups of two or more haplotypes or clades in a phylogeny. Compatibility matrices were graphically illustrated using SNAP Matrix (Markwordtet al. 2003) and examined for the existence of two or more sites showing an identical pattern of compatibility/incompatibility. Clusters of two or more identical sites in a matrix define a recombination block. If recombination blocks were evident we followed two possible analysis paths, as illustrated in the flowchart (Figure 2). In path 1, we identified recombination block boundaries and the minimum number of recombination events in a clade using RecMin (Myers and Griffiths 2003). We used the putative recombination boundaries identified within each recombining clade from RecMin as a guide to identify the common or shared recombination block boundaries among all clades. We then collapsed the common nonrecombining blocks among all clades into haplotypes using SNAP Map. In path 2, we used RecPars (Hein 1990; http://www.daimi.au.dk/~compbio/recpars/recpars.html) to infer the number of recombinant segments (i.e., recombination blocks) in the combined multilocus data set. The best phylogeny inferred using RecPars is the most parsimonious reconstruction for all phylogenetically informative sites within an inferred recombinant block and represents a minimal path through the ancestral recombination graph (Song and Hein 2003). By comparing the Rec-Pars tree to the strict consensus tree we were able to reconstruct phylogenetic relationships among haplotypes that were previously unresolved in the strict consensus tree and determine the relative timing of recombination events in the history of the sample.
Virus tests of neutrality and population subdivision: For each population and nonrecombining block we estimated the population mutation rate, θ, from the number of segregating sites, s (Watterson 1975), and performed tests of neutrality using DNASP version 3.53 (Rozas and Rozas 1999). This provided a priori population parameter estimation and inferences on population growth patterns, i.e., equilibrium vs. non-equilibrium, and evolutionary forces, i.e., neutrality vs. selection. To test for population differentiation we used SNAP Map to generate the appropriate sequence file, Seqtomatrix (Hudsonet al. 1992a) to convert the sequence file to a distance matrix, and Permtest (Hudsonet al. 1992a) to test for geographic subdivision in samples of DNA sequences from two or more localities; this latter test was done using a nonparametric permutation method based on Monte Carlo simulations. Hudson’s test statistics (KST, KS, and KT) were calculated for each recombination block (path 1) and again for the entire data set (path 2). Testing the hypothesis of no genetic differentiation is an important first step and determines whether we assume a panmictic or a subdivided population model for examining population divergence using Genetree (path 1) and recombination parameter estimation using Recom58 (path 2).
Virus coalescent analysis: If we found evidence for geographic subdivision, the observed patterns of genetic divergence were further explored to determine whether there was evidence for constant (i.e., equilibrium) migration among populations or shared ancestral polymorphisms followed by population divergence but no migration. Two migration models were used for this analysis. First, MDIV was used to test for equilibrium migration vs. shared ancestral polymorphisms between two subdivided populations (Nielsen and Wakeley 2001). The approach of Nielsen and Wakeley (2001) uses either an infinite sites or a finite sites model without recombination and implements both likelihood and Bayesian methods using Markov chain Monte Carlo coalescent simulations for jointly estimating the population mean mutation rate (θ), divergence time (T), migration rate (M), and time since the most recent common ancestor (TMRCA) between two subdivided populations. This approach assumes that both populations descended from one panmictic population that may (or may not) have been followed by migration. Second, if MDIV showed evidence of equilibrium migration, MIGRATE was used to estimate migration rates assuming equilibrium migration rates (symmetrical or asymmetrical) in the ancestral history of populations (Beerli and Felsenstein 2001). Both MDIV and MIGRATE assume neutrality and no recombination and therefore these analyses must be restricted to neutral nonrecombining blocks in the data set (path 1). If we found evidence of equilibrium migration from our analyses in path 1, then we used this inference in path 2 (see Figure 2) to estimate migration and recombination rate parameters simultaneously using LAMARC (http://evolution.genetics.washington.edu/lamarc/lamarc_about.html). If we found evidence of population subdivision and no migration in path 1, then we reconstructed the ancestral history of the sample for each nonrecombining block using Genetree version 9.0 (http://www.stats.ox.ac.uk/~griff/software.html). There was no counterpart for the Genetree analysis in path 2 (see Figure 2). The genealogy with the highest root probability, the ages of mutations, and the TMRCA of the sample were estimated from coalescent simulations in path 1. We further identified the position (i.e., coalescent time) of mutations giving rise to replacement and synonymous amino acid changes for substitutions in coding regions.
Finally, we estimated the recombination parameter, ρ, within each population using Recom58 (Griffiths and Marjoram 1996), which assumes no population subdivision. The magnitude of the recombination parameter is a measure of the likelihood of recombination over mutation at a specific site. Although a more efficient method for estimating the joint likelihood of ρ and θ has been developed (Fearnhead and Donnelly 2001), we were able to greatly reduce the computational time of Recom58 by restricting our estimation of recombination rates within specific clades in the consensus tree. The novelty of our approach was in identifying clades with distinct recombination blocks and testing for population subdivision before estimating recombination parameters (path 2). We were able to corroborate our inference of recombination block boundaries (path 1) by estimating the average number of recombination events per site in the consensus sequence for each clade (path 2). Specifically, the sites with the most events were compared to the position of putative recombination boundaries identified using the site compatibility analysis, as described above.
All coalescent analyses were greatly facilitated using our workbench software. SNAP Workbench executes multiple, computationally intensive programs simultaneously on a single machine, Linux cluster, or supercomputer. This functionality was particularly important for programs using coalescent simulations (MIGRATE, LAMARC, MDIV, Recom58, and Genetree) because many independent coalescent runs were necessary to ensure convergence.
Vertical transmission of CHV-1 by sexual reproduction: Inferences about horizontal transmission from estimates of virus migration are based on the assumption that viruses cannot move between vc types except by horizontal transmission. An alternative hypothesis is that viruses are vertically transmitted from virus-infected parents to offspring with recombinant vic genotypes during sexual reproduction. Vertical transmission of CHV-1 in C. parasitica occurs through asexual spores (Russin and Shain 1985; Peeveret al. 2000) but these give rise to individuals with the same vic genotype as their parents. Vertical transmission reportedly does not occur through sexual spores (ascospores) of C. parasitica (Anagnostakis 1982; Elliston 1985), which are often of recombinant vic genotypes, but documentation of this is poor. Therefore, to test this hypothesis we set up seven crosses, using three CHV-1-infected isolates and six virus-free isolates as parents (Table 2). Crosses were performed on autoclaved chestnut stems and analyzed as described previously (Liu and Milgroom 1996). Four months after fertilization, we examined crosses for sexual fruiting bodies (perithecia) produced by each maternal parent. From each cross, we sampled 50–100 ascospore progeny from each of two or more perithecia. Progeny were sampled from all perithecia produced by CHV-1-infected maternal parents. Progeny isolates were assayed for CHV-1 by colony morphology by culturing on potato dextrose agar (PDA; Difco, Detroit); CHV-1-infected isolates of C. parasitica have characteristically less pigmentation when grown on PDA with low to moderate light (Hillmanet al. 1990). Ascospore isolates sampled from a cross between two CHV-1-infected parents were also screened for dsRNA as described above (Morriset al. 1983; Liuet al. 2003).
Virus cDNA sequence analysis: A total of 2331 nucleotides were sequenced from each of 12 and 17 isolates of CHV-1 in Teano and Bergamo, respectively. There were 14 variable sites in NC (Table 3), 51 in ORF A/B (Table 3), and 53 in ORF B (data not shown); CHV-1 isolates within each haplotype are shown in Table 4. Figure 3 shows unrooted cladograms of the combined NC, ORF A/B, and ORF B regions of CHV-1. A parsimony analysis with heuristic searching in PAUP (assumes no recombination) resulted in 10 equally parsimonious trees each with a consistency index of 0.8321. The unrooted strict consensus tree is shown in Figure 3a. The most parsimonious reconstruction of the same haplotypes using RecPars (assumes recombination) is shown in Figure 3b. RecPars found two recombinant segments; however, the boundaries of these regions did not coincide with the boundaries inferred using site compatibility matrices (Figure 4). Since RecPars does not allow for indels and SNAP Clade does not allow for variable positions that violate an infinite sites model, both indels and variable positions violating infinite sites were excluded in parsimony analyses. This was necessary for direct comparison of parsimony trees inferred with and without recombination (Figure 3) and site compatibility matrices inferred using SNAP Clade (Figure 4). Clades were inferred using SNAP Clade and are defined as groups of two or more haplotypes with a distinct pattern of mutation and recombination in their history as illustrated in the site compatibility matrices in Figure 4. Distinct recombination blocks were detected in clades A, C, and D (Figure 4) in the strict consensus. All isolates from Teano (clade A) share a unique history of recombination and the two recombination blocks are well supported, with 6 sites (positions 121, 130, 1363, 1678, 1996, and 2299) in one block, spanning the NC and ORF B regions, that are incompatible with 4 sites (584, 598, 695, and 1165) in the other block, spanning ORF A/B (Figure 4). There was no evidence of recombination within each of these blocks in the entire data set. The actual sizes of these recombination blocks cannot be determined without sequencing the intervening regions between the NC, ORF A/B, and ORF B regions (Figure 4). In contrast to Teano, at least two different recombination histories (four distinct recombination blocks) were inferred for the Bergamo isolates, with two recombination blocks in each of clades C and D (Figure 4). No homoplasy (i.e., no incompatibility) was detected in clades B and E and only these clades were monophyletic in both parsimony trees (Figure 3).
The uncertainty in the branching patterns in the unresolved polytomies (clades A and D) in the strict consensus tree (Figure 3a) arises from an inherent limitation of the parsimony method itself, which assumes that there is no recombination in the history of the sequences. Therefore, we used RecPars (Hein 1990) to find the most parsimonious reconstruction of our set of haplotypes using only base substitutions and inferred recombination events; RecPars cannot handle insertions or deletions. This analysis yielded a partial rooting of unrooted trees, which provided inference on ancestral recombination events in the sample. For example, the two largest unresolved polytomies observed in the strict consensus (clade D and a subset of the haplotypes in clade A) were further resolved when we assume recombination in the ancestral history of the sample (Figure 3b); clade A, composed of all isolates from Teano, was the inferred root of the sample. The best parsimony tree inferred by RecPars shows how different clades sharing common recombination blocks are related to each other. For example, when we assume recombination (Figure 3b) clades C and D are split up into two different histories. This is consistent with the compatibility analysis, which revealed two distinct blocks in the ORF B region in clade C (Figure 4).
The results from the SNAP Clade and RecMin were used to determine the nonrecombining sequence blocks for coalescent analysis (see Figure 4). Specifically, we identified NC or ORF A/B as the regions that showed no intragenic recombination among all clades. In contrast, the putative recombination event falling in ORF B in clade C resulted in significant homoplasy. A phylogenetic analysis of ORF B resulted in 348 equally parsimonious trees. Even a small amount of homoplasy (C.I. = 0.857) yielded a significant amount of phylogenetic uncertainty; in this case, the strict consensus of all 348 trees was predominantly fan shaped, i.e., with a topology similar to clade D in Figure 3a. If we examined only nonrecombining blocks with ORF B, there would be too few mutations within each block to yield any useful genealogy. Therefore, the sequences from ORF B were excluded from further analysis. From the remaining data we inferred 13 distinct haplotypes for NC and 26 for ORF A/B (Table 3). The distribution and characterization of base substitutions and indels in these regions are summarized in Table 3. In ORF A/B the ratio of nonsynonymous (r) to synonymous (s) substitutions was different within the two contiguous segments. In the 3′ ORF A segment r/s = 11/13, whereas in the 5′ ORF B segment r/s = 8/18.
Virus tests of neutrality and population subdivision: A priori inferences of population processes were based on tests of neutrality (Tajima 1989; Fu and Li 1993; Fu 1997) that assume a constant population size, no recombination, and no migration. A significant test result may suggest a departure from neutrality, the presence of population subdivision (Simonsenet al. 1995; Fu 1996), population growth, or background selection (Fu 1997). We distinguished among population growth and background selection by comparing Tajima’s D, Fu and Li’s D* and F*, and Fu’s FS tests. If only Fu and Li’s D* and F* are significant this suggests background selection (Fu 1997). If only Fu’s FS is significant this suggests population growth or genetic hitchhiking. Estimates of D for NC and 3′ ORF A were not significantly different from 0 (P > 0.1; Table 5) within Teano and Bergamo; we cannot reject the hypothesis of selective neutrality for DNA sequence variation in these regions. The D values for 5′ ORF B are only marginally significant (0.01 < P < 0.10) and the FS values are moderately to highly significant (0.001 < P < 0.05). This was evidence in support of weak (negative) selection and population growth. In the presence of population subdivision the power of these tests decreases (Fu 1996) and alternative tests for population subdivision as proposed by Hudson et al. (1992a) should be used to avoid making false positives. A significant P value (P < 0.0001) for Hudson’s tests (KST = 0.1021, KS = 11.5003, KT = 12.8079) indicated that Teano and Bergamo were genetically differentiated (Hudsonet al. 1992a).
Virus coalescent analysis: The next objective was to determine whether there was any evidence of migration between the two localities. MDIV showed some evidence for intermediate levels of gene flow between Teano and Bergamo in NC (M = 1) and 3′ ORF A (M = 0.7); however, the hypothesis of M = 0 could not be rejected given the shape of the posterior distribution (Figure 5). The decrease in power may be the result of insufficient sequence variation and shorter sequences in NC (l = 470, s = 12, M = 1, T = 0.3, θ= 2.3) and 3′ ORF B (l = 283, s = 24, M = 0.7, T = 0.5, θ= 4.4) vs. 5′ ORF B (l = 481, s = 24, M = 0, T = 1.0, θ= 5) and ORF A/B (l = 764, s = 48, M = 0, T = 0.7, θ= 12). The results from MIGRATE also showed no evidence of migration between Teano and Bergamo in the 5′ ORF B region but some evidence of asymmetrical migration from Teano to Bergamo in the NC, 3′ ORF A, and the combined ORF A/B loci (Table 5). However, the inference of asymmetrical equilibrium migration (i.e., migrants exchanged at a constant rate) from MIGRATE may not be valid because there is no evidence of migration between Teano and Bergamo (Figure 5). Collectively we interpret these results as evidence of shared ancestral polymorphisms between Teano and Bergamo but little or no detectable migration.
The hypothesis of no genetic differentiation of viruses between the dominant vc types in Teano (EU-10 and EU-12) and Bergamo (EU-1 and EU-2) could not be rejected using Hudson’s tests (P = 0.524 and P = 0.131, respectively). Although there was no evidence of subdivision we performed the analysis with MDIV (Figure 6) and MIGRATE (Table 6) to examine how these methods perform when there is no population subdivision. The migration posterior probability distributions estimated using MDIV did not reach a maximum (Figure 6). In Teano, parameter estimation using MIGRATE for 3′ ORF A and 5′ ORF B resulted in very high estimates for θ and migration rates indicating that a maximum was not reached in the likelihood surface (Table 6). Our analyses of the combined ORF A/B region using MIGRATE suggested that migration between vc types was approximately equal in both directions in Teano and asymmetrical in Bergamo; however, these results are strongly influenced by the assumption of equilibrium migration in MIGRATE. The results from all analyses suggest unrestricted movement of CHV-1 between the dominant vc types in Teano and Bergamo.
In simulations using Genetree for NC (Figure 7) and ORF A/B (Figure 8) regions, we assumed a very low level of migration between Teano and Bergamo (close to 0) for the starting backward migration matrix. The coalescent-based genealogy inferred for NC did not have enough mutations (Figure 7) to resolve the Teano-Bergamo population splitting event. In contrast, the genealogy for ORF A/B was informative for inferring the mutational history of this region with respect to variation between populations and vc types within populations (Figure 8). On the basis of estimates of the mean ages of mutations from coalescent simulations, measured in units of N generations, the oldest mutations in Teano are 30, 1, and 17 with mean ages of mutations 0.726 (SD = 0.181), 0.421 (SD = 0.147), and 0.349 (SD = 0.141), respectively (Figure 8). These mutations are more likely to have arisen in Teano or in the progenitor of the founding population. The oldest mutation in Bergamo, 31, has a mean age of 0.569 (SD = 0.163). These results suggest two possible alternatives: (i) viral introduction and divergence started in the geographical area around Teano or (ii) viral divergence predates introduction into Teano; i.e., the three ancestral haplotypes that existed at T = 0.8 (see Figure 8) were immigrant haplotypes from a founding population in Asia. All mutations along branches in the genealogy are population specific and in some cases vc type specific as seen in Bergamo, e.g., mutations 3 and 4, which are found in vc types EU-17 and EU-15, respectively. Finally, in simulations using Recom58, the recombination boundaries that we identified in Teano (positions 500 and 1200 in Figure 9) were found only in a subset of isolates from Bergamo (clade D; Figure 9). The existence of an additional clade in Bergamo (clade C) with a recombination boundary around position 1700 (not found in Teano) suggests that viral evolution has been driven by local dynamics in Teano and Bergamo.
Vertical transmission of CHV-1 by sexual reproduction: We assayed a total of 951 ascospore progeny from seven reciprocal crosses to test for vertical transmission of CHV-1; 629 of these were from eight perithecia produced by CHV-1-infected maternal isolates (Table 2). None of the ascospore progeny contained CHV-1, supporting the hypothesis that CHV-1 is not vertically transmitted in sexual reproduction.
The main objective of this research was to estimate migration of viruses between host genotypes as a means of estimating horizontal virus transmission in nature. Overall, we found evidence for unrestricted migration of CHV-1 between dominant vc types in two populations. In Teano, results from Hudson’s tests and MDIV found no evidence of subdivision among vc types EU-10 and EU-12. This was corroborated using MIGRATE, which showed that migration was approximately equal in both directions, with slightly more migration from vc type EU-12 to vc type EU-10 than vice versa (Table 6). The ORF A/B gene genealogy (Figure 8) corroborated this result, showing a small asymmetry in migration in which vc type EU-10 was associated more frequently with the tip (i.e., more recently derived) haplotypes (17, 18, 19, 20, and 21) than with interior (i.e., ancestral) haplotypes (1 and 7). In contrast, EU-12 was associated more frequently with interior haplotypes (1, 2, and 3) than with tips (14). The gene genealogy based on the 5′-noncoding region was less informative because there were relatively fewer mutations in this region.
Similar to our analyses for Teano, results from Hudson’s tests and MDIV in Bergamo found no evidence of subdivision among vc types EU-1 and EU-2. As above, this was corroborated using MIGRATE, which showed that migration between the dominant vc types was high and markedly asymmetric, with migration greater from EU-2 to EU-1 than vice versa (Table 6). Furthermore, the coalescent-based genealogy of ORF A/B (Figure 8) corroborates migration estimates from MIGRATE. The inferred ancestral haplotype 15, with no recently evolved mutations (Figure 8), is the interior-most haplotype in the gene genealogy and was isolated from vc type EU-2. Viral haplotypes associated with vc types EU-6, EU-15, and EU-17 are more recently evolved and share a common ancestor with viral haplotypes associated with EU-1. Consequently, the inferred order of haplotype divergence (for a representative of each vc type), from oldest to youngest, in Bergamo is 15, 9, (10, 13, 16); the haplotypes in parentheses diverged at approximately the same time.
To make inferences about horizontal transmission of viruses between vc types from migration analyses required that we test the assumption that viruses are not transmitted vertically to nonparental vc types during sexual reproduction; vertical transmission into recombinant vic genotypes would have the same effect on migration estimates as horizontal transmission. However, no viruses were transmitted into sexual offspring, even when both parents were virus infected (Table 2), supporting the few tests made previously (Anagnostakis 1982; Elliston 1985). Therefore, migration of viruses between vc types must have occurred via horizontal transmission.
Approaching the study of virus transmission from the perspective of migration provides an alternative approach—resulting in different estimates—to laboratory studies on transmission. Migration estimates do not correlate well with the probability of transmission between vc types estimated in the laboratory (Table 6). For example, transmission between EU-10 and EU-12 in the laboratory is markedly asymmetric with greater transmission occurring from EU-10 to EU-12 (Cortesiet al. 2001). However, migration between these vc types is approximately equal in both directions (Table 6). In Bergamo, the asymmetry in migration is opposite to the asymmetry in virus transmission (Table 6). Furthermore, the average probability of transmission between dominant vc types (ignoring asymmetries) is greater in Teano than in Bergamo, but migration estimates are much larger within Bergamo than in Teano (Table 6). At least two different factors make these approaches fundamentally different. First, the coalescent-based migration analysis and laboratory studies provide estimates of transmission on different time scales. The coalescent estimates migration over the entire genealogical history of a locus and can assume either a constant or a varying population size. Laboratory studies, in contrast, are based on finite numbers of brief interactions between individuals to obtain estimates of probabilities of transmission per contact (e.g., Cortesiet al. 2001). The second factor distinguishing these two approaches is that laboratory conditions are highly artificial, in which transmission may differ from natural conditions because of environmental factors affecting fungal physiology and cellular interactions. Coalescent-based methods integrate the transmission of viruses under natural conditions, over long periods of time, taking into account the interaction of numerous host individuals.
There was little or no detectable migration of CHV-1 between Teano and Bergamo as inferred from our analyses using MDIV and MIGRATE. The results from RecPars (Figure 3) suggest that the CHV-1 in Teano and Bergamo share a common ancestor that predates population divergence. The geographic distribution of mutations in the coalescent also supports the hypothesis that Teano is the older population. This suggests that the mutational structure of the founding viral population can be reconstructed more accurately from strains in Teano than from those in Bergamo. This does not, however, imply that Teano is the founding population. C. parasitica was first found in Europe (near Genoa, in northern Italy) in 1938 (Biraghi 1946). Assuming that CHV-1 was introduced with C. parasitica from Asia, these two populations were colonized within a relatively short period of time, either directly from Asia or from additional founder events within Italy. To test this hypothesis properly we would need to look at the same genomic regions in viral samples from Asia, the hypothesized centers of origin of the fungus and virus.
Genetic isolation and rare recombination in CHV-1 has the effect of creating distinct recombination blocks in evolutionarily older populations. Ancestral viral immigration to Bergamo initiated population subdivision such that the contemporary Teano and Bergamo populations are highly structured with restricted migration between them. Fungal subpopulations of C. parasitica in Teano and Bergamo are highly differentiated with respect to vic allele frequencies (Milgroom and Cortesi 1999), with 44% of variation at these loci attributable to differences between populations (M. G. Milgroom, unpublished data). Because fungal viruses are entirely dependent on their hosts for dispersal, restricted migration of C. parasitica likely restricts migration of CHV-1 also.
Homologous recombination is common in RNA viruses and has been extensively studied in the related Picornaviridae and in the plant-infecting Bromoviridae (Bujarski and Nagy 1996; Agol 1997). Although recombination may occur anywhere along the viral genome, there may be preferred sites on the genome at which RNA recombination is more frequent (e.g., Bruyereet al. 2000). Current understanding of CHV-1 replication is poor, and there is no published information on preferred sites for homologous RNA recombination. While recombination in viruses is interesting per se, detection and localization of recombination events to specific clades in the phylogeny was an essential prerequisite for estimating migration using coalescence. Recombination blocks can extend over long stretches of nucleic acid and have been reported as a common phenomenon in the haplotype structure of fungi (Carbone and Kohn 2001a,b) and the human genome (Dalyet al. 2001; Gabrielet al. 2002; Stumpf 2002). Recombination detected within ORF B of CHV-1 precluded its use for estimating migration. While it would have been possible to analyze smaller nonrecombining blocks within ORF B, these smaller partitions have too few characters for phylogenetic inference.
In addition to examining the distribution of migration, divergence, and mutation events in the ancestral histories of CHV-1 in Teano and Bergamo, the gene genealogies based on the coalescent allow us to examine patterns of divergence at the amino acid level in ORF A/B. Although the mutation rates are constant in 3′ ORF A and 5′ ORF B (see estimates of θ and Tajima’s D in Table 5), the ratio of nonsynonymous to synonymous substitutions is different between these two loci. In 3′ ORF A, r/s = 11/13, suggesting selective neutrality, whereas in 5′ ORF B r/s = 8/18, suggesting negative selection. Interestingly, the replacement substitutions in 5′ ORF B (shaded circles in Figure 8) evolved more recently than replacement substitutions in 3′ ORF A (unshaded circles in Figure 8). This suggests that negative selection may play an important role, in addition to drift, in shaping contemporary viral evolution in nature (also recently shown as an important diversifying mechanism in HIV-1 by Kils-Hüttenet al. 2001). Perhaps this represents a mechanism that allows invading haplotypes to adapt to a new environment or resident haplotypes to adapt to a changing environment. These patterns can also be explained using a neutral mutation hypothesis whereby deleterious or beneficial mutations arise spontaneously and then either are purged or become fixed in the population.
Genealogical analysis has proven to be a powerful tool in this study for providing insight into the transmission dynamics of a virus in two host populations. An important advantage of estimating migration rates using the coalescent is that the accuracy of migration estimates is limited only by the number of informative characters for phylogenetic reconstruction. In this study, the coalescent provides a good approximation of the ancestral history even though the sample is small and may fail to capture the full range of genetic heterogeneity. Coalescent methods are powerful because they go beyond variation that is present in the sample to infer missing or ancestral genotypes in the population. At present, coalescent models have been developed that can accommodate the mutation process with or without migration (Genetree), recombination (Recom58), migration and recombination (LAMARC), or selection (Neuhauser and Krone 1997) but not a combination of all these forces occurring simultaneously. The isolation of Teano and Bergamo coupled with viral recombination, purifying selection, and unrestricted migration among vc types within populations cannot be examined in a holistic fashion using the available coalescent models. LAMARC, for example, assumes equilibrium migration and recombination throughout the entire coalescent history of the sample of DNA sequences. An appropriate model would need to distinguish between shared ancestral polymorphisms, migration, recombination, and selection on several different temporal and spatial scales: among haplotypes, clades, and populations. A future challenge will be the development of multilocus coalescent models that can deal with recombination, migration, and selection, simultaneously. On the basis of the viral dynamics elucidated in this study, these models will be key in reconstructing the evolutionary dynamics of viral populations.
We thank Peter Beerli, Robert Griffiths, Linda Kohn, and Rasmus Nielsen for helpful comments and discussion on earlier drafts of this manuscript. We also thank Tobin Peever, Daniel Saha, and Alex Kay for help with screening for viruses in C. parasitica and Robert Foglia and Marcus Riedhammer for help with sequencing. This research was funded in part by the U.S. Department of Agriculture National Research Initiatives Competitive Grants Program grants 95-37303-1708 and 98-35303-6431 to M.G.M. and by McIntire-Stennis funds to B.I.H.
Communicating editor: S. Tavaré
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY125727–AY125813.
- Received September 5, 2003.
- Accepted January 6, 2004.
- Copyright © 2004 by the Genetics Society of America