Genetics, Vol. 166, 1611-1629, April 2004, Copyright © 2004

Recombination and Migration of Cryphonectria hypovirus 1 as Inferred From Gene Genealogies and the Coalescent

Ignazio Carbonea, Yir-Chung Liu1,b, Bradley I. Hillmanc, and Michael G. Milgroomb
a Center for Integrated Fungal Research, Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695,
b Department of Plant Pathology, Cornell University, Ithaca, New York 14853
c Department of Plant Pathology, Rutgers University, New Brunswick, New Jersey 08901

Corresponding author: Michael G. Milgroom, 334 Plant Science Bldg., Cornell University, Ithaca, NY 14853., mgm5{at}cornell.edu (E-mail)

Communicating editor: S. TAVARÉ


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Genealogy-based methods were used to estimate migration of the fungal virus Cryphonectria hypovirus 1 between vegetative compatibility types of the host fungus, Cryphonectria parasitica, as a means of estimating horizontal transmission within two host populations. Vegetative incompatibility is a self/non-self recognition system that inhibits virus transmission under laboratory conditions but its effect on transmission in nature has not been clearly demonstrated. Recombination within and among different loci in the virus genome restricted the genealogical analyses to haplotypes with common mutation and recombinational histories. The existence of recombination necessitated that we also use genealogical approaches that can take advantage of both the mutation and recombinational histories of the sample. Virus migration between populations was significantly restricted. In contrast, estimates of migration between vegetative compatibility types were relatively high within populations despite previous evidence that transmission in the laboratory was restricted. The discordance between laboratory estimates and migration estimates from natural populations highlights the challenges in estimating pathogen transmission rates. Genealogical analyses inferred migration patterns throughout the entire coalescent history of one viral region in natural populations and not just recent patterns of migration or laboratory transmission. This application of genealogical analyses provides markedly stronger inferences on overall transmission rates than laboratory estimates do.


GENEALOGY-BASED methods have been shown to enhance precision over FST-based estimators for estimating migration because they consider the temporal relationships of individuals in a population sample in addition to their frequency and spatial distributions (HUDSON et al. 1992B Down; TEMPLETON 1998 Down). This temporal component is important for detecting and separating migration from other population diversifying forces, such as recombination (HUDSON et al. 1992B Down) and isolation (NIELSEN and WAKELEY 2001 Down). Coalescent-based simulations of genealogical relationships further enhance the precision of migration estimates by allowing for unequal population sizes and asymmetries in migration rates in the reconstruction of ancestral migration patterns (BEERLI and FELSENSTEIN 1999 Down, BEERLI and FELSENSTEIN 2001 Down; BAHLO and GRIFFITHS 2000 Down; NIELSEN and WAKELEY 2001 Down). Because most coalescent-based approaches assume neutrality and no recombination they are most powerful when used in conjunction with other genealogical methods that can (i) test the neutral mutation hypothesis and (ii) identify potential recombination events in the history of the sample (CARBONE and KOHN 2001A Down). Although these analytical methods have proven to be powerful for making inferences on migration and effective population sizes on simulated data sets (BEERLI and FELSENSTEIN 1999 Down; NIELSEN and WAKELEY 2001 Down; VITALIS and COUVET 2001 Down; WAKELEY and ALIACAR 2001 Down), relatively few empirical examples have been tested (CONGDON et al. 2000 Down; BEERLI and FELSENSTEIN 2001 Down; CARBONE and KOHN 2001A Down; HEATH et al. 2002 Down; KINNISON et al. 2002 Down).

One potential application of migration analyses is to track the movement of specific genotypes within and between populations. A specific case in point is the estimation of transmission of pathogens between host individuals (horizontal transmission). Horizontal transmission is a pivotal parameter affecting the evolution of virulence (BULL 1994 Down; LEVIN 1996 Down) and the potential for pathogens to invade host populations (ANDERSON and MAY 1986 Down). Information is available about horizontal transmission of RNA viruses in some host systems, but relatively little detail is known about horizontal virus transmission in filamentous fungi. Horizontal transmission of viruses in filamentous fungi occurs when the cells of different individuals fuse, allowing viruses to move from the cytoplasm of one individual to the other. However, horizontal transmission is restricted by vegetative or heterokaryon incompatibility (reviewed in CORTESI et al. 2001 Down), a genetically controlled self/non-self recognition system that results in programmed cell death between incompatible genotypes (BIELLA et al. 2002 Down; GLASS and KANEKO 2003 Down). Cell death associated with vegetative incompatibility often, but not always, reduces horizontal transmission of viruses, plasmids, or other cytoplasmic elements (e.g., CATEN 1972 Down; HARTL et al. 1975 Down; GRIFFITHS et al. 1990 Down; CORTESI et al. 2001 Down).

In the chestnut blight fungus, Cryphonectria parasitica, vegetative incompatibility has been hypothesized to be a major barrier to horizontal transmission and, therefore, a significant constraint on biological control of chestnut blight with viruses (ANAGNOSTAKIS et al. 1986 Down; MILGROOM 1995 Down; LIU et al. 2000 Down). Vegetative incompatibility is controlled by alleles at six or more vegetative incompatibility (vic) loci in C. parasitica; multilocus vic genotypes are referred to as vegetative compatibility (vc) types (CORTESI and MILGROOM 1998 Down). Detailed laboratory studies have shown that each vic allele has a different quantitative effect reducing the probability of virus transmission (CORTESI et al. 2001 Down). Although it is possible to estimate the average transmission between vic genotypes in the laboratory, these estimates do not necessarily reflect actual transmission under natural conditions. In nature, transmission may be affected by the number of contacts between individuals, length of time individuals are in contact, spatial heterogeneity of genotypes, etc. Therefore, estimating migration of viruses using genealogical methods offers an alternative means of estimating horizontal transmission between vic genotypes independent of laboratory estimates.

The primary objective in this research was to estimate migration of viruses between groups of vegetatively incompatible individuals (i.e., vic genotypes) in two populations of the host fungus, C. parasitica. A gene genealogical approach to estimating migration requires that we test assumptions concerning recombination and population subdivision. Therefore, our subsidiary objectives were to test for recombination in the virus genome and estimate virus migration between populations.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Cryphonectria hypovirus 1 (CHV-1) is a double-stranded RNA (dsRNA) virus in the family Hypoviridae that infects the chestnut blight fungus, C. parasitica (HILLMAN et al. 2000 Down). Although classified as a dsRNA virus, the 12.7-kb RNA genome of CHV-1 (Fig 1) is more closely related to positive-sense RNA viruses and shares their essential replication properties (KOONIN et al. 1991 Down; FAHIMA et al. 1993 Down; HILLMAN et al. 2000 Down). Interest in this virus derives from its potential as a biological control agent for chestnut blight (VAN ALFEN et al. 1975 Down; NUSS 1992 Down). CHV-1 has been found extensively in Europe (ALLEMANN et al. 1999 Down); it has also been found in Japan and China (PEEVER et al. 1998 Down; LIU et al. 2003 Down) and, more recently, in Korea (J. LIM, B. CHA and M. G. MILGROOM, unpublished observations). CHV-1 was probably introduced into Europe with C. parasitica, which was introduced into North America and Europe from eastern Asia in the early 1900s (MILGROOM et al. 1996 Down).



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 1. Genome organization of CHV-1 (SHAPIRA et al. 1991 Down) showing open reading frames (ORFs A and B) with proteases (PR), RNA polymerase (POL), and helicase (HEL) genes. Locations, sizes, and abbreviations for sequenced regions are also shown (see text for details).

Populations sampled:
We sampled CHV-1 from two populations of C. parasitica: Teano from Caserta Province in southern Italy and Bergamo from Bergamo Province in northern Italy. Populations of C. parasitica in Italy are dominated by a few vic genotypes (or vc types; CORTESI et al. 1996 Down; MILGROOM and CORTESI 1999 Down), making it possible to obtain multiple isolates of CHV-1 from the same vc types. In Teano (n = 194), 96% of the fungal isolates were in vc types EU-10 (19%) and EU-12 (77%; LIU et al. 1996 Down; MILGROOM and CORTESI 1999 Down). In Bergamo (n = 158), 68% of the fungal isolates were in vc types EU-1 (33%) and EU-2 (35%; CORTESI et al. 1996 Down). In contrast to fungal populations, we had no prior knowledge of diversity or population structure of CHV-1.

Virus sampling:
To sample CHV-1, we screened C. parasitica isolates from Teano for dsRNA using a method described previously (MORRIS et al. 1983 Down; LIU et al. 2003 Down). C. parasitica isolates from Bergamo were screened for CHV-1 using an immunoblot method (PEEVER et al. 2000 Down). For both samples, the presence of CHV-1 was confirmed by dsRNA isolation in cellulose column chromatography (MORRIS and DODDS 1979 Down; PEEVER et al. 1997 Down). In addition to CHV-1 sampled from vc types EU-1 and EU-2 in Bergamo, we included one isolate from each of vc types EU-6, EU-15, and EU-17, which were isolated for a different study (PEEVER et al. 2000 Down).

From each CHV-1 isolate, we obtained nucleotide sequences from three different regions of the genome (Fig 1): 471 bp from the 5'-noncoding region and 57 bp from the 5' end of open reading frame (ORF) A (referred to as NC); 283 bp from the 3' end of ORF A and the contiguous 481 bp from the 5' end of ORF B (ORF A/B); and a 1096-bp segment approximately in the middle of ORF B. cDNA synthesis by reverse transcription was obtained using random hexamer primers as described (HILLMAN et al. 1992 Down; CHUNG et al. 1994 Down) and the GeneAmp RNA PCR kit (Perkin-Elmer-Cetus, Norwalk, CT). First-strand cDNA was synthesized from dsRNA primed with random hexamer primers. Oligonucleotide primers for amplifying three regions of CHV-1 by PCR (Table 1) anneal to conserved sequences and were designed to amplify a fragment from CHV-1 and the related hypovirus CHV-2 (HILLMAN et al. 2000 Down). PCR was catalyzed with ExTaq polymerase (Mirus, Madison, WI) in the presence of 0.2 mM of each dNTP and 0.15 µM of the primers. cDNA template was denatured at 94° for 2 min. Thirty-three cycles of denaturation at 94° for 30 sec, annealing at 55° for 1.5 min, and polymerization at 72° for 1.5 min were conducted. Sequencing of both strands was done by the Cornell University Sequencing Facility or at Rutgers University.


 
View this table:
In this window
In a new window

 
Table 1. Oligonucleotide primers used for amplifying parts of three regions of CHV-1

Virus cDNA sequence analysis:
The following steps in the analysis are outlined in the flowchart in Fig 2. The implementation of several programs in the flowchart was facilitated using a Java program (SNAP Workbench) that manages and coordinates a series of other programs (PRICE and CARBONE 2003 Down). SNAP Workbench was developed in the Carbone laboratory and is distributed over the Internet, http://www.cals.ncsu.edu/plantpath/faculty/carbone/home.html; additional programs in the SNAP (Suite of Nucleotide Analysis Programs) series are also accessible from this URL. An important functionality of our workbench software is the implementation of protocols consisting of a defined set of programs, assumptions, and parameter settings for following a particular path in the flowchart (e.g., see paths 1 and 2 in Fig 2). SNAP Workbench provided a single graphical user interface for performing the analyses described below.



View larger version (37K):
In this window
In a new window
Download PPT slide
 
Figure 2. Flowchart illustrating the sequence of steps and tools used in the analysis of the CHV-1. The analysis is guided by a series of questions shown in diamonds and, depending on the answers to these questions (yes or no), one or more analysis paths are possible. At the recombination block junction, two paths are possible depending on whether the analysis is performed for each recombination block separately (path 1) or for all blocks combined (path 2). The analyses for path 1 should be performed first so that inferences from MDIV, which assumes no recombination, can be used for path 2, which assumes recombination (see arrow connecting MDIV analyses in paths 1 and 2). The names of software tools that were used to perform specific analyses are italicized; Hudson refers to the programs Seqtomatrix and Permtest developed by R. Hudson. The presence of recombination increases the power of Hudson's tests to detect subdivision. This may be due to increased haplotype diversity in the total population or, more importantly, to recombination giving rise to unique haplotype blocks (see Fig 4) that distinguish the different subpopulations (clades). These tests may be more powerful in detecting population differences when certain assumptions of population processes are not satisfied or are unknown. The coalescent models implemented in MIGRATE, Recom58, and Genetree are positioned at the ends of the flowchart. These models make specific assumptions of the mutation process that must be validated a priori (see text for details). A coalescent model with population subdivision but no migration and with recombination has yet to be implemented.

To begin the analysis, viral cDNA sequences corresponding to the coding strand of the dsRNA were aligned using CLUSTAL W version 1.7 (THOMPSON et al. 1994 Down). Multiple DNA sequence alignments were generated for the three sequenced regions; alignments were accessioned in GenBank (AY125727, AY125728, AY125729, AY125730, AY125731, AY125732, AY125733, AY125734, AY125735, AY125736, AY125737, AY125738, AY125739, AY125740, AY125741, AY125742, AY125743, AY125744, AY125745, AY125746, AY125747, AY125748, AY125749, AY125750, AY125751, AY125752, AY125753, AY125754, AY125755, AY125756, AY125757, AY125758, AY125759, AY125760, AY125761, AY125762, AY125763, AY125764, AY125765, AY125766, AY125767, AY125768, AY125769, AY125770, AY125771, AY125772, AY125773, AY125774, AY125775, AY125776, AY125777, AY125778, AY125779, AY125780, AY125781, AY125782, AY125783, AY125784, AY125785, AY125786, AY125787, AY125788, AY125789, AY125790, AY125791, AY125792, AY125793, AY125794, AY125795, AY125796, AY125797, AY125798, AY125799, AY125800, AY125801, AY125802, AY125803, AY125804, AY125805, AY125806, AY125807, AY125808, AY125809, AY125810, AY125811, AY125812, AY125813). Sequence data from multiple loci were combined into one alignment file using SNAP Combine (AYLOR and CARBONE 2003 Down). Using SNAP Map (AYLOR and CARBONE 2003 Down) and SITES version 1.1 (HEY and WAKELEY 1997 Down) we collapsed sequences into unique haplotypes, recoding insertion or deletions (indels) and categorizing base substitutions as phylogenetically informative or uninformative, transitions vs. transversions, and replacement vs. synonymous amino acid changes in coding regions. A phylogenetic analysis with unweighted parsimony was performed using PAUP* 4.0 (SWOFFORD 1998 Down); if heuristic searches yielded two or more equally parsimonious trees, we inferred a strict consensus tree.

We generated site compatibility matrices for all subsets of haplotypes that share a common ancestor in the strict consensus tree using SNAP Clade (MARKWORDT et al. 2003 Down). Compatibility methods examine the overall support or conflict among variable sites in a DNA sequence alignment and are useful for visual identification of partitions or blocks undergoing reticulate evolution (JAKOBSEN et al. 1997 Down). In this article, we adapt compatibility methods for identifying reticulation events among groups of two or more haplotypes or clades in a phylogeny. Compatibility matrices were graphically illustrated using SNAP Matrix (MARKWORDT et al. 2003 Down) and examined for the existence of two or more sites showing an identical pattern of compatibility/incompatibility. Clusters of two or more identical sites in a matrix define a recombination block. If recombination blocks were evident we followed two possible analysis paths, as illustrated in the flowchart (Fig 2). In path 1, we identified recombination block boundaries and the minimum number of recombination events in a clade using RecMin (MYERS and GRIFFITHS 2003 Down). We used the putative recombination boundaries identified within each recombining clade from RecMin as a guide to identify the common or shared recombination block boundaries among all clades. We then collapsed the common nonrecombining blocks among all clades into haplotypes using SNAP Map. In path 2, we used RecPars (HEIN 1990 Down; http://www.daimi.au.dk/~compbio/recpars/recpars.html) to infer the number of recombinant segments (i.e., recombination blocks) in the combined multilocus data set. The best phylogeny inferred using RecPars is the most parsimonious reconstruction for all phylogenetically informative sites within an inferred recombinant block and represents a minimal path through the ancestral recombination graph (SONG and HEIN 2003 Down). By comparing the RecPars tree to the strict consensus tree we were able to reconstruct phylogenetic relationships among haplotypes that were previously unresolved in the strict consensus tree and determine the relative timing of recombination events in the history of the sample.

Virus tests of neutrality and population subdivision:
For each population and nonrecombining block we estimated the population mutation rate, {theta}, from the number of segregating sites, s (WATTERSON 1975 Down), and performed tests of neutrality using DNASP version 3.53 (ROZAS and ROZAS 1999 Down). This provided a priori population parameter estimation and inferences on population growth patterns, i.e., equilibrium vs. nonequilibrium, and evolutionary forces, i.e., neutrality vs. selection. To test for population differentiation we used SNAP Map to generate the appropriate sequence file, Seqtomatrix (HUDSON et al. 1992A Down) to convert the sequence file to a distance matrix, and Permtest (HUDSON et al. 1992A Down) to test for geographic subdivision in samples of DNA sequences from two or more localities; this latter test was done using a nonparametric permutation method based on Monte Carlo simulations. Hudson's test statistics (KST, KS, and KT) were calculated for each recombination block (path 1) and again for the entire data set (path 2). Testing the hypothesis of no genetic differentiation is an important first step and determines whether we assume a panmictic or a subdivided population model for examining population divergence using Genetree (path 1) and recombination parameter estimation using Recom58 (path 2).

Virus coalescent analysis:
If we found evidence for geographic subdivision, the observed patterns of genetic divergence were further explored to determine whether there was evidence for constant (i.e., equilibrium) migration among populations or shared ancestral polymorphisms followed by population divergence but no migration. Two migration models were used for this analysis. First, MDIV was used to test for equilibrium migration vs. shared ancestral polymorphisms between two subdivided populations (NIELSEN and WAKELEY 2001 Down). The approach of NIELSEN and WAKELEY 2001 Down uses either an infinite sites or a finite sites model without recombination and implements both likelihood and Bayesian methods using Markov chain Monte Carlo coalescent simulations for jointly estimating the population mean mutation rate ({theta}), divergence time (T), migration rate (M), and time since the most recent common ancestor (TMRCA) between two subdivided populations. This approach assumes that both populations descended from one panmictic population that may (or may not) have been followed by migration. Second, if MDIV showed evidence of equilibrium migration, MIGRATE was used to estimate migration rates assuming equilibrium migration rates (symmetrical or asymmetrical) in the ancestral history of populations (BEERLI and FELSENSTEIN 2001 Down). Both MDIV and MIGRATE assume neutrality and no recombination and therefore these analyses must be restricted to neutral nonrecombining blocks in the data set (path 1). If we found evidence of equilibrium migration from our analyses in path 1, then we used this inference in path 2 (see Fig 2) to estimate migration and recombination rate parameters simultaneously using LAMARC (http://evolution.genetics.washington.edu/lamarc/lamarc_about.html). If we found evidence of population subdivision and no migration in path 1, then we reconstructed the ancestral history of the sample for each nonrecombining block using Genetree version 9.0 (http://www.stats.ox.ac.uk/~griff/software.html). There was no counterpart for the Genetree analysis in path 2 (see Fig 2). The genealogy with the highest root probability, the ages of mutations, and the TMRCA of the sample were estimated from coalescent simulations in path 1. We further identified the position (i.e., coalescent time) of mutations giving rise to replacement and synonymous amino acid changes for substitutions in coding regions.

Finally, we estimated the recombination parameter, {rho}, within each population using Recom58 (GRIFFITHS and MARJORAM 1996 Down), which assumes no population subdivision. The magnitude of the recombination parameter is a measure of the likelihood of recombination over mutation at a specific site. Although a more efficient method for estimating the joint likelihood of {rho} and {theta} has been developed (FEARNHEAD and DONNELLY 2001 Down), we were able to greatly reduce the computational time of Recom58 by restricting our estimation of recombination rates within specific clades in the consensus tree. The novelty of our approach was in identifying clades with distinct recombination blocks and testing for population subdivision before estimating recombination parameters (path 2). We were able to corroborate our inference of recombination block boundaries (path 1) by estimating the average number of recombination events per site in the consensus sequence for each clade (path 2). Specifically, the sites with the most events were compared to the position of putative recombination boundaries identified using the site compatibility analysis, as described above.

All coalescent analyses were greatly facilitated using our workbench software. SNAP Workbench executes multiple, computationally intensive programs simultaneously on a single machine, Linux cluster, or supercomputer. This functionality was particularly important for programs using coalescent simulations (MIGRATE, LAMARC, MDIV, Recom58, and Genetree) because many independent coalescent runs were necessary to ensure convergence.

Vertical transmission of CHV-1 by sexual reproduction:
Inferences about horizontal transmission from estimates of virus migration are based on the assumption that viruses cannot move between vc types except by horizontal transmission. An alternative hypothesis is that viruses are vertically transmitted from virus-infected parents to offspring with recombinant vic genotypes during sexual reproduction. Vertical transmission of CHV-1 in C. parasitica occurs through asexual spores (RUSSIN and SHAIN 1985 Down; PEEVER et al. 2000 Down) but these give rise to individuals with the same vic genotype as their parents. Vertical transmission reportedly does not occur through sexual spores (ascospores) of C. parasitica (ANAGNOSTAKIS 1982 Down; ELLISTON 1985 Down), which are often of recombinant vic genotypes, but documentation of this is poor. Therefore, to test this hypothesis we set up seven crosses, using three CHV-1-infected isolates and six virus-free isolates as parents (Table 2). Crosses were performed on autoclaved chestnut stems and analyzed as described previously (LIU and MILGROOM 1996 Down). Four months after fertilization, we examined crosses for sexual fruiting bodies (perithecia) produced by each maternal parent. From each cross, we sampled 50–100 ascospore progeny from each of two or more perithecia. Progeny were sampled from all perithecia produced by CHV-1-infected maternal parents. Progeny isolates were assayed for CHV-1 by colony morphology by culturing on potato dextrose agar (PDA; Difco, Detroit); CHV-1-infected isolates of C. parasitica have characteristically less pigmentation when grown on PDA with low to moderate light (HILLMAN et al. 1990 Down). Ascospore isolates sampled from a cross between two CHV-1-infected parents were also screened for dsRNA as described above (MORRIS et al. 1983 Down; LIU et al. 2003 Down).


 
View this table:
In this window
In a new window

 
Table 2. Reciprocal crosses with CHV-1-infected and virus-free isolates of C. parasitica to test for vertical transmission of viruses into ascospore progeny


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Virus cDNA sequence analysis:
A total of 2331 nucleotides were sequenced from each of 12 and 17 isolates of CHV-1 in Teano and Bergamo, respectively. There were 14 variable sites in NC (Table 3), 51 in ORF A/B (Table 3), and 53 in ORF B (data not shown); CHV-1 isolates within each haplotype are shown in Table 4. Fig 3 shows unrooted cladograms of the combined NC, ORF A/B, and ORF B regions of CHV-1. A parsimony analysis with heuristic searching in PAUP (assumes no recombination) resulted in 10 equally parsimonious trees each with a consistency index of 0.8321. The unrooted strict consensus tree is shown in Fig 3A. The most parsimonious reconstruction of the same haplotypes using RecPars (assumes recombination) is shown in Fig 3B. RecPars found two recombinant segments; however, the boundaries of these regions did not coincide with the boundaries inferred using site compatibility matrices (Fig 4). Since RecPars does not allow for indels and SNAP Clade does not allow for variable positions that violate an infinite sites model, both indels and variable positions violating infinite sites were excluded in parsimony analyses. This was necessary for direct comparison of parsimony trees inferred with and without recombination (Fig 3) and site compatibility matrices inferred using SNAP Clade (Fig 4). Clades were inferred using SNAP Clade and are defined as groups of two or more haplotypes with a distinct pattern of mutation and recombination in their history as illustrated in the site compatibility matrices in Fig 4. Distinct recombination blocks were detected in clades A, C, and D (Fig 4) in the strict consensus. All isolates from Teano (clade A) share a unique history of recombination and the two recombination blocks are well supported, with 6 sites (positions 121, 130, 1363, 1678, 1996, and 2299) in one block, spanning the NC and ORF B regions, that are incompatible with 4 sites (584, 598, 695, and 1165) in the other block, spanning ORF A/B (Fig 4). There was no evidence of recombination within each of these blocks in the entire data set. The actual sizes of these recombination blocks cannot be determined without sequencing the intervening regions between the NC, ORF A/B, and ORF B regions (Fig 4). In contrast to Teano, at least two different recombination histories (four distinct recombination blocks) were inferred for the Bergamo isolates, with two recombination blocks in each of clades C and D (Fig 4). No homoplasy (i.e., no incompatibility) was detected in clades B and E and only these clades were monophyletic in both parsimony trees (Fig 3).



View larger version (27K):
In this window
In a new window
Download PPT slide
 
Figure 3. Unrooted haplotype cladograms inferred from the combined NC, ORF A/B, and ORF B regions of CHV-1; indels and variable positions violating infinite sites were excluded. (a) The unrooted strict consensus haplotype tree inferred using parsimony alone. Each of the five clades has a distinct mutation and recombination history as inferred from site compatibility matrices (see Fig 4). No homoplasy was detected in clades B and E. Each haplotype is represented by a single CHV-1 isolate (shown in parentheses). Isolates with the prefix TE are from Teano; those with PC or HPC are from Bergamo. (b) The most parsimonious reconstruction of the same haplotypes in a assuming the existence of recombination in their evolutionary histories. Only clades B and E are monophyletic in both a and b.



View larger version (32K):
In this window
In a new window
Download PPT slide
 
Figure 4. Site compatibility matrices for homoplasious clades A, C, and D in the combined analysis of portions of the 5'-noncoding, ORF A/B, and ORF B regions in Teano and Bergamo, Italy. The numbers along the top of the matrix indicate variable positions in the multiple DNA sequence alignment of the combined data set; the second line of numbers in parentheses is nucleotide positions relative to CHV-1/Euro7 (GenBank accession no. AF082191). Incompatible sites are indicated by solid squares; all other sites in the matrix are compatible. The shaded and unshaded rectangles indicate the two inferred recombination blocks in each clade, designated numerically as blocks 1 and 2. The actual boundaries were determined by examining blocks among all clades. The dashed rectangles indicate the two regions (NC and ORF A/B) with common blocks spanning all clades. These regions were considered separately in coalescent analyses using Genetree (see Fig 7 and Fig 8). Putative boundaries fall within the range of possible solutions as specified using RecMin (data not shown).


 
View this table:
In this window
In a new window

 
Table 3. Distribution of haplotypes, base substitutions, and insertion/deletion events in NC and ORF A/B of CHV-1 in Teano and Bergamo


 
View this table:
In this window
In a new window

 
Table 4. Distribution of viral haplotypes, fungal host isolates, and vegetative compatibility types in the 5'-noncoding and ORF A/B regions of CHV-1 from Teano and Bergamo, Italy

The uncertainty in the branching patterns in the unresolved polytomies (clades A and D) in the strict consensus tree (Fig 3A) arises from an inherent limitation of the parsimony method itself, which assumes that there is no recombination in the history of the sequences. Therefore, we used RecPars (HEIN 1990 Down) to find the most parsimonious reconstruction of our set of haplotypes using only base substitutions and inferred recombination events; RecPars cannot handle insertions or deletions. This analysis yielded a partial rooting of unrooted trees, which provided inference on ancestral recombination events in the sample. For example, the two largest unresolved polytomies observed in the strict consensus (clade D and a subset of the haplotypes in clade A) were further resolved when we assume recombination in the ancestral history of the sample (Fig 3B); clade A, composed of all isolates from Teano, was the inferred root of the sample. The best parsimony tree inferred by RecPars shows how different clades sharing common recombination blocks are related to each other. For example, when we assume recombination (Fig 3B) clades C and D are split up into two different histories. This is consistent with the compatibility analysis, which revealed two distinct blocks in the ORF B region in clade C (Fig 4).

The results from the SNAP Clade and RecMin were used to determine the nonrecombining sequence blocks for coalescent analysis (see Fig 4). Specifically, we identified NC or ORF A/B as the regions that showed no intragenic recombination among all clades. In contrast, the putative recombination event falling in ORF B in clade C resulted in significant homoplasy. A phylogenetic analysis of ORF B resulted in 348 equally parsimonious trees. Even a small amount of homoplasy (C.I. = 0.857) yielded a significant amount of phylogenetic uncertainty; in this case, the strict consensus of all 348 trees was predominantly fan shaped, i.e., with a topology similar to clade D in Fig 3A. If we examined only nonrecombining blocks with ORF B, there would be too few mutations within each block to yield any useful genealogy. Therefore, the sequences from ORF B were excluded from further analysis. From the remaining data we inferred 13 distinct haplotypes for NC and 26 for ORF A/B (Table 3). The distribution and characterization of base substitutions and indels in these regions are summarized in Table 3. In ORF A/B the ratio of nonsynonymous (r) to synonymous (s) substitutions was different within the two contiguous segments. In the 3' ORF A segment r/s = 11/13, whereas in the 5' ORF B segment r/s = 8/18.

Virus tests of neutrality and population subdivision:
A priori inferences of population processes were based on tests of neutrality (TAJIMA 1989 Down; FU and LI 1993 Down; FU 1997 Down) that assume a constant population size, no recombination, and no migration. A significant test result may suggest a departure from neutrality, the presence of population subdivision (SIMONSEN et al. 1995 Down; FU 1996 Down), population growth, or background selection (FU 1997 Down). We distinguished among population growth and background selection by comparing Tajima's D, Fu and Li's D* and F*, and Fu's FS tests. If only Fu and Li's D* and F* are significant this suggests background selection (FU 1997 Down). If only Fu's FS is significant this suggests population growth or genetic hitchhiking. Estimates of D for NC and 3' ORF A were not significantly different from 0 (P > 0.1; Table 5) within Teano and Bergamo; we cannot reject the hypothesis of selective neutrality for DNA sequence variation in these regions. The D values for 5' ORF B are only marginally significant (0.01 < P < 0.10) and the FS values are moderately to highly significant (0.001 < P < 0.05). This was evidence in support of weak (negative) selection and population growth. In the presence of population subdivision the power of these tests decreases (FU 1996 Down) and alternative tests for population subdivision as proposed by HUDSON et al. 1992A Down should be used to avoid making false positives. A significant P value (P < 0.0001) for Hudson's tests (KST = 0.1021, KS = 11.5003, KT = 12.8079) indicated that Teano and Bergamo were genetically differentiated (HUDSON et al. 1992A Down).


 
View this table:
In this window
In a new window

 
Table 5. Population statistics, neutrality tests, and migration estimates based on variation in NC, the 3' ORF A and 5' ORF B segments of ORF A/B, and the entire ORF A/B region of CHV-1 from Teano and Bergamo

Virus coalescent analysis:
The next objective was to determine whether there was any evidence of migration between the two localities. MDIV showed some evidence for intermediate levels of gene flow between Teano and Bergamo in NC (M = 1) and 3' ORF A (M = 0.7); however, the hypothesis of M = 0 could not be rejected given the shape of the posterior distribution (Fig 5). The decrease in power may be the result of insufficient sequence variation and shorter sequences in NC (l = 470, s = 12, M = 1, T = 0.3, {theta} = 2.3) and 3' ORF B (l = 283, s = 24, M = 0.7, T = 0.5, {theta} = 4.4) vs. 5' ORF B (l = 481, s = 24, M = 0, T = 1.0, {theta} = 5) and ORF A/B (l = 764, s = 48, M = 0, T = 0.7, {theta} = 12). The results from MIGRATE also showed no evidence of migration between Teano and Bergamo in the 5' ORF B region but some evidence of asymmetrical migration from Teano to Bergamo in the NC, 3' ORF A, and the combined ORF A/B loci (Table 5). However, the inference of asymmetrical equilibrium migration (i.e., migrants exchanged at a constant rate) from MIGRATE may not be valid because there is no evidence of migration between Teano and Bergamo (Fig 5). Collectively we interpret these results as evidence of shared ancestral polymorphisms between Teano and Bergamo but little or no detectable migration.



View larger version (39K):
In this window
In a new window
Download PPT slide
 
Figure 5. The migration and divergence time posterior probability distributions between Teano and Bergamo estimated for each nonrecombining locus using MDIV. The data were simulated assuming an infinite sites model, 2,000,000 steps in the chain for estimating the posterior distribution and an initial 500,000 steps to ensure that enough genealogies were simulated before approximating the posterior distribution (NIELSEN and WAKELEY 2001 Down). Ten independent replicates using different starting random number seeds were simulated under the same model and parameters; results were similar between runs. Data points from MDIV were plotted using gnuplot version 3.7 available from http://www.gnuplot.info/.

The hypothesis of no genetic differentiation of viruses between the dominant vc types in Teano (EU-10 and EU-12) and Bergamo (EU-1 and EU-2) could not be rejected using Hudson's tests (P = 0.524 and P = 0.131, respectively). Although there was no evidence of subdivision we performed the analysis with MDIV (Fig 6) and MIGRATE (Table 6) to examine how these methods perform when there is no population subdivision. The migration posterior probability distributions estimated using MDIV did not reach a maximum (Fig 6). In Teano, parameter estimation using MIGRATE for 3' ORF A and 5' ORF B resulted in very high estimates for {theta} and migration rates indicating that a maximum was not reached in the likelihood surface (Table 6). Our analyses of the combined ORF A/B region using MIGRATE suggested that migration between vc types was approximately equal in both directions in Teano and asymmetrical in Bergamo; however, these results are strongly influenced by the assumption of equilibrium migration in MIGRATE. The results from all analyses suggest unrestricted movement of CHV-1 between the dominant vc types in Teano and Bergamo.



View larger version (32K):
In this window
In a new window
Download PPT slide
 
Figure 6. The migration posterior probability distributions between the dominant vc types in Teano (EU-10 and EU-12) and Bergamo (EU-1 and EU-2) estimated for each nonrecombining locus using MDIV. The data were simulated assuming an infinite sites model, 2,000,000 steps in the chain with an initial 500,000 steps before estimating the posterior distribution. Ten independent replicates using different starting random number seeds were simulated under the same model and parameters; results were similar between runs. Data points from MDIV were plotted using gnuplot version 3.7.


 
View this table:
In this window
In a new window

 
Table 6. Viral migration estimates based on variation in the 3' ORF A and 5' ORF B segments of ORF A/B and the entire ORF A/B region of CHV-1 from the dominant vc types in Teano and Bergamo

In simulations using Genetree for NC (Fig 7) and ORF A/B (Fig 8) regions, we assumed a very low level of migration between Teano and Bergamo (close to 0) for the starting backward migration matrix. The coalescent-based genealogy inferred for NC did not have enough mutations (Fig 7) to resolve the Teano-Bergamo population splitting event. In contrast, the genealogy for ORF A/B was informative for inferring the mutational history of this region with respect to variation between populations and vc types within populations (Fig 8). On the basis of estimates of the mean ages of mutations from coalescent simulations, measured in units of N generations, the oldest mutations in Teano are 30, 1, and 17 with mean ages of mutations 0.726 (SD = 0.181), 0.421 (SD = 0.147), and 0.349 (SD = 0.141), respectively (Fig 8). These mutations are more likely to have arisen in Teano or in the progenitor of the founding population. The oldest mutation in Bergamo, 31, has a mean age of 0.569 (SD = 0.163). These results suggest two possible alternatives: (i) viral introduction and divergence started in the geographical area around Teano or (ii) viral divergence predates introduction into Teano; i.e., the three ancestral haplotypes that existed at T = 0.8 (see Fig 8) were immigrant haplotypes from a founding population in Asia. All mutations along branches in the genealogy are population specific and in some cases vc type specific as seen in Bergamo, e.g., mutations 3 and 4, which are found in vc types EU-17 and EU-15, respectively. Finally, in simulations using Recom58, the recombination boundaries that we identified in Teano (positions 500 and 1200 in Fig 9) were found only in a subset of isolates from Bergamo (clade D; Fig 9). The existence of an additional clade in Bergamo (clade C) with a recombination boundary around position 1700 (not found in Teano) suggests that viral evolution has been driven by local dynamics in Teano and Bergamo.



View larger version (21K):
In this window
In a new window
Download PPT slide
 
Figure 7. Coalescent-based gene genealogy with the highest root probability (likelihood is 6.8590 x 10–15, SD = 2.9723 x 10–13) showing the distribution of mutations in Teano (T) and Bergamo (B) for the NC region of CHV-1. The inferred genealogy is based on 10 different starting random number seeds, using 5 million simulations of the coalescent for each run, a Watterson's estimate of {theta} = 2.0, and constant population sizes and growth rates. All runs converged to the same genealogy. The time scale is in coalescent units of effective population size. In the gene genealogy, the direction of divergence is from the top of the genealogy (oldest, i.e., the past) to the bottom (youngest, i.e., the present); coalescence is from the bottom (present) to the top (past). Since the gene genealogy is rooted, all the mutations and bifurcations are also time ordered from top to bottom. The numbers below the tree from top to bottom designate each distinct haplotype and its frequency (i.e., the number of occurrences of the haplotype in the sample), the frequency of each haplotype in Teano and Bergamo, and the vc type associated with each haplotype (see Table 4).



View larger version (27K):
In this window
In a new window
Download PPT slide
 
Figure 8. Coalescent-based gene genealogy with the highest root probability (likelihood is 1.9763 x 10–35, SD = 3.5786 x 10–33) showing the distribution of mutations and amino acid substitutions in Teano (T) and Bergamo (B) for the contiguous 3' ORF A and 5' ORF B loci (denoted collectively as ORF A/B) of CHV-1. The shaded circles designate replacement substitutions in 5' ORF B; the unshaded circles are for replacement substitutions in 3' ORF A. T and B in the circles denote mutations more likely to have arisen in Teano and Bergamo, respectively. Simulations were based on a Watterson's estimate of {theta} = 7.0. See Fig 7 for further explanation.



View larger version (25K):
In this window
In a new window
Download PPT slide
 
Figure 9. The average number of recombination events ("hits") per site in the consensus sequence for each clade based on coalescent analysis using Recom58, assuming recombination in the history of clades A, C, and D. The data were simulated assuming an infinite sites model with 1 million simulations of the coalescent, {theta} = 7.0, a starting recombination parameter {rho} = 0.5, and constant population sizes and growth rates. Ten different starting random number seeds were simulated under the same model and parameters; results were similar between runs. The simulated maximum likelihood estimates of {rho} for clades A, C, and D were 1.0333, 0.8820, and 1.6385, respectively. The magnitude of the recombination parameter, {rho}, is a measure of the likelihood of recombination over mutation at a specific site. In each of clades A, C, and D, sites with the most recombination hits correspond to putative recombination boundaries shown in Fig 4.

Vertical transmission of CHV-1 by sexual reproduction:
We assayed a total of 951 ascospore progeny from seven reciprocal crosses to test for vertical transmission of CHV-1; 629 of these were from eight perithecia produced by CHV-1-infected maternal isolates (Table 2). None of the ascospore progeny contained CHV-1, supporting the hypothesis that CHV-1 is not vertically transmitted in sexual reproduction.


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

The main objective of this research was to estimate migration of viruses between host genotypes as a means of estimating horizontal virus transmission in nature. Overall, we found evidence for unrestricted migration of CHV-1 between dominant vc types in two populations. In Teano, results from Hudson's tests and MDIV found no evidence of subdivision among vc types EU-10 and EU-12. This was corroborated using MIGRATE, which showed that migration was approximately equal in both directions, with slightly more migration from vc type EU-12 to vc type EU-10 than vice versa (Table 6). The ORF A/B gene genealogy (Fig 8) corroborated this result, showing a small asymmetry in migration in which vc type EU-10 was associated more frequently with the tip (i.e., more recently derived) haplotypes (17, 18, 19, 20, and 21) than with interior (i.e., ancestral) haplotypes (1 and 7). In contrast, EU-12 was associated more frequently with interior haplotypes (1, 2, and 3) than with tips (14). The gene genealogy based on the 5'-noncoding region was less informative because there were relatively fewer mutations in this region.

Similar to our analyses for Teano, results from Hudson's tests and MDIV in Bergamo found no evidence of subdivision among vc types EU-1 and EU-2. As above, this was corroborated using MIGRATE, which showed that migration between the dominant vc types was high and markedly asymmetric, with migration greater from EU-2 to EU-1 than vice versa (Table 6). Furthermore, the coalescent-based genealogy of ORF A/B (Fig 8) corroborates migration estimates from MIGRATE. The inferred ancestral haplotype 15, with no recently evolved mutations (Fig 8), is the interior-most haplotype in the gene genealogy and was isolated from vc type EU-2. Viral haplotypes associated with vc types EU-6, EU-15, and EU-17 are more recently evolved and share a common ancestor with viral haplotypes associated with EU-1. Consequently, the inferred order of haplotype divergence (for a representative of each vc type), from oldest to youngest, in Bergamo is 15, 9, (10, 13, 16); the haplotypes in parentheses diverged at approximately the same time.

To make inferences about horizontal transmission of viruses between vc types from migration analyses required that we test the assumption that viruses are not transmitted vertically to nonparental vc types during sexual reproduction; vertical transmission into recombinant vic genotypes would have the same effect on migration estimates as horizontal transmission. However, no viruses were transmitted into sexual offspring, even when both parents were virus infected (Table 2), supporting the few tests made previously (ANAGNOSTAKIS 1982 Down; ELLISTON 1985 Down). Therefore, migration of viruses between vc types must have occurred via horizontal transmission.

Approaching the study of virus transmission from the perspective of migration provides an alternative approach—resulting in different estimates—to laboratory studies on transmission. Migration estimates do not correlate well with the probability of transmission between vc types estimated in the laboratory (Table 6). For example, transmission between EU-10 and EU-12 in the laboratory is markedly asymmetric with greater transmission occurring from EU-10 to EU-12 (CORTESI et al. 2001 Down). However, migration between these vc types is approximately equal in both directions (Table 6). In Bergamo, the asymmetry in migration is opposite to the asymmetry in virus transmission (Table 6). Furthermore, the average probability of transmission between dominant vc types (ignoring asymmetries) is greater in Teano than in Bergamo, but migration estimates are much larger within Bergamo than in Teano (Table 6). At least two different factors make these approaches fundamentally different. First, the coalescent-based migration analysis and laboratory studies provide estimates of transmission on different time scales. The coalescent estimates migration over the entire genealogical history of a locus and can assume either a constant or a varying population size. Laboratory studies, in contrast, are based on finite numbers of brief interactions between individuals to obtain estimates of probabilities of transmission per contact (e.g., CORTESI et al. 2001 Down). The second factor distinguishing these two approaches is that laboratory conditions are highly artificial, in which transmission may differ from natural conditions because of environmental factors affecting fungal physiology and cellular interactions. Coalescent-based methods integrate the transmission of viruses under natural conditions, over long periods of time, taking into account the interaction of numerous host individuals.

There was little or no detectable migration of CHV-1 between Teano and Bergamo as inferred from our analyses using MDIV and MIGRATE. The results from RecPars (Fig 3) suggest that the CHV-1 in Teano and Bergamo share a common ancestor that predates population divergence. The geographic distribution of mutations in the coalescent also supports the hypothesis that Teano is the older population. This suggests that the mutational structure of the founding viral population can be reconstructed more accurately from strains in Teano than from those in Bergamo. This does not, however, imply that Teano is the founding population. C. parasitica was first found in Europe (near Genoa, in northern Italy) in 1938 (BIRAGHI 1946 Down). Assuming that CHV-1 was introduced with C. parasitica from Asia, these two populations were colonized within a relatively short period of time, either directly from Asia or from additional founder events within Italy. To test this hypothesis properly we would need to look at the same genomic regions in viral samples from Asia, the hypothesized centers of origin of the fungus and virus.

Genetic isolation and rare recombination in CHV-1 has the effect of creating distinct recombination blocks in evolutionarily older populations. Ancestral viral immigration to Bergamo initiated population subdivision such that the contemporary Teano and Bergamo populations are highly structured with restricted migration between them. Fungal subpopulations of C. parasitica in Teano and Bergamo are highly differentiated with respect to vic allele frequencies (MILGROOM and CORTESI 1999 Down), with 44% of variation at these loci attributable to differences between populations (M. G. MILGROOM, unpublished data). Because fungal viruses are entirely dependent on their hosts for dispersal, restricted migration of C. parasitica likely restricts migration of CHV-1 also.

Homologous recombination is common in RNA viruses and has been extensively studied in the related Picornaviridae and in the plant-infecting Bromoviridae (BUJARSKI and NAGY 1996 Down; AGOL 1997 Down). Although recombination may occur anywhere along the viral genome, there may be preferred sites on the genome at which RNA recombination is more frequent (e.g., BRUYERE et al. 2000 Down). Current understanding of CHV-1 replication is poor, and there is no published information on preferred sites for homologous RNA recombination. While recombination in viruses is interesting per se, detection and localization of recombination events to specific clades in the phylogeny was an essential prerequisite for estimating migration using coalescence. Recombination blocks can extend over long stretches of nucleic acid and have been reported as a common phenomenon in the haplotype structure of fungi (CARBONE and KOHN 2001A Down, CARBONE and KOHN 2001B Down) and the human genome (DALY et al. 2001 Down; GABRIEL et al. 2002 Down; STUMPF 2002 Down). Recombination detected within ORF B of CHV-1 precluded its use for estimating migration. While it would have been possible to analyze smaller nonrecombining blocks within ORF B, these smaller partitions have too few characters for phylogenetic inference.

In addition to examining the distribution of migration, divergence, and mutation events in the ancestral histories of CHV-1 in Teano and Bergamo, the gene genealogies based on the coalescent allow us to examine patterns of divergence at the amino acid level in ORF A/B. Although the mutation rates are constant in 3' ORF A and 5' ORF B (see estimates of {theta} and Tajima's D in Table 5), the ratio of nonsynonymous to synonymous substitutions is different between these two loci. In 3' ORF A, r/s = 11/13, suggesting selective neutrality, whereas in 5' ORF B r/s = 8/18, suggesting negative selection. Interestingly, the replacement substitutions in 5' ORF B (shaded circles in Fig 8) evolved more recently than replacement substitutions in 3' ORF A (unshaded circles in Fig 8). This suggests that negative selection may play an important role, in addition to drift, in shaping contemporary viral evolution in nature (also recently shown as an important diversifying mechanism in HIV-1 by KILS-HUTTEN et al. 2001 Down). Perhaps this represents a mechanism that allows invading haplotypes to adapt to a new environment or resident haplotypes to adapt to a changing environment. These patterns can also be explained using a neutral mutation hypothesis whereby deleterious or beneficial mutations arise spontaneously and then either are purged or become fixed in the population.

Genealogical analysis has proven to be a powerful tool in this study for providing insight into the transmission dynamics of a virus in two host populations. An important advantage of estimating migration rates using the coalescent is that the accuracy of migration estimates is limited only by the number of informative characters for phylogenetic reconstruction. In this study, the coalescent provides a good approximation of the ancestral history even though the sample is small and may fail to capture the full range of genetic heterogeneity. Coalescent methods are powerful because they go beyond variation that is present in the sample to infer missing or ancestral genotypes in the population. At present, coalescent models have been developed that can accommodate the mutation process with or without migration (Genetree), recombination (Recom58), migration and recombination (LAMARC), or selection (NEUHAUSER and KRONE 1997 Down) but not a combination of all these forces occurring simultaneously. The isolation of Teano and Bergamo coupled with viral recombination, purifying selection, and unrestricted migration among vc types within populations cannot be examined in a holistic fashion using the available coalescent models. LAMARC, for example, assumes equilibrium migration and recombination throughout the entire coalescent history of the sample of DNA sequences. An appropriate model would need to distinguish between shared ancestral polymorphisms, migration, recombination, and selection on several different temporal and spatial scales: among haplotypes, clades, and populations. A future challenge will be the development of multilocus coalescent models that can deal with recombination, migration, and selection, simultaneously. On the basis of the viral dynamics elucidated in this study, these models will be key in reconstructing the evolutionary dynamics of viral populations.


*  FOOTNOTES

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY125727, AY125728, AY125729, AY125730, AY125731, AY125732, AY125733, AY125734, AY125735, AY125736, AY125737, AY125738, AY125739, AY125740, AY125741, AY125742, AY125743, AY125744, AY125745, AY125746, AY125747, AY125748, AY125749, AY125750, AY125751, AY125752, AY125753, AY125754, AY125755, AY125756, AY125757, AY125758, AY125759, AY125760, AY125761, AY125762, AY125763, AY125764, AY125765, AY125766, AY125767, AY125768, AY125769, AY125770, AY125771, AY125772, AY125773, AY125774, AY125775, AY125776, AY125777, AY125778, AY125779, AY125780, AY125781, AY125782,