Originally published as Genetics Published Articles Ahead of Print on April 19, 2006.

Genetics, Vol. 173, 1705-1723, July 2006, Copyright © 2006
doi:10.1534/genetics.105.054502

Estimating the Contribution of Mutation, Recombination and Gene Conversion in the Generation of Haplotypic Diversity

Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697

1 Corresponding author: Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, CA 92697.
E-mail: mclegg{at}uci.edu

Manuscript received December 16, 2005. Accepted for publication April 11, 2006.

ABSTRACT

Recombination occurs through both homologous crossing over and homologous gene conversion during meiosis. The contribution of recombination relative to mutation is expected to be dramatically reduced in inbreeding organisms. We report coalescent-based estimates of the recombination parameter ({rho}) relative to estimates of the mutation parameter ({theta}) for 18 genes from the highly self-fertilizing grass, wild barley, Hordeum vulgare ssp. spontaneum. Estimates of {rho}/{theta} are much greater than expected, with a mean Formula/Formula {approx} 1.5, similar to estimates from outcrossing species. We also estimate Formula with and without the contribution of gene conversion. Genotyping errors can mimic the effect of gene conversion, upwardly biasing estimates of the role of conversion. Thus we report a novel method for identifying genotyping errors in nucleotide sequence data sets. We show that there is evidence for gene conversion in many large nucleotide sequence data sets including our data that have been purged of all detectable sequencing errors and in data sets from Drosophila melanogaster, D. simulans, and Zea mays. In total, 13 of 27 loci show evidence of gene conversion. For these loci, gene conversion is estimated to contribute an average of twice as much as crossing over to total recombination.


THERE are two sources of genetic diversity, mutation and recombination. Mutation, broadly defined here as novel heritable change in nucleotide state, introduces new variants while recombination reassorts the variants along a chromosome into novel combinations or haplotypes. Recombination can occur through both homologous crossover and homologous (intralocus) gene conversion, processes that occur as part of meiosis in diploid (or higher ploidy) organisms (WIUF and HEIN 2000). Under the Holliday junction model (HOLLIDAY 1964), homologous gene conversion is thought to occur when only a short tract of the alternate chromosome (usually a few hundred base pairs) is incorporated during meiotic exchange (e.g., STAHL 1994).

Inbreeding dramatically reduces the role of recombination. Recurrent inbreeding can rapidly increase homozygosity; the recombination process continues to exchange chromosomal segments during gamete formation but with little effective recombination of mutations. Thus the primary impact of inbreeding is expected to be a reduction of the contribution of recombination, relative to mutation, to total genetic diversity.

Under coalescent theory and assuming a standard neutral model, the impact of inbreeding can be measured as a reduction in the ratio of the recombination parameter {rho} to the mutation parameter {theta}, i.e., {rho}/{theta} (where {rho} = 4Ner and {theta} = 4Neµ and where Ne is the effective population size, r is the rate of recombination, and µ is the rate of mutation) (symbols used are listed in Table 1). It is predicted that both {rho} and {theta} are reduced by inbreeding, but the impact on recombination is expected to be much greater (NORDBORG 2000). NORDBORG (2000) showed that {rho} is predicted to be reduced under partial self-fertilization based on the relationship {rho}s = {rho}(1 – s), where s is the selfing rate; {theta} will be affected as {theta}s = {theta}/(1 + (s/(2 – s))). As inbreeding approaches maximal values, i.e., 98–99%, the value of {rho} is reduced by 40- to 50-fold, while {theta} is reduced by only 2-fold relative to that expected under outcrossing.


View this table:
In this window
In a new window

 
TABLE 1

Symbols and abbreviations used

 
The relative roles of gene conversion and crossing over are important because they influence the degree of the association among segregating sites, particularly at the intragenic level. The gene conversion process results in the exchange of small tracts of a chromosome, creating a mosaic sequence. At the population level, gene conversion interrupts linkage disequilibrium (LD, the association among segregating sites) in a very localized manner while long-distance LD remains largely unaffected (ANDOLFATTO and NORDBORG 1998). This can result in a reduction in LD among closely linked markers, while flanking markers remain in complete association. Thus the relative role of gene conversion is a topic of considerable practical importance. For example, the density of the mapped polymorphic sites needed for disease association studies in humans and for marker-assisted selection in crops and domesticated animals is dependent on the degree to which extrapolations from larger-scale estimates of LD predict the degree of association between genetic markers and causative mutations (PTAK et al. 2004).

We focus on the impact of recombination in wild barley (Hordeum vulgare ssp. spontaneum), a species with an estimated selfing rate of 98.4% (BROWN et al. 1978). We estimate the relative contribution of recombination and mutation ({rho}/{theta}) on the basis of nucleotide sequence-level diversity. We also examine the relative contributions of gene conversion and crossing over to estimated levels of recombination.

There are a number of methods for estimation of the recombination parameter ({rho}) from nucleotide sequence polymorphism data. Most methods rely on a standard model of recombination that includes the assumptions that recombination results from homologous crossover events during normal meiosis and that the recombination rate per base pair is constant with the probability of recombination proportional to the distance between sites. For the majority of estimators the population history is assumed to conform to a coalescent model with recombination (HUDSON 1990; GRIFFITHS and MARJORAM 1996). Many methods assume that samples are drawn from a large and panmictic population of constant size, evolving under neutrality (reviewed in FEARNHEAD and DONNELLY 2002; STUMPF and MCVEAN 2003). The infinite-sites model of mutation is also often assumed (each mutation affects a unique site).

Estimation of the population recombination parameter {rho} = 4Ner is challenging. When based on nucleotide sequence polymorphism, the estimated value is always a product of the effective population size and rate of recombination. Methods for estimating {rho} from nucleotide sequence include product moment estimators (HUDSON 1987; HEY and WAKELEY 1997), "composite-likelihood" methods that result from a product of coalescent likelihoods for a series of two-site or three-site configurations (HUDSON 2001; WALL 2004), "approximate-likelihood" methods that combine summary statistics from the data with estimated histories with recombination (WALL 2000), and "full-likelihood" methods (GRIFFITHS and MARJORAM 1996; KUHNER et al. 2000) that attempt to fit parameter estimates to the estimated underlying coalescent history with recombination (reviewed in STUMPF and MCVEAN 2003).

We also estimate the relative contribution of gene conversion ({rho}g) and crossing over ({rho}c) to total recombination (f = {rho}g/{rho}c) (FRISSE et al. 2001), using a composite-likelihood estimator (HUDSON 2001) and a method that matches patterns of nucleotide sites with coalescent simulations of gene conversion (PADHUKASAHASRAM et al. 2004). We compare these methods using an empirical data set from 18 loci sequenced from a common sample of 25 wild barley accessions (MORRELL et al. 2005) and population genetic data sets from Zea mays (maize), Drosophila melanogaster, D. pseudoobscura, and D. simulans.


METHODS

Sequence data:

Sequence diversity from 18 loci for 25 wild barley individuals from across the species' range has been reported previously (CUMMINGS and CLEGG 1998; LIN et al. 2001, 2002; MORRELL et al. 2003, 2005). The sequences are fully resolved haplotypes with a minimum quality criterion of a phred score ≥20 for both the forward and the reverse strands. Singleton mutations were confirmed through a second PCR amplification and resequencing of both the forward and the reverse strands. The total data set includes 678 segregating sites, 420 of which are parsimony informative. Five sites (0.74%) have more than two nucleotide states; there are 4 sites with three nucleotide states and a single site with four states. Detailed methods for sequencing and sequence assembly are included in MORRELL et al. (2003). Diversity statistics and the levels of LD within and between loci are reported in MORRELL et al. (2005). Two abutting portions of the Pepc locus were sequenced separately (MORRELL et al. 2003, 2005), but in a combined length of 3173 bp contain only four parsimony-informative segregating sites and are treated here as a single locus, referred to as PepcC.

In addition to data from the wild barley loci, we have analyzed additional nucleotide sequence data sets to assess the relative role and extent of evidence for gene conversion.

Estimates of the role of recombination, particularly the relative role of gene conversion, depend on sampling a relatively large number of segregating sites. To infer the role of gene conversion using the pattern-matching methods of PADHUKASAHASRAM et al. (2004) we focus on published nucleotide sequence data sets ≥1000 bp aligned length, with ≥20 sampled chromosomes and ≥20 parsimony-informative segregating sites, at least two detected recombination events (see below), and minimal missing data. Data from seven of the wild barley loci we have sequenced meet these criteria. We also considered sequence data from all of the 98 D. melanogaster loci compiled into a single list by PRESGRAVES (2005). This resulted in inclusion of data from 10 loci from multiple populations of D. melanogaster (BEGUN and AQUADRO 1995; HARR et al. 2002; ZUROVCOVA and AYALA 2002; RILEY et al. 2003; BALAKIREV and AYALA 2004a,b; DUMONT et al. 2004), 1 locus from multiple populations of D. pseudoobscura (SCHAEFFER and MILLER 1992), 2 loci from multiple populations of D. simulans (DUMONT et al. 2004), 4 loci from cultivated maize (TENAILLON et al. 2001), 1 locus from both maize and its wild progenitor teosinte (Z. mays ssp. mays and ssp. parviglumis) (BOMBLIES and DOEBLEY 2005), and 3 loci from a separate sample of wild barley (CALDWELL et al. 2005). Descriptive statistics for all sampled loci are in Table 2 .


View this table:
In this window
In a new window

 
TABLE 2

Descriptive statistics and estimates of nucleotide sequence diversity for a common set of 25 samples at 18 loci in wild barley

 

Estimating the number of recombination events:

To estimate the number of recombination events in a data set, we employed five estimators that vary in the algorithm they use to detect recombination. The estimators Rm, Rh, Rs, Rl, and Ru were calculated using the programs RecMin (MYERS and GRIFFITHS 2003) (to estimate Rm, Rh, and Rs), HapBound (to estimate Rl), and shrub (to estimate Ru) (SONG et al. 2005) (see supplemental material at http://www.genetics.org/supplemental/ for links to all software used). The estimators use distinct methods for calculating a minimum number of recombination events for a data set and are related such that Rm ≤ Rh ≤ Rs ≤ Rl ≤ Ru (SONG et al. 2005). The Rm estimate is based on the four-gamete test. For any pair of nucleotide sites, only three configurations (represented in binary form as 00, 01, 10) are possible on the basis of unique mutations (HUDSON and KAPLAN 1985). Producing all four possible gametic combinations requires either recombination or a second mutation of one of the nucleotide sites. When the probability of recurrent mutation is low (i.e., the data are consistent with the infinite-sites model) algorithms can be used to process the results of the four-gamete tests and provide the minimum number of nonoverlapping intervals involved in recombination. Rh is calculated on the basis of the difference (hS – 1) between the number of observed haplotypes (h) in the sample and the number of segregating sites (S). Rs uses a simplified approximation of the sample history such that any true history of the data would include a larger number of recombination events. Rl and Ru are lower and upper bounds on the minimum number of recombination events required to reconstruct an evolutionary history compatible with the sequence. Ru is computed relative to an ancestral recombination graph (ARG) compatible with the data (SONG et al. 2005). The input for each of the estimators is the segregating sites from the nucleotide sequence data set encoded as binary characters. In this study, the minor allele state was represented as 1 and the majority allele as 0. RecMin input can include sites with missing data; thus we have treated as missing the third state at sites with more than two nucleotide states and segregating sites within indels. These sites must be excluded in HapBound and shrub input. Haplotype configurations for 18 wild barley loci for all parsimony-informative sites are presented in MORRELL et al. (2005).

Estimating the population recombination rate:

The methods discussed above focus on the number of recombination events observable within a sequenced region. Parameterizing recombination in terms of {rho} = 4Ner permits an evaluation of the per base pair input of recombination, in terms of the rearranging of mutations, throughout the coalescent history of the sampled population. Parametric estimates of {rho} also provide a useful comparison to estimates of {theta} = 4Neµ in that they describe the relative importance of recombination and mutation in the history of the organism.

Estimates of {rho} for each locus in our wild barley data set were calculated using seven different estimators. This permits a comparison of estimators using a common set of samples across a set of loci with very different numbers of informative mutations and recombination events (Table 2). Thus we briefly examine the utility of estimators across loci and the variance among estimators for each sampled locus.

We used the programs maxhap and LDHat for the composite-likelihood-based estimates FormulaH01 (HUDSON 2001) and FormulaMAF02 (MCVEAN et al. 2002), mss_conv for the summary-statistic-likelihood estimate FormulaW00 (WALL 2000), rhothetapost for a summary-statistic-based Bayesian estimator with rejection-sampling algorithm for FormulaT05 (HADDRILL et al. 2005), rholike and sequenceLD for the approximate- or "marginal"-likelihood estimates FormulaLS03 (LI and STEPHENS 2003), and FormulaFD02 (FEARNHEAD and DONNELLY 2002) and Lamarc for the full-likelihood estimate FormulaLamarc (KUHNER et al. 2000, 2002). Because low-frequency mutations necessarily occur in only a minimal number of haplotype configurations, they are less informative as to the extent of recombination. In this study, for methods that apply a frequency filter, only mutations that occurred at least twice in the sample (i.e., those that are "parsimony informative") are considered. A number of methods permit the use of either an infinite-sites model or a specific nucleotide substitution model. All analyses reported here have assumed an infinite-sites model unless otherwise specified.

The composite-likelihood estimator FormulaH01 of HUDSON (2001) considers the frequency of each of the two-site haplotypes (00, 01, 10, 11) for each pair of sites. The method uses a simulation of the neutral coalescent to identify values of {rho} compatible with the observed frequencies for pairs of sites. The composite likelihood is the product of the likelihoods for each {rho}-value across pairs of sites. We have used lookup tables where likelihood values have been precalculated (HUDSON 2001) (see supplemental material at http://www.genetics.org/supplemental/). The maxhap software, used for composite-likelihood estimation, can estimate Formula with or without a simultaneous estimate of Formula, the relative contribution of gene conversion.

The composite-likelihood method FormulaMAF02 of MCVEAN et al. (2002) differs from the HUDSON (2001) method in that likelihood tables are generated using the sample size and values of {theta} that match estimates for the locus being evaluated, rather than a grid of {rho}-values for a given sample size. We have generated likelihood tables on the basis of WATTERSON's (1975) {theta}-estimate (FormulaW) for each locus as this approach may improve the accuracy of the composite-likelihood method (MCVEAN et al. 2002).

The summary statistic method FormulaW00 of WALL (2000) uses a simulation of the neutral coalescent process to find a value of {rho} that maximizes the proportion of simulations that match the observed number of haplotypes (h) and the number of recombination events (Rm) in a chromosomal segment from a sample of individuals. Inputs into the simulation include the number of segregating sites (S), the length of the region (l), and the number of chromosomes sampled (n). For a diploid, outcrossing organism, n is two times the number of individuals sampled. For wild barley, which is >98% self-fertilizing, the sample more closely approximates a haploid sample, and thus we treat n as the actual number of unique sequences observed at each locus. This number can slightly exceed the 25 individuals sampled due to occasional heterozygous individuals in the sample (MORRELL et al. 2005).

The summary statistic method FormulaT05 of Thornton (HADDRILL et al. 2005; THORNTON and ANDOLFATTO 2006) combines the summary of the data used by WALL (2000) and the Rh relationship described above (MYERS and GRIFFITHS 2003) with a rejection-sampling algorithm to produce a series of independent, joint estimates of Formula and Formula. The method provides a simple means to estimate confidence intervals for Formula, Formula, and Formula/Formula (HADDRILL et al. 2005). We plotted the estimated posterior distribution of Formula and Formula from an initial round of analysis for each locus to assure that posterior estimates were not bounded by the priors. When the distribution of posterior values appeared to be constrained by the priors, priors were adjusted to avoid problems with the boundary and the analysis was rerun. Priors for the second round were the 0.01 and 0.99 percentile values of the estimated posterior distribution from the initial round. Point estimates used to summarize the posterior distributions of FormulaT05 and FormulaT05 are the maximum a posteriori estimates and confidence intervals are defined by the 0.025 and 0.975 percentiles.

The approximate-likelihood method FormulaFD02 of FEARNHEAD and DONNELLY (2002) uses a list of observed haplotype configurations [defined by parsimony-informative (nonsingleton) sites (Sp)] with l and n for each locus to produce a joint estimate of FormulaFD02 and FormulaFD02. As with the WALL (2000) method, the value of n we have used is the actual number of unique sequences observed at each locus. For each round of analysis we used 200,000 runs with four driving values (values at which the search is initiated) for both {rho} and {theta}. Driving values and limits on {rho} and {theta} were adjusted after an initial round of analysis, and the estimator was run a second time. The likelihood surface for each value of {rho} and {theta} was calculated on the basis of 251 values of {rho} and 3 values of {theta}.

The conditional probabilities method FormulaLS03 of LI and STEPHENS (2003) is based on a model of linkage disequilibrium where the probability of observing a particular set of haplotypes is evaluated across values of {rho}. The conditional probabilities represent the probability of observing each haplotype, given all previously observed haplotypes and given a {rho}-value. The method of estimation is referred to as "product of approximate conditionals" (PAC) likelihood. Because the order of the observed haplotypes is important, LPAC is averaged over several random orders of the haplotypes (we used the default of 10 random orders). The method does not assume an infinite-sites mutation model. The approximate conditional probabilities consider haplotypes as a unit, differing from the composite-likelihood method in which sites are considered on a pairwise basis.

Kuhner's full-likelihood method FormulaLamarc implemented in Lamarc (KUHNER et al. 2000, 2002) estimates coalescent histories with recombination compatible with input data and then estimates parameter values compatible with the genealogy. We used as input full-length sequence alignments, treating all samples for each locus as a single population, and estimated FormulaLamarc and FormulaLamarc, the per generation rate of recombination, for each locus. Program setup and search strategy are similar to that reported in MORRELL et al. (2003), including the use of the Felsenstein 1984 nucleotide substitution model (KISHINO and HASEGAWA 1989; SWOFFORD et al. 1996) (rather than an infinite-sites model) and empirical base frequencies and transition/transversion ratios. Results of an initial analysis using FormulaW and FormulaLamarc = 0.5 were used as starting values of a second round of analysis with 20 initial chains of 1000 and four final chains of 20,000 genealogies with 2000 genealogies discarded per chain. Adaptive heating was used to improve the search of parameter space. Finally, start parameters from the second-round analysis were plugged into a third round of analysis. Results of the third-round analysis are reported.

Estimating the role of gene conversion:

Estimating gene conversion from nucleotide sequence data is difficult in part because estimation involves four unknowns, {theta}, {rho}, f (the proportion of gene conversion events relative to crossover events), and L (the conversion tract length) (PTAK et al. 2004). Two primary methods of estimating the parameter f have been reported: one method jointly estimates {rho} and f using an extension of the composite-likelihood approach (FRISSE et al. 2001; HUDSON 2001; WALL 2004); a second method matches patterns of nucleotide sites that show evidence of recombination with values of {rho} and f using coalescent simulations (PADHUKASAHASRAM et al. 2004), referred to here as Formula. Previous studies have emphasized that because the distance among sampled loci almost always exceeds likely conversion tract length, the relative roles of gene conversion and crossover can be inferred subtractively from multilocus data (ANDOLFATTO and NORDBORG 1998; PTAK et al. 2004; WALL 2004; PLAGNOL et al. 2006). However, genotyping errors tend to upwardly bias Formula, causing an overestimate of the role of gene conversion (PTAK et al. 2004; WALL 2004), and the issue of typing errors is not remedied by multilocus estimation of f. Thus our focus is on inferring the role of gene conversion within individual loci and, when possible, utilizing data that has been rigorously purged of all detectable genotyping errors.

The composite-likelihood estimator program maxhap uses a lookup table that permits rapid estimation of {rho} and f. However, composite-likelihood estimators can have a high root mean square error (WALL 2004; SMITH and FEARNHEAD 2005). We estimate FormulaH01 and Formula for all sampled loci using maxhap. We also explore the utility of maxhap estimates using coalescent simulations with parameter estimates based on the wild barley empirical data. Specifically, we asked, what is the minimum contribution of gene conversion (or the minimum value of f > 0) that can be detected with the two-site composite-likelihood method with 95% confidence? We then asked, when simulations are generated without any gene conversion, what is the probability of estimating Formula > 0? The simulations were performed across a dense grid of values, with 10,000 replications per grid point with simulation output sent directly to the composite-likelihood estimator software maxhap through the mstoexhap (THORNTON 2003) and exhap utilities. Sample size, the length of regions simulated, and parameter values used in the simulation were chosen to reflect mean values from the wild barley empirical data; thus, simulations were based on l = 1500 bp of sequence from n = 25 individuals, with {theta} = 8 x 10–3/bp, and {rho} = 8 x 10–3/bp for simulations with no gene conversion and then with {rho} decreased in proportion to increasing values of f, with f from 0.01 to 7 with nine values between 0 and 2 and thereafter increasing by increments of 0.5. For the simulations without gene conversion we used a grid of {rho}-values that spanned the range of empirical values estimated from the wild barley loci, i.e., {rho} from 0 to 0.032 (including 0.0001, 0.0002, and then increasing from 0.001 by a factor of 2), using tract lengths L = 250 and 500 bp.

PADHUKASAHASRAM et al. (2004) defined descriptive statistics designed to estimate the role of gene conversion. The first summary statistic is the frequency of "pattern a," where a set of three parsimony-informative segregating sites designated sites A, B, and C includes external sites (A and C) compatible with the four-gamete test; i.e., three or fewer configurations are present, but where each of the external sites is incompatible with the internal site (A and B, B and C) based on the four-gamete test; that is, all four states are present (Figure 1). Statistics were also defined for evaluating four-site configurations, where we can designate segregating sites A, B, C, and D (Figure 1). For four-site configurations, "pattern b" and "pattern d" were defined. In pattern b, both the outer pair of sites (A and D) and the inner pair of sites (B and C) are incompatible (all four pairwise states are present). In pattern d the outer pair of sites (A and D) and the inner pair of sites (B and C) are compatible pairs, but there is incompatibility between the two outer sites and their corresponding adjacent inner site (A and B, C and D). Both patterns a and d imply that either a gene conversion event or a double recombination has effectively replaced a tract of the chromosome that included the internal site(s).


Figure 1
View larger version (7K):
In this window
In a new window
Download PPT slide
 
FIGURE 1.—

Patterns a, b, and d depend on the absence of all four gametic configurations between sites indicated by brackets, but the presence of four gametes between sites indicated by curved arrows. In patterns a and d, the sites indicated in red have been subject to either double recombination or gene conversion.

 
The proportions of patterns a, b, and d for the empirical data were considered by comparing them to those observed in coalescent simulations. Simulated data reflecting n, S, and l from the empirical data for each locus were generated using the program ms (HUDSON 2002). With S used as a proxy for {theta} and tract lengths (L) held constant, simulations can explore a grid of {rho}- and f-values. Initial values of {rho} and f within the simulations were based on estimates from the HUDSON (2001) two-site likelihood method described above; values of L = 250 and 500 were used. These values bracket the estimate of L = 352 from D. melanogaster (HILLIKER et al. 1994).

Coalescent simulations with proportions of patterns a, b, and d within 20% of that observed in the empirical data were accepted; the proportion of accepted simulations for each set of simulation parameters was then determined for pattern a and for simultaneous acceptance based on both patterns b and d. The product of these two proportions is referred to as the likelihood of the given simulation parameters.

All analyses were performed using single nodes of the Linux cluster at the Bioinformatics Core facility at the University of California, Riverside.

Genotyping errors and gene conversion:

Because triplets and quadruplets in patterns a and d are based on the incompatibility of the internal site or sites with flanking sequence, genotyping errors can generate the same pattern as a conversion event. Base call errors, particularly those arising from the failure to detect heterozygous sites within an individual, can potentially be identified by examining the frequency of each of the haplotypic classes (00, 01, 10, 11) for the site AB and BC comparisons in triplets of sites (see RESULTS).


RESULTS

Genotyping errors:

Examination of the triplets of nucleotide sites inferred from our wild barley nucleotide sequence data demonstrated that for some loci, a relatively small number of segregating sites and a relatively small number of individuals from each sequencing panel contributed the majority of pattern a triplets. For each of the outer to inner site comparisons (i.e., sites AB and BC) in a triplet, the rarest of haplotypic classes is the most direct single source of typing error. Samples that are heterozygous at a locus but are incorrectly represented as a single haplotype can result in triplets and quadruplets of sites that mimic the effects of gene conversion or double crossover and thus represent a problematic source of typing error. In a manner analogous to error detection in genetic mapping algorithms (LINCOLN and LANDER 1992) examination of site frequencies between pairs of sites can identify individual samples and nucleotide sites that lead to the inference of double crossover events. Correcting typing errors can dramatically improve recombination rate estimates (LINCOLN and LANDER 1992).

For the 18 wild barley loci in MORRELL et al. (2005), original sequence traces were available for reexamination. Base calls at each site in each sequence that contributed the rarest gametic class (e.g., 01) for the outer sites in each triplet were reexamined. All sites in the panel had been sequenced with a minimum phred quality of ≥20 for forward and reverse sequence reads. For the vast majority of sites, the base calls from the original data set submitted to GenBank were confirmed and thus the triplet was accepted as valid. For example, all triplets at the Dhn4 locus involve a segregating site at bp 114. The critical two-site haplotype occurs in sample 06 (GenBank no. AY895883). All quadruplets for Dhn4 include bp 992 as the last segregating site in the quadruplet, on the basis of a gametic type again found only in sample 06. Thus in a manner similar to the handling of singleton confirmation in population genetic studies, this sample was reamplified and resequenced using all available primers on both the forward and reverse strands; the original nucleotide states at both of the sites were confirmed, and base calls for sites segregating within the population did not indicate the presence of more than one allele (i.e., there is no evidence that the individual was a heterozygote at this locus).

The targeted examination of base calls (in the original trace files) that contributed the least frequent gametic class for pattern a triplets at other wild barley loci revealed heterozygous individuals that were not previously detected by screening with PolyPhred or by visual inspection. Heterozygous individuals were identified at five loci including samples 04 and 28 at Cbf3, 28 at Dhn1, 12 at Dhn5, 36 at Dhn7, and 12 at Waxy (see Table 3). The phase of mutations was resolved experimentally, using a combination of cloning and allele-specific PCR. Examination of the sequence traces from individuals that were newly detected as heterozygotes at a locus revealed that many of the base calls at segregating sites that differentiate the two parental chromosomes did not show equal amplification of the PCR products from each chromosome. Several sequencing primers produced sequence reads from the PCR product of only one of the two parental chromosomes. Unequal amplification of initial PCR products was also evident; clones of Waxy sample 12 were biased 15:1 for one of the parental haplotypes. The two haplotypes at Waxy sample 12 were ultimately confirmed on the basis of the direct sequencing of the products of allele-specific PCR.


View this table:
In this window
In a new window

 
TABLE 3

The number of heterozygotes detected at wild barley loci and the impact of newly detected heterozygotes on descriptive statistics and parameter estimates

 
In general, the resolution of heterozygotes reduced the evidence for recombination in the data sets (Table 3). For example, for Dhn7 this resulted in a change from Rm = 9 in the original data set to an Rm = 6 after error checking and experimental resolution of haplotypes. Estimates of {theta} for the locus were reduced slightly, with a reduction of 16.5% for FormulaW and 6.7% for Formula{pi}. Estimates of Formula showed a more dramatic decrease with FormulaH01 reduced by 27.5% and FormulaW00 reduced by 39.19%. Experimental resolution of typing errors also tends to reduce the proportions of patterns a, b, and d in the data set (see Table 3). In the extreme case, the original Cbf3 data set had 1.4% of possible triplets in pattern a, but with errors in phasing corrected for three heterozygotes, no pattern a triplets were present.

Recombination events:

All but four wild barley loci (Adh1, {alpha}-Amy1, Faldh, and PepcC) show evidence of recombination on the basis of the four-gamete test (HUDSON and KAPLAN 1985); i.e., Rm > 0. In loci where recombination was detected, Rm varies from 1 to 7, with the largest number of recombination events evident in Dhn1, Dhn7, and Waxy (Table 2). For the four loci where Rm = 0, the Rh and Rs estimates also did not show any evidence of recombination. Rl and Ru also report no evidence of recombination in loci with Rm = 0 with one exception, the Faldh locus, where Rl and Ru = 1. For loci with Rm > 0, both Rh and Rs ranged from 1 to 12 (Table 2). Values of Rl for wild barley ranged from 0 to 13; Ru had a maximum of 17. In Figure 2, an ARG generated by the Ru estimator depicts the three recombination events inferred at a typical locus (Stk) from the wild barley data set. The Drosophila and Z. mays loci were chosen for inclusion in the study because they were likely to have a sufficient number of recombination events to infer the role of gene conversion. For these loci, Ru-values are as large as 35 in Idgf1 from D. melanogaster and 49 in Zfl2 from Z. mays. For some loci, e.g., vermilion from the D. melanogaster locus, the Rh estimate is larger than the Rs estimate because the RecMin software can make use of more of the polymorphism data by considering sites that are segregating in alignment gaps.


Figure 2
View larger version (19K):
In this window
In a new window
Download PPT slide
 
FIGURE 2.—

An ancestral recombination graph (ARG) for the wild barley locus Stk is shown. Coalescent events are shown as open circles, sampled haplotypes are shown as solid circles, and recombination events are shown as larger red circles. The positions of segregating sites that are on the boundaries of recombination events are shown next to each recombination. Solid colors for haplotypes represent the major portions of the geographic range, or wild barley, previously identified as the Western (blue), Zagros (green), and Eastern (yellow) regions (MORRELL et al. 2003).

 

Estimates of {rho}:

Estimated rates of recombination per base pair for each of the wild barley loci are shown in Table 4 and Figure 3. A nonparametric Friedman rank sum test considering the estimation method as the treatment is significant (P = 8 x 10–4), rejecting the null hypothesis that there is no systematic difference in the estimators. The mean value of Formula varies almost threefold among the estimators, ranging from 4.33 to 12.48 x 10–3. While the mean estimates from FormulaH01, FormulaMAF02, FormulaW00, FormulaLS03, Formula, and FormulaT05 are relatively similar, the much larger average {rho} estimate from FormulaFD04 results primarily from Formula ≥ 24 x 10–3 for three loci, Dhn1, Dhn7, and Waxy. Values of FormulaLamarc, FormulaFD04, and FormulaT05 are coestimated along with {theta} (Figure 3). The values of FormulaT05 are similar to estimates that were not coestimated, i.e., FormulaH01, FormulaLS03, and FormulaW00. However, FormulaLamarc and FormulaFD04 produce very different estimates of {rho}, with much of the difference attributable to Formula and Formula for Dhn1, Dhn7, and Waxy mentioned above (Table 4). While the three loci have FormulaFD04 > 24 x 10–3, FormulaLamarc estimates are all ≤16 x 10–3, with the largest difference among estimates at Dhn1, where FormulaLamarc = 5.14 x 10–3, but FormulaFD04 = 46.55 x 10–3. The estimate of FormulaLamarc for the locus is 38.68 x 10–3 while FormulaFD04 = 17.88 x 10–3. This difference is consistent with average values of Formula and Formula for the two methods (Figure 3). The Lamarc estimator appears to attribute much more of the total diversity to mutation; the average value from FormulaLamarc is only 35% of FormulaFD04 and FormulaLamarc is 26% larger than FormulaFD04.


View this table:
In this window
In a new window

 
TABLE 4

Estimates of Formula and three coestimated values of Formula (x10–3) for a common set of 25 samples at 18 loci in wild barley

 

Figure 3
Figure 3
View larger version (31K):
In this window
In a new window
Download PPT slide
 
FIGURE 3.—

Comparison of Formula and Formula for 17 wild barley loci. (a) The estimators coestimate Formula and Formula and the values for each locus are paired. (b) Separate estimates of Formula and Formula. Point estimates are shown as circles and when 95% confidence intervals could be estimated, they are indicated by gray lines. Loci are presented in the same order as in Figure 4. From top to bottom, the loci are Vrn1, Waxy, Adh2, Dhn1, Cbf3, Dhn7, Dhn4, Dhn5, Dhn9, Adh1, Stk, ORF1, Faldh, G3pdh, Adh3, PepcC, and {alpha}-amy1.

 

Figure 4
View larger version (17K):
In this window
In a new window
Download PPT slide
 
FIGURE 4.—

Estimates of FormulaT05/ FormulaT05 and 95% confidence intervals for all wild barley loci. For each locus, the point of FormulaT05/ FormulaT05 is shown as a circle, and bounds of the upper and lower 95% confidence intervals are indicated by gray vertical lines.

 
Despite the differences among estimators, ranks of {rho} estimates, analyzed for all seven estimators while considering the locus as the treatment in the Friedman rank sum test, distinguish between the levels of recombination for the 17 loci (P = 2.4 x 10–11). The null hypothesis that all loci have the same Formula-value is rejected. Within any given estimation method, the estimates of recombination per base pair vary dramatically among loci; for example, the values for FormulaH01 varied from 0 to 36.08 x 10–3/bp (Table 4).

Estimates of {rho}/{theta}:

Four estimates of {rho}/{theta} and the corresponding estimate from Lamarc, FormulaLamarc, for each of the wild barley loci are shown in Table 5. The mean estimate of Formula/Formula for wild barley varies among estimators from 0.90 to 1.93. Values for FormulaLamarc for each locus were generally smaller than Formula/Formula and are dramatically lower for loci with Formula/Formula > 1. For example the Waxy locus estimates are FormulaH01/Formula{pi} = 4.59, but FormulaLamarc = 1.4 even when Lamarc estimates for the locus are reinitiated with higher values of FormulaLamarc and lower values for {theta}. Estimates of the ratio {rho}/{theta} for wild barley loci follow a relatively narrow range of 0–4 regardless of the estimators of {rho} and {theta} considered (Table 5). The only exceptions are at the PepcC locus, where there are only four informative sites: at PepcC, FormulaH01/FormulaW = 6.6 and FormulaH01/Formula{pi} = 9.8. The FormulaT05/FormulaT05 estimator provides a direct means of estimating confidence intervals. Estimates of FormulaT05/FormulaT05 and 95% confidence intervals for the wild barley loci are shown in Figure 4. Estimates of FormulaH01/FormulaW and FormulaH01/Formula{pi} for sampled Zea and Drosophila loci are in Table S1 (http://www.genetics.org/supplemental/). Estimates of {rho}/{theta} from Zea data are slightly higher than those for wild barley with a mean FormulaH01/ FormulaW = 3.5. The D. melanogaster data sets sampled here are generally from multiple populations worldwide, including populations from parts of the species range that were recently colonized. Mean FormulaH01/FormulaW = 2.5, which is much lower than Formula/Formula from the apparent core of the range of D. melanogaster in East Africa, where FormulaT05/FormulaT05 was estimated as 7.6 (HADDRILL et al. 2005).


View this table:
In this window
In a new window

 
TABLE 5

Estimates of {rho}/{theta} for wild barley loci including the 95% confidence intervals for FormulaT05/FormulaT05 and FormulaLamarc

 

Estimates of f:

On the basis of maxhap estimates, 8 of our 17 wild barley loci show no evidence of gene conversion and return Formula = 0 (Table 6). Among the 10 of our wild barley loci that met our criteria for external data sets (those that include >20 informative sites), 6 have maxhap estimates of Formula > 0, with estimates ranging from 0 to 59.5 (mean = 18.04 and median = 0.95). The largest two estimates of Formula are 59.5 for Dhn5 and 34 for Adh3. Inference of high levels of gene conversion at these loci is perhaps not surprising, as the Dhn5 locus includes a series of repeated sequence motifs (CHOI et al. 1999) that may be especially prone to illegitimate recombination, while Adh3 contains a 12-bp segment, delineated by three segregating sites, that shows evidence of either gene conversion or double recombination between deeply divergent haplotypes (see Figure 4 in LIN et al. 2001).


View this table:
In this window
In a new window

 
TABLE 6

Estimates of f based on composite-likelihood and pattern matching, the percentage of decrease in the coestimated FormulaH01, and Formula from pattern matching (x10–3)

 
Maxhap estimates Formula from Z. mays loci range from 0 to 23.2, with Formula= 0 at two of the five loci. The mean of Formula for Z. mays is 6.3 and the median estimate is 1.3. Nine of the 10 D. melanogaster loci show evidence of gene conversion on the basis of maxhap estimates, with Formula = 0.0–50.6 (mean = 8.4 and median = 1.3). For both portions of the Notch locus from D. simulans, Formula = 29.4. The largest available lookup table for maxhap has n = 100 chromosomes. For the D. pseudoobscura Adh locus 10 samples of 100 of the 139 chromosomes were drawn at random and analyzed with maxhap. In 8 of the 10 samples Formula = 0. The remaining two samples return Formula = 6.8 and 25.

Maxhap provides an option to return the composite likelihood for each value of Formula considered. Plotting the output from individual loci demonstrates that for loci with very large values of Formula (e.g., Dhn5 with Formula = 59.5) the likelihood value at the maximum-likelihood estimate of Formula is only very slightly higher than that for much smaller values of Formula; i.e., the likelihood surface is almost completely flat and it is difficult to distinguish between the likelihood of small values of Formula and the very large values returned by maxhap.

For our estimates of Formula (pattern matching), the proportion of triplets in pattern a was calculated for our 10 wild barley loci that have >20 parsimony-informative sites and Rm ≥ 2; pattern a is not possible in the absence of at least two observed recombination events. Among the 7 loci, 2 have no triplets in pattern a (Table 6). When pattern a triplets are observed, they are always a very small percentage of all possible triplets; e.g., there are 65 pattern a triplets at Dhn4 or 1.4% of all 4495 triplets. Three loci, Dhn1, Dhn4, and Dhn7 have Formula > 0 on the basis of pattern matching, with Formula = 2, 1, and 2, respectively. The maxhap estimates for Dhn1 and Dhn7 were Formula = 1.2 and 1.3, but 0 for Dhn4. Thus pattern matching for wild barley results in a mean Formula = 1 (median Formula = 1).

Figure 5 illustrates the results of pattern-matching simulations on a single locus. Simulation input values of {rho}c and f are plotted relative to a likelihood surface that shows the proportion of coalescent simulations that matched within 20% of the proportion of patterns a and then b and d in the wild barley Dhn7 locus. The locus has 50 parsimony-informative sites and Rm = 6 (Table 2) after the elimination of every detectable genotyping error (Table 3). The best fit to the empirical data is at Formula = 2.1 and Formulac = 3 x 10–3/bp (for f > 0, {rho}-values are for crossover only). For simulations with f = 0, the best fit occurs for {rho}c = 12 x 10–3/bp (very similar to the {rho}H01 and {rho}LS03 estimates of 12 and 14 x 10–3). However, f = 0 simulations provide a much poorer fit to the data than simulations with f ≥ 0.4 (Figure 5). The plot is based on 1000 simulations for each pair of values for {rho} and f, where parameter pairs make up a grid of all integer values of {rho} (per locus) between 1 and 20 inclusive (plotted values are per base pair x 10–3) and f-values incremented by 0.1 between 0 and 4 inclusive.


Figure 5
View larger version (43K):
In this window
In a new window
Download PPT slide
 
FIGURE 5.—

The likelihood surface for the wild barley Dhn7 locus based on the proportions of patterns a, b, and d in coalescent simulations across a dense grid of values of {rho} and f. Spectral colors from red toward violet represent increased likelihood of a match between simulation input parameters and the empirical data. The strongest single peak is for FormulaPM = 2.1, and Formulac = 3 x 10–3.

 
Among the 10 D. melanogaster loci, Formula ranged from 0 to 6, with a median value of 1. In D. simulans, the two portions of the Notch locus return estimates of Formula = 1 and 2 (Table 6). Pattern matching for maize loci returns Formula = 0 for three loci and Formula = 3 for the Adh locus. Pattern-matching simulations provided poor fit to the data from the D. melanogaster tinman and the Z. mays Zfl2 loci and D. pseudoobscura Adh, and thus no pattern-matching estimates are reported (Table 6).

The accuracy of maxhap estimates of fH01:

Simulated data generated with f = 0 and {rho} ranging from 0 to 0.032/bp show that almost half of all maxhap estimates FormulaH01 and Formula return a point estimate of Formula > 0. The variance of Formula is consistently higher than that of FormulaH01. For {rho} = 8 x 10–3 (near the mean estimate for wild barley) there is an ~50% probability of estimating Formula > 0 in simulations with f = 0. In 10,000 simulations each for L = 250 and 500 and with f = 0, median Formula = 5.7 and 7.4. This indicates that in coalescent simulations, single-locus-based estimates of Formula have a large bias and high variance (WALL 2004). Confidence intervals for composite-likelihood estimates of Formula can be estimated using a parametric bootstrap simulation procedure (HUDSON 2001; MCVEAN et al. 2002). However, estimating confidence intervals for Formula when FormulaH01 and Formula are coestimated is problematic. For simulations based on the loci sampled here, the lower bound of the 95% confidence interval appears to always include 0, and upper bounds can be greater than an order of magnitude larger than the point estimate. Thus, an estimate of Formula > 0 does not necessarily reflect the presence of gene conversion.

Simulations with f > 0 yielded Formula-values that increase dramatically with increasing f in the simulation. For {rho} = 8 x 10–3, L = 250, and locus length l = 1500 bp, a simulation input value of f = 5 results in a 97% probability of Formula > 0. An increase in the length of the region simulated appears to dramatically improve the potential for rejecting high values of f. For l = 3000 and 4500 bp, the presence of f > 1 can be rejected with >95% probability (Figure 6). For parameter values that reflect the wild barley data (i.e., as above, {rho} = 8 x 10–3 and l = 1500 bp) but with a tract length L = 500 a simulation input of f = 5 results in a 92% probability of estimating Formula> 0. For 10 of 17 wild barley loci where the maxhap Formula = 0, the haplotype variation observed is not likely to have been generated by high levels of gene conversion (e.g., f > 5).


Figure 6
View larger version (9K):
In this window
In a new window
Download PPT slide
 
FIGURE 6.—

Coalescent simulations based on mean {rho}- and {theta}-values from the wild barley data, depicting the probability of a composite-likelihood estimate of f = 0 given the input value of f shown along the x-axis. Tract length L = 250 bp is shown with locus lengths l = 1500, 3000, and 4500 bp.

 


DISCUSSION
We demonstrate that despite a high level of self-fertilization, recombination makes as large a contribution to sequence diversity in wild barley as does mutation ({rho}/{theta} = r≥ 1). The primary impact of inbreeding is expected to be a dramatic reduction in the effectiveness of recombination. In a coalescent framework, this is realized as a reduction in the effect rate of recombination relative to mutation. For wild barley, Formula/Formula {approx} 1.5, similar to values estimated for outcrossing species (e.g., BALAKIREV et al. 2003; BALAKIREV and AYALA 2004b) and is ~30-fold greater than {rho}/{theta} = 0.05 recently estimated for the self-fertilizing species Arabidopsis thaliana (NORDBORG et al. 2005). Published estimates for species-wide samples from the outcrossing species D. melanogaster and maize have means of 1.0 and 1.5 (BALAKIREV and AYALA 2004b; WRIGHT et al. 2005). However, various published data sets from D. melanogaster, including those considered here, have very different sampling schemes and thus include populations with disparate demographic histories. Recent demographic history in particular can influence estimated rates of recombination (THORNTON and ANDOLFATTO 2006). In East African populations of D. melanogaster (putatively the core of the species range) (LACHAISE et al. 1988) and in wild Mexican samples of the maize progenitor, teosinte, mean Formula/Formula has been estimated as 7.6 and 4.5 (HADDRILL et al. 2005; WRIGHT et al. 2005). In A. thaliana, the impact of inbreeding is also confounded by a recent demographic expansion (NORDBORG et al. 2005)

Why has the high rate of self-fertilization in wild barley not had a more dramatic impact on the relative role of recombination? First, it is important to note that the relative role of recombination and mutation in the wild barley lineage prior to the evolution of self-fertilization is unknown. The species most closely related to H. vulgare ssp. spontaneum is H. bulbosum, which is self incompatible and obligately outcrossing. By comparison to teosinte, and accounting for potential impact of the ancestral mating system, {rho}/{theta} of 5–10 prior to the transition to self-fertilization is plausible. With an average of 98.4% self-fertilization, expected {rho}s/{theta}s {approx} 0.14–0.28, so observed Formula/Formula {approx} 5–10 times that expected. A larger ancestral value of {rho}/{theta} leads to a smaller difference in observed and expected values.

How can we account for the relatively large role of recombination in generating haplotypic diversity in wild barley? One possibility is that the rate of self-fertilization in wild barley has been overestimated. However, this does not seem likely. BROWN et al. (1978) reported an average selfing rate of 98.4% (with a 95% confidence interval of 97.3–99.2%). The estimate was based on multiple progeny from each maternal plant and an assay of 22 polymorphic allozyme loci in 26 populations in Israel. The lowest self-fertilization rate estimated in a single population was 90.4%. More xeric sites had a higher self-fertilization rate than mesic sites, with average rates of 99.6 and 97.9%, respectively. A recent study of 12 populations in Jordan that employed microsatellite-based estimates of outcrossing rate reported an average selfing rate of 99.7% (ABDEL-GHANI et al. 2004). Reports of observed heterozygosity based on numerous studies of allelic diversity in wild barley are consistent in suggesting very high rates of self-fertilization (cf. NEVO et al. 1979; VOLIS et al. 2001). Rates of self-fertilization could be as high or higher in other parts of the species range (e.g., Central Asia); populations in Israel and Jordan occur in a region with much higher rainfall than occurs across most of the range of wild barley (VOLIS et al. 2001, 2002).

A second possible explanation for the relatively large apparent role of recombination in this highly selfing species is that the transition to self-fertilization may have occurred relatively recently (LIN et al. 2002). If self-fertilization evolved recently, perhaps within the last 100,000 years (see discussion in CHARLESWORTH and VEKEMANS 2005), then many recombination events that occurred before the transition may still be evident in the data (MORRELL et al. 2005).

Another possibility is that increased chiasma frequencies may elevate recombination rates within self-fertilizing lineages. Comparisons of inbreeding species and outcrossing relatives have frequently reported higher chiasma frequencies in inbreeders (GRANT 1958; reviewed in CHARLESWORTH et al. 1977). The potential compensatory effects of increase in chiasma frequency are limited, however, because the high rate of homozygosity in self-fertilizing species means that most effective recombination follows an outcrossing event (NORDBORG 1999). In wild barley the level of heterozygosity is extremely low, <0.5% for highly polymorphic microsatellite loci (BAEK et al. 2003) and 3.3% in the present data set after employing our heterozygote detection approach. Also, a phenomenon known as chiasma (or crossover) interference (cf. MALKOVA et al. 2004) limits the number of additional chiasmata that can occur along an individual chromosome [although some fraction of recombination events appear not to be constrained by interference (COPENHAVER et al. 2002)]. Increased chiasma frequency alone is unlikely to compensate for the 5- to 10-fold excess in recombination relative to expectations.

Estimated values of {rho}:

Parametric estimates of Formula per base pair for wild barley have a mean of 7–8 x 10–3, ~40 times greater than estimates for A. thaliana (Formula = 2 x 10–4) (NORDBORG et al. 2005) and 0.4–0.5 times that of maize (Formula = 16–19 x 10–3) (TENAILLON et al. 2002) and D. melanogaster (Formula = 12–14 x 10–3) (HADDRILL et al. 2005).

For several loci, the majority of estimators report Formula = 0 and it is evident that estimation of recombination in these loci is limited by the number of parsimony-informative sites. Several loci have Rm = 0 (i.e., Adh1, {alpha}-Amy1, Faldh, and PepcC) (Table 4), and for these loci, the 95% confidence intervals of many estimators include Formula = 0. For the same loci the FormulaW00 point estimate is always FormulaW00 = 0. For most of the same loci, FormulaH01 is the largest point estimate of Formula. The FormulaT05 estimate uses a similar summary of the data to that considered in FormulaW00 but estimates Formula > 0 (although sometimes very small values) for every locus in the data set, including those with Rm = 0.

A number of studies have reported on the accuracy of Formula estimators based on coalescent simulations with a known input value of {rho} (KUHNER et al. 2000; WALL 2000; FEARNHEAD and DONNELLY 2002; SMITH and FEARNHEAD 2005). Although we cannot estimate the accuracy of the seven estimators we have used on the wild barley empirical data, we can consider the utility of estimators and consistency of Formula across estimators. As larger numbers of loci are considered, both the difficulty of input file preparation and computational efficiency can be serious limitations.

Point estimates of {rho} from the seven estimators are highly correlated with each other. The most highly correlated measures are from the two composite-likelihood estimators (FormulaH01 and FormulaMAF02, Pearson's r2 = 0.95) and the two estimators that are based on summary statistics (FormulaW00 and FormulaT05, r2 = 0.96). The estimates that are least correlated with those of other estimators are those from FormulaLamarc (r2 < 0.62 for all pairs involving FormulaLamarc). The results of the Friedman test indicate that both the locus and the estimation methods influence the rank of the estimates when the other factor is used as a blocking variable. This indicates that although the estimators differ, the locus rankings of Formula are correlated among the seven methods. Therefore, the estimators concur sufficiently to allow the detection of different recombination rates for the 17 wild barley loci.

Among the seven estimators used for our 17 wild barley loci, FormulaT05 returns the median estimate of Formula for six loci and FormulaH01 returns the median estimate for three more. No other estimator returns the median estimate more than twice (Table 4). Because one of our principal goals was to estimate the relative role of recombination and mutation, the rhotheta (FormulaT05) estimator has considerable utility in that it provides a means of estimating Formula, Formula, and Formula/Formula with confidence intervals in a relatively limited