The inference of transmission pathways for medically important bacteria is central to our understanding of pathogens. Here we report analyses of transmission in Helicobacter pylori, a major carcinogen. Our study is novel in that the focal community comprises detailed family pedigrees and has a high prevalence of H. pylori. To infer transmission, we performed high-resolution analyses of nucleotide sequences for three genes and accounted for the occurrence of mutation and recombination through the use of simulation modeling. Our results demonstrate that transmission has a strong nonfamilial component, potentially because a large proportion of infections are derived from the community. These results are interesting from both a medical and an evolutionary standpoint. Medically, control measures and beliefs about the sources of H. pylori infection should be reevaluated. Evolutionarily, our results contradict the hypothesis of strict vertical transmission, presented as an explanation for the strong correlation between human population history and H. pylori diversity. Thus the paradox of persistent phylogenetic structure, despite a permissive mode of transmission and high recombination rates, must be solved elsewhere. Here we consider the potential for recombination events to maintain genetic structure in light of horizontal transmission.
Helicobacter pylori is a gram-negative bacterium that colonizes the gastric mucosa (Goodman and Correa 1995). Infection is associated with chronic gastritis, peptic ulcer disease, mucosa-associated lymphoid tissue lymphoma, and gastric adenocarcinoma and has a major impact on public health (Alm et al. 2000); the bacterium has been classified as a group I carcinogen by the International Agency for Research on Cancer (NIH Consensus Development Panel 1994). Although much is known about the virulence of H. pylori (Covacci et al. 1999), the potential transmission pathways for the bacterium are unresolved (Goosen et al. 2002; Dowsett and Kowolik 2003). Potential transmission routes include oral–oral and fecal–oral, both with and without intermediate transmission steps. These transmission pathways are supported by a plethora of studies demonstrating the presence of H. pylori isolates, either by culturing or PCR techniques, in both the oral cavity (Dowsett and Kowolik 2003) and stool samples (Thomas et al. 1992; Kelly et al. 1994; Parsonnet et al. 1999). Further clues regarding transmission lie in the fact that infection is more prevalent in developing countries, where it can assume epidemic proportions with up to 80% of the population being infected (Alm et al. 1999). Such observations suggest that H. pylori infection is associated with low socioeconomic status and overcrowded living conditions (Suerbaum and Michetti 2002). Transmission studies in developed countries suggest a person-to-person mode of transmission (Suerbaum and Michetti 2002). Furthermore, epidemiological and DNA-based studies have suggested that a parent–offspring transmission pathway, mostly mother to child, is responsible for transmission within the family and that infection outside the family rarely occurs (Drumm et al. 1990; Bamford et al. 1993; Rothenbacher et al. 1999; Suerbaum and Michetti 2002; Samir et al. 2004).
Given that developed countries have shown a decrease in H. pylori incidence without targeted intervention (Parsonnet et al. 1992; Banatvala et al. 1993), the status quo of H. pylori and its transmission route should be addressed in developing countries where incidence is high and the bacterium continues to present itself as a serious health concern. Understanding the transmission process is an essential step toward controlling the spread of H. pylori. Previous transmission studies of H. pylori have been conducted in low-incidence study populations related to their socioeconomic status (Rothenbacher et al. 1999; Tindberg et al. 2001), with small within-family sample sizes (Rothenbacher et al. 1999; Samir et al. 2004; Raymond et al. 2004), and using low-resolution methods for the inference of transmission (Rothenbacher et al. 1999; Tindberg et al. 2001; Samir et al. 2004). These characteristics should be further explored. First, poor sanitation, overcrowding, and generally low socioeconomic status are factors that determine susceptibility to H. pylori infection (Everhart 2000; Bunn et al. 2002; Suerbaum and Michetti 2002). Thus transmission should be evaluated in high-prevalence communities. Second, few studies have compared H. pylori genotypes from both parents and several children within individual families (Rothenbacher et al. 1999; Samir et al. 2004). The observation of transmissions from a mother to each of her children should be the means by which the route of infection is evaluated such that one can distinguish socially mediated transmission from a strictly vertical pathway. Third, transmission studies have typically used presence/absence measures of H. pylori infection via 13C urea breath tests (Rothenbacher et al. 1999; Tindberg et al. 2001), presence/absence of antibodies (Tindberg et al. 2001; Samir et al. 2004), restriction enzyme digestion of amplified genes (Wang et al. 1993), or analyses of peptide sequences (Prewett et al. 1992). 
Although the latter two methods are an advance over 13C urea breath tests as they allow comparison of H. pylori genotypes in parents and children, these methods do not provide the high resolution of nucleotide-sequence-based studies and are more susceptible to convergent changes. Furthermore, previous studies have not considered the potential for mutation, and/or recombination, to occur between the times of transmission and sampling.
In this article we derive DNA sequence data from extended family pedigrees in a high-prevalence community and use population genetics tools and computer simulations to infer the transmission patterns of H. pylori. We show that a large proportion of infection is derived outside of the family in this population and that horizontal transmission from the community plays a major role in the spread of H. pylori.
MATERIALS AND METHODS
Sampling and gene sequencing:
The study population comprised 105 healthy individuals, for whom extensive pedigree information was available (supplemental Figure 1 at http://www.genetics.org/supplemental/), from a rural, South African, black community (Ogies, Mpumalanga) that has been followed as part of a long-term surveillance program on the epidemiology of H. pylori (Goosen et al. 2002; Fritz et al. 2006; Olivier et al. 2006). This population had many of the risk factors that are associated with a high prevalence of H. pylori infection (Bunn et al. 2002; Suerbaum and Michetti 2002). The study population comprises 12 pedigrees, the individuals of which occupy 23 independent households (supplemental Figure 1 and supplemental Table 1 at http://www.genetics.org/supplemental/). Furthermore, 100% of the households have a reticulated water supply, 87% have flushing toilets, and 98% of individuals have their own toothbrush (supplemental Figure 1 and supplemental Table 1). These population statistics indicate that the study population is sufficiently replicated for the proposed analysis, since each of the 23 households has its own reticulated water supply and most have flushing toilets. Ethical approval for this study was obtained from the University of Pretoria and the Hospital Review board of the Unitas hospital (Fritz et al. 2006; Olivier et al. 2006). Endoscopy was performed in 90 individuals. DNA was isolated directly from gastric biopsy samples, and fragments from three housekeeping genes (ureI, ureC, and mutY; Figure 1) were PCR amplified and sequenced. These housekeeping genes were chosen, using calculations of recombination end-point frequency (Falush et al. 2001), to span both narrow (ureC-ureI) and moderate (ureC-mutY) genomic distances. This strategy facilitates detection of recombination events following transmission between sampled individuals, a process that could potentially obscure patterns of vertical inheritance of H. pylori infection.
The intergenic distance between ureI and ureC may be short enough to maintain linkage disequilibrium between these fragments, while the greater distance between ureC and mutY would allow more recombination and, potentially, breakdown of linkage disequilibrium. PCR products were purified by precipitation with 95% ethanol and 3 M NaAc, and sequence reads were determined on an ABI 3100 capillary sequencer, following cycle-sequencing using the BigDye 3.2 termination reaction. Direct DNA extraction of biopsy samples was preferred over culturing since the latter may result in in vitro sequence change and may also reduce genotypic diversity within a sample. Where multiple genotypes were detected in a single biopsy sample, we cloned PCR products and sequenced a selection of these cloned fragments to identify unique genotypes. PCR products were cloned into the pTZ57R vector using the InsT/Aclone PCR product cloning kit (Fermentas). These data were combined with published multiple strain data (Raymond et al. 2004) to calculate the degree of within-individual sequence diversity and thus to determine whether the single-strain assumption was acceptable, given the intended aims of this study.
Sequences from the three genes were analyzed to identify the predominant route of transmission of H. pylori. In summary, (i) neighbor-joining phylograms were used to represent phylogenetic structure of H. pylori within the community and (ii) statistical comparisons and a custom simulation model were developed to identify the most likely path of transmission (Table 1).
Sequences from the three genes ureI, ureC, and mutY with 80, 79, and 79 genotypes, respectively, were imported into Sequence Navigator (Applied Biosystems, Foster City, CA), where they were proofread and subsequently aligned using ClustalX (Higgins and Sharp 1988). We performed preliminary analyses of sequence diversity in DnaSp (Rozas and Rozas 1999) and used Mega2 (Kumar et al. 2004) to construct unrooted neighbor-joining phylograms based on uncorrected P distances for each of the three genes. The aim of these preliminary analyses was to identify overall trends and population genetic structure within the data set derived from the Ogies community.
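The uncorrected P distance underlying these phylograms is simply the proportion of aligned sites at which two sequences differ. The following is a minimal sketch, not the Mega2 implementation; the function name `p_distance` and the choice to skip gapped columns (pairwise deletion) are assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected P distance: proportion of compared sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(a != b for a, b in pairs)
    return differences / len(pairs)


# Two hypothetical aligned fragments differing at 2 of 8 sites.
example_distance = p_distance("ACGTACGT", "ACGTTCGA")
```

A matrix of such pairwise values is the input that a neighbor-joining algorithm would then cluster into an unrooted tree.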
To obtain a preliminary view of the information content of the data in terms of analyzing transmission hypotheses across relationship categories, we performed chi-square permutation tests. These tests examined the association between individuals carrying similar sequences and individuals within a particular category of relationship (mother–child, parent–offspring, siblings, family members, or housemates). Family members are individuals who are related and share a household, whereas housemates are individuals who share a household yet are not necessarily related. Since each individual appears in multiple pairwise comparisons, these comparisons are not independent of each other. We tested the effects of dependence on chi-square tests with a Monte Carlo approach that randomly selected individuals from pairwise comparisons. The purpose of these permutations was to compare the results of the dependent chi-square tests utilized above to simulated data sets of various levels of dependence. Significance of the observed chi-square test statistic was assessed using a permutation approach as per Roff and Bentzen (1989). An algorithm was used to produce a series of 10,000 randomized contingency tables for each comparison, with the number of similar sequences and the number of individuals in the particular relationship category (the marginal values) held constant. To assess the significance of an association, we determined the proportion of these randomized contingency tables that showed more extreme chi-square values than the observed data. Since mutations may have accumulated since the time of transmission, we scored sequences as similar if they differed by five or fewer substitutions. This criterion was based on a gap observed in the distribution of pairwise sequence differences (the mismatch distribution), with few comparisons that differ by 6–10 nucleotide substitutions. The maximum mutation rate of H. pylori has been estimated as 2.28e−5 mutations/site/year (Falush et al. 2001). Given a total of 1391 sites sequenced in this study, the expected number of mutations per year across the sequenced region is approximately 0.03. Thus the time required for five mutations to occur is on the order of 160 years, and it is highly unlikely that more than five mutations could have accumulated between the times of transmission and sampling.
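The arithmetic behind this threshold can be checked directly from the two figures quoted above (the mutation rate and the number of sites sequenced). The variable names are illustrative only.

```python
MUTATION_RATE = 2.28e-5   # mutations/site/year, maximum estimate quoted in the text
SITES_SEQUENCED = 1391    # total sites across ureI, ureC, and mutY
SIMILARITY_THRESHOLD = 5  # substitutions allowed between "similar" sequences

# Expected mutations per year over the whole sequenced region.
expected_per_year = MUTATION_RATE * SITES_SEQUENCED

# Years needed, on average, to accumulate the threshold number of mutations.
years_for_threshold = SIMILARITY_THRESHOLD / expected_per_year

print(f"{expected_per_year:.4f} mutations/year, "
      f"{years_for_threshold:.0f} years for five mutations")
```

The product is about 0.032 mutations per year, so five mutations correspond to roughly 158 years, consistent with the order of 160 years stated above.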
These categorical tests alone provide an inadequate representation of H. pylori transmission patterns, as there may be multiple routes of person-to-person infection. A probabilistic model of infection, with pedigree and associated sequence data as inputs, was constructed to characterize patterns of genetic diversity expected under more complex transmission scenarios. In the model, transmission was simulated using a broken-stick design (Figure 2), where the source of infection for each target individual was determined by a random draw from a uniform distribution, with varying frequency segments assigned to five relationship categories (Table 1). Given the mode of transmission, the source individual and the associated DNA sequence was identified from the pedigree data. Infection from the community was simulated by drawing from the data set of available individuals (n = 74) or by creating a new allele at a rate determined by the observed gene diversity. The nucleotide substitutions to be enforced along this new allele were determined by first randomly choosing one of the three identified phylogenetic groups (Figure 3) at a rate determined by their frequency in the observed data and then by choosing a pairwise difference from the mismatch distribution within these groups for each of the three candidate genes. Substitutions along the new allele were applied according to the appropriate mutation model for each gene as determined in PAUP*4b10 (Swofford 1999) using ModelTest 3.4 (Posada and Crandall 1998) and Akaike Information Criterion model selection. Substitutions were constrained to the observed variable sites or to new mutant sites arising at a rate determined by the nucleotide diversity, such that the strong phylogenetic structure evident in H. pylori (Falush et al. 2003) would be retained in the model. Age of infection, drawn from a gamma distribution (mean = 3, α = 0.1) that approximates patterns of infection in empirical studies (Klein et al. 1994; Rowland et al. 
2006), was used to calculate the sequence divergence from time of infection to time of sampling. Time since divergence determined the number of mutation and recombination events, with rates of 6.9e−5 and 4.1e−5/nucleotide/annum, respectively (Falush et al. 2001). Mutation events that occurred within the sequenced gene regions (ureI, ureC, and mutY) were enforced according to the best-fit mutation model for each gene. Similarly, recombination events that occurred within the sequenced gene regions, or that occurred within one recombinant's length of any of these gene regions, drawn from an exponential distribution with a mean of 417 bp (Falush et al. 2001), were enforced along the simulated DNA sequences. Transmission was simulated for 26 individuals, for whom the H. pylori infection of at least one parent had been sequenced, and was repeated 50,000 times for each of eight transmission hypotheses (Table 1). Transmission hypotheses were chosen to represent a range of vertical, horizontal, and combined vertical and horizontal scenarios (Table 1, scenarios 1–7). In addition, we used the observed data to calculate the proportion of genotypes shared between children and their mothers, fathers, siblings, and housemates. Genotypes that differed by five or fewer mutations were considered to be identical, as in the chi-square analysis, such that mutation and recombination could be accounted for. These patterns of genotype sharing observed from the data were used to parameterize an eighth transmission hypothesis (Table 1, scenario 8). Although only 50,000 Monte Carlo iterations were simulated for each hypothesis (Table 1), the simulated data comprise 1.3 million independent transmission events, since each iteration consists of 26 transmission events across the pedigrees.
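The broken-stick draw of an infection source can be illustrated with a minimal sketch. The category names and segment frequencies in `SCENARIO` are hypothetical placeholders, not the values in Table 1, and `draw_source` is an illustrative name rather than part of the simulation program used in this study.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical segment frequencies for one scenario (NOT taken from Table 1).
SCENARIO = {
    "mother": 0.4,
    "father": 0.1,
    "sibling": 0.1,
    "housemate": 0.1,
    "community": 0.3,
}


def draw_source(segments, rng):
    """Pick a source category by dropping a uniform draw onto a unit stick."""
    categories = list(segments)
    # Cumulative break points of the stick, e.g. 0.4, 0.5, 0.6, 0.7, 1.0.
    breaks = list(accumulate(segments[c] for c in categories))
    if abs(breaks[-1] - 1.0) > 1e-9:
        raise ValueError("segment frequencies must sum to 1")
    u = rng.random()  # uniform on [0, 1)
    # Guard against floating-point rounding in the final break point.
    index = min(bisect_right(breaks, u), len(categories) - 1)
    return categories[index]


rng = random.Random(42)
# One simulated iteration: a source category for each of 26 focal individuals.
sources = [draw_source(SCENARIO, rng) for _ in range(26)]

# 50,000 iterations of 26 transmission events per hypothesis.
total_events = 50_000 * 26
```

In the full model each drawn category would then be resolved to a specific source individual and sequence from the pedigree data, a step this sketch omits.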
The model simulated a single transmission event per individual, and thus summary statistics (gene diversity and nucleotide diversity) and phylogenetic structure of the simulated data should not differ substantially under the alternate transmission hypotheses. However, pairwise comparisons of parent–child divergence should show marked differences. We performed Archie–Faith–Cranston randomization tests, with 1000 permutations, in PAUP*4b10 (Swofford 1999), which calculated the probability that the topologies of randomly chosen simulated data sets for each transmission scenario were consistent with the observed data topology. To compare results from alternate transmission scenarios, we calculated the mean, median, and skewness from simulated distributions of mother–child divergences and used bootstrap resampling procedures (50,000 replicates) to calculate confidence limits on these statistics for each of the eight transmission hypotheses. Finally, we used these resampling procedures along with Kolmogorov–Smirnov tests to infer whether any of the eight transmission hypotheses were unlikely to have given rise to the observed data.
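Bootstrap confidence limits of this kind can be sketched as follows. The percentile method is an assumption, since the text does not state which bootstrap interval was used; the divergence values are hypothetical, and the replicate count is reduced from 50,000 to keep the example fast.

```python
import random
import statistics


def bootstrap_ci(values, statistic, replicates=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(replicates)
    )
    lower = estimates[int(replicates * alpha / 2)]
    upper = estimates[min(replicates - 1, int(replicates * (1 - alpha / 2)))]
    return lower, upper


# Hypothetical mother-child divergences (substitution counts), for illustration.
divergences = [0, 0, 1, 2, 2, 3, 18, 21, 25, 27, 30, 34]
mean_ci = bootstrap_ci(divergences, statistics.mean)
median_ci = bootstrap_ci(divergences, statistics.median)
```

Passing the statistic as a function lets the same routine serve for the mean, median, and skewness.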
RESULTS
Seventy-five individuals (83%) were H. pylori positive by PCR and histology. Multiple genotypes were detected in 10 (13%) of the individuals sequenced. Co-infection with multiple H. pylori genotypes has been suggested in some previous studies (Miehlke et al. 1999; Luman et al. 2002). Multiple strain data generated in this study (mutY: 3 individuals, 66 pairwise comparisons; ureI: 2 individuals, 28 pairwise comparisons) indicated low average within-individual genetic distances (mutY: 0.006 ± 0.01, n = 19; ureI: 0.014 ± 0.013, n = 12). Furthermore, Raymond et al. (2004) sequenced 20 cultures for each of 6 individuals and detected a maximum of four strains per individual, where each individual appeared to be infected by a dominant strain with >70% occurrence. Mean ureC genetic distance within individuals was low (0.023 ± 0.004, n = 5). Consequently, all subsequent analyses were performed assuming one strain per individual. It is important to note that we do not claim that co-infection with multiple strains does not occur, but rather suggest that the low percentage of sequence divergence observed among multiple strains within individuals is unlikely to confound the inference of transmission under a single-strain assumption. We discuss the implications of this assumption later. Preliminary sequence analyses show high levels of gene diversity for each of the genes sequenced (Table 2). In general, nucleotide diversity is higher for mutY than for either of the other genes (ureI and ureC). There are low levels of coding substitutions relative to silent substitutions, and thus there is no evidence for selection among the sequenced genotypes at these genes. Given that the fragments sequenced are from general H. pylori housekeeping genes, this result is expected. Phylogenetic conflict between adjacent segregating sites suggests a minimum estimate of between 23 and 37 recombination events within this sample for each of the three genes (Table 2).
Finally, phylogenetic analysis of the three sequenced genes indicates a lack of clustering of individuals from the same family (Figure 3) as would be expected if transmission were vertical. In further support of a substantial horizontal transmission component, most families comprised genotypes belonging to both the hpAfrica1 and hpAfrica2 (Falush et al. 2003) divergent strains (Figure 3), and furthermore there was no significant clustering of H. pylori strains among families (χ2 = 26.23, d.f. = 16, P > 0.05).
Transmission of H. pylori was inferred using two approaches. The first approach used categorical tests to determine whether the number of pairwise comparisons between individuals carrying identical or similar sequences, within a particular relationship category, differed from that expected under a random assignment of genotypes to individuals within the sample. We compared genotypes within mother–child, parent–offspring, sibling, extended family, housemate, and spouse relationship classes. To test for associations between genotypes within relationship classes, we used chi-square values, with the test distribution for these pairwise comparisons generated from 10,000 random permutations of genotype assignments (Table 3). Since most individuals are involved in multiple pairwise comparisons, there may be a problem with dependency. Resampling the data set for different levels of dependency indicated that at least some of the statistical power achieved was the result of individuals involved in multiple dependent comparisons (results not shown). However, we used the chi-square results only to gain a preliminary understanding of the data. Results from these chi-square permutation tests indicate that (i) parents are significantly more likely to share similar H. pylori genotypes with their children than are unrelated individuals; (ii) siblings are also likely to share H. pylori genotypes; (iii) individuals from the same household show the highest frequency of genotype sharing (evident as large chi-square values), irrespective of their family relationships; and (iv) spousal partners are no more likely to share H. pylori genotypes with each other than with anyone else in the community. These results are consistent with some degree of transmission through childhood social interaction. Most individuals, however, carry substantially different H. pylori genotypes, irrespective of their relationships, which suggests that a large proportion of H. pylori infections are acquired outside the family.
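A permutation test of this general form can be sketched for a single relationship category. This is a simplified illustration and not the program used in this study: it treats every pairwise comparison as an exchangeable unit, so it ignores the dependence among comparisons noted above, and it counts ties with the observed statistic as extreme (an assumption). The input flags are hypothetical.

```python
import random


def chi_square_2x2(similar, in_category):
    """Chi-square statistic for a 2x2 table built from two binary flags per pair."""
    n = len(similar)
    observed = [[0, 0], [0, 0]]
    for s, c in zip(similar, in_category):
        observed[int(s)][int(c)] += 1
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def permutation_p_value(similar, in_category, permutations=10_000, seed=0):
    """Shuffle the similarity flags; both table margins stay fixed."""
    rng = random.Random(seed)
    observed = chi_square_2x2(similar, in_category)
    shuffled = list(similar)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if chi_square_2x2(shuffled, in_category) >= observed - 1e-12:
            hits += 1
    return observed, hits / permutations


# Hypothetical data: 40 pairwise comparisons, 10 within the category.
similar = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 26
in_category = [1] * 10 + [0] * 30
chi2, p_value = permutation_p_value(similar, in_category)
```

Shuffling one set of flags while leaving the other in place keeps both marginal totals constant, which is the constraint described in the methods.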
The second approach involved the construction of a probabilistic model that used pedigree information and sequence data derived from the study population to simulate transmission processes (Table 1). This simulation model is preferred over categorical tests as it incorporates mutations and recombination events that may have occurred since infection and allows one to investigate patterns of genetic diversity occurring under multiple contrasting transmission pathways. As with all such models, however, it was first necessary to evaluate simulation results in terms of whether these were comparable with the sample and whether these are sufficient to discriminate alternate transmission pathways. Since this model simulated a single transmission event for each of 26 individuals, one would not expect summary statistics (gene diversity and nucleotide diversity) or phylogenetic structure of the simulated data to differ substantially under the alternate transmission hypotheses. Pairwise comparisons of parent–child divergence, however, should show marked differences, according to the frequency of vertical transmission in a particular transmission scenario. Mismatch distributions, distributions of gene diversity and nucleotide diversity, and distributions of segregating sites for three contrasting transmission hypotheses were comparable and similar to the observed data (results not shown). Furthermore, permutation tests in PAUP*4b10 (Swofford 1999), using the observed topology (Figure 3) as a constraint, demonstrated that the simulated data were highly consistent with the observed phylogeny for each of these three alternate transmission hypotheses (in all cases P = 1.0). These results indicate that the simulation model does not perturb within-community phylogenetic structure (evident in Figure 3). 
To assess the potential of mother–child sequence divergence to discriminate alternate transmission hypotheses, we calculated confidence limits, using bootstrap resampling, on distribution statistics derived from 50,000 simulations. These results indicate that this approach is powerful, with narrow confidence limits on expected statistics (Table 4A). In particular, there is a shift from a strongly right-skewed distribution of mother–child divergences (g1 > 0) in vertical transmission models (hypotheses 1 and 2, Table 4A) to a strongly left-skewed distribution (g1 < 0), when infection is predominantly acquired horizontally and outside the family (hypothesis 7, Table 4A). These results confirm that average pairwise sequence divergences between mothers and their children are low under strict vertical transmission models and high under permissive horizontal transmission models.
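The sign convention for skewness can be made concrete with a short sketch. It assumes that g1 denotes the standard moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which the text does not define explicitly; the two samples are hypothetical.

```python
def skewness_g1(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Mostly low divergences with a long upper tail: right-skewed, g1 > 0.
mostly_low = [0, 0, 0, 1, 1, 2, 9]
# Mostly high divergences with a long lower tail: left-skewed, g1 < 0.
mostly_high = [9, 9, 9, 8, 8, 7, 0]

g1_low = skewness_g1(mostly_low)
g1_high = skewness_g1(mostly_high)
```

A sample dominated by low mother–child divergences therefore gives a positive g1, and one dominated by high divergences gives a negative g1, matching the shift described above.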
To infer which of the transmission hypotheses were most likely to generate the observed data, we conducted Kolmogorov–Smirnov tests comparing observed distributions of pairwise divergence statistics against those simulated under various transmission scenarios. The observed distribution of within-household sequence divergence values for the 26 focal individuals was significantly different from all the simulated transmission scenarios (Table 5). In contrast, the sequence divergence values among siblings were consistent with all the transmission models considered. These cases show rigorous and insufficient discrimination, respectively. The inability to discriminate among hypotheses for among-housemate comparisons may be due partly to the difficulty in distinguishing instances of vertical and horizontal transmission within households, where individuals can obtain identical genotypes either from specific parent–offspring interactions or from less specific social contacts. Father–child divergence distributions showed significant deviations from the observed data in the two strictly vertical transmission scenarios (Table 5). Observed mother–child divergences were significantly different from both vertical transmission scenarios (Table 5). However, elements of both parent–offspring and social transmission are evident in the observed data as a bimodal distribution with high frequencies of both low and high mother–child sequence divergences (Figure 4). In general, scenarios with a high probability of vertical transmission (or strongly right-skewed distributions of parent–child sequence divergence) and scenarios with low probabilities of infection from the community were least consistent with the observed data (Table 5). Bootstrap estimates of distribution statistics for the observed data have wider confidence limits than the simulated data (Table 4B), as expected given the smaller sample sizes.
Nonetheless, the observed mean mother–child sequence divergences are unlikely to have been generated through predominantly vertical transmission. It is difficult to distinguish the observed data from either mixed transmission or predominantly horizontal scenarios, given the wide confidence estimates on mean, median, and skewness. Skewness, in particular, has confidence limits that encompass both right-skewed (g1 > 0) and left-skewed distributions (g1 < 0), which results from the bimodal distribution of mother–child divergences in the observed data (Figure 4). However, observed distribution statistics are substantially different from those simulated in predominantly vertical transmission scenarios (Table 4).
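The two-sample Kolmogorov–Smirnov statistic used in such comparisons is the largest vertical gap between two empirical cumulative distribution functions. The following is a minimal sketch of the statistic only, with hypothetical inputs; a P value would additionally require the Kolmogorov–Smirnov null distribution, for example as provided by `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so check each one.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical observed and simulated divergences, for illustration only.
observed_divergences = [1, 2, 3, 4]
simulated_divergences = [3, 4, 5, 6]
d_statistic = ks_statistic(observed_divergences, simulated_divergences)
```

Identical samples give a statistic of 0 and completely non-overlapping samples give 1, so larger values indicate a poorer match between observed and simulated distributions.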
DISCUSSION
Previous population genetic studies of H. pylori have shown the existence of ancestral population types or strains that are consistent with geographic regions (Achtman et al. 1999; Falush et al. 2003). These studies indicate the effects that human migrations have had on global genetic diversity within H. pylori. Evolutionarily, the existence of geographical strains, each with an ancestral origin that can be geographically determined, provides a marker independent from that of Y chromosome, microsatellite, or mitochondrial DNA studies, for deciphering human history (Wirth et al. 2004). Comparisons of these markers show that DNA sequences from H. pylori provide greater resolution, for example, in the separation of Buddhist and Muslim populations in Ladakh, India, than do mtDNA sequences or microsatellites (Wirth et al. 2004). The accurate inference of human migratory patterns using H. pylori, however, has been justified on the assumption of strictly vertical transmission (Wirth et al. 2004). To date most studies have suggested predominantly vertical transmission (Drumm et al. 1990; Rothenbacher et al. 1999; Tindberg et al. 2001), with most evidence for infection contained within the family unit and a sampling bias toward maternal transmission.
We have shown that in a high-prevalence population, transmission of H. pylori also includes a strong horizontal component derived from the community. Although multiple strains have not been sequenced for all individuals in this study, we have shown that within-individual sequence diversity is low. This low within-individual genetic diversity, combined with the observation of dominant strains (Raymond et al. 2004), provides support for the single-strain assumption used in this study. Indeed, several recent studies that have sequenced multiple strains found substantially lower sequence divergences within individuals (Israel et al. 2001; Raymond et al. 2004; Kraft et al. 2006), suggesting a shared common ancestor for multiple strains within individuals. Given these results, it is unlikely that sequencing of multiple strains within individuals will explain the right-skewed distribution of parent–offspring genetic distances (Figure 4) and thus could not account for the community-derived component of infection. However, a paradox exists, given the occurrence of a single dominant strain and the observation that a large proportion of temporal genomic changes within H. pylori are the result of intergenomic recombination (Kraft et al. 2006). Kraft et al. (2006) conclude that the continuous acquisition of new strains is necessary for generating genomic changes in H. pylori. Given such intergenomic recombination, the inference of transmission in our study would be confounded. However, the use of multiple unlinked housekeeping genes not under selective pressure strengthens our argument, since it is unlikely that each of the genes sequenced has had parallel recombination events.
Further complication in the inference of transmission routes in this study, however, could stem from host–pathogen interactions. Individuals do appear to carry a dominant strain (Israel et al. 2001; Raymond et al. 2004; Kraft et al. 2006). However, whether this observation is the result of within-host selection is uncertain and could certainly complicate the inference of transmission. Our observed results could be generated if a child obtained multiple infections from his or her mother, yet within-host selection resulted in a dominant strain in the child that differed from the mother's dominant strain. However, given the existence of a dominant strain, the multiple infections evident in some individuals are most likely the result of multiple independent horizontal transmission events. Clearly, much work on the development of infection and temporal host–pathogen interactions is required. This would include the sequencing of multiple strains from paired biopsies at the onset of infection, such that an understanding of temporal changes of the H. pylori population in the stomach environment could be addressed.
Many people in developing countries, and until recently those in the developed countries, live in comparable social conditions and experience similarly high H. pylori prevalence to this community (Parsonnet et al. 1992; Banatvala et al. 1993). Given the observed degree of acquisition of infection from the community demonstrated in this study, the retention of strong ancestral geographic structure within global H. pylori sequences (Falush et al. 2003), within sequences from regional populations (Muslims and Buddhists in Ladakh; Wirth et al. 2004), and within the single homogeneous community considered in this study requires an alternate explanation. The ancestral population structure observed in Ladakh is most likely the result of cultural separation of the religious population groups (Kaul and Kaul 1992; Srinivas 1998). The persistence of ancestral H. pylori lineages within the essentially homogeneous and intermarried community in this study, however, suggests that this structure is maintained by bacterial interactions rather than through separation of human societies alone. Phylogenetic structuring of global H. pylori genotypes probably arose through the isolation of ancestral human populations before the onset of migration and admixture. Subsequent recombination between lineages will disrupt this ancestral population structure but recombination within lineages tends to retain this ancestral structure by homogenizing within-group differences. Most likely, this retention of ancestral population structure is the result of genome-selective mechanisms, which act to limit recombination between different ancestral population groups.
From a medical perspective, the presence of a strong community-derived transmission outside the family affects our epidemiological understanding of H. pylori infection, especially in high-prevalence communities. The very high levels of gene diversity and the observation that most individuals carry highly divergent genotypes imply the presence of an immense community reservoir of H. pylori genotypes that serves as a source of infection. Many studies have searched for environmental sources of H. pylori such as water supplies (Klein et al. 1991; Hulten et al. 1996; Bunn et al. 2002), food (Hopkins et al. 1993), or insect vectors (Grubel et al. 1998; Osato et al. 1998), but these sources remain poorly substantiated and controversial, perhaps due to the presence of an alternate form of the bacterium that is difficult to culture (Bode et al. 1993; Dowsett and Kowolik 2003). The Ogies community considered here has a reticulated supply of treated tap water and flushing toilets, and hence water and sanitation are unlikely explanations for the high prevalence and diversity of H. pylori in this population. An alternative reservoir may be within the community itself, with infection passed from person to person, especially among children. Existing studies have found little evidence for transmission among school children (Tindberg et al. 2001) but these results were based on serological analysis in a low-prevalence community. Such transmission pathways could account for the strong community-derived transmission component observed in this study.
ACKNOWLEDGMENTS
We thank Mark Achtman, John Atherton, Daniel Falush, and Cisca Wijmenga for pertinent comments on an earlier draft of this manuscript. Wayne Delport thanks Paulette Bloomer and Willem Ferguson for their support. Finally, we thank three anonymous reviewers and the associate editor, Susan Gottesman, for insightful comments that significantly improved the quality of this manuscript. Schalk van der Merwe is a recipient of an Astra-Zeneca/South African Gastroenterology Society Fellowship in Gastroenterology.
- Received February 28, 2006.
- Accepted October 4, 2006.
- Copyright © 2006 by the Genetics Society of America