Distinguishing the Hitchhiking and Background Selection Models
Hideki Innan^a and Wolfgang Stephan^b
^a Human Genetics Center, University of Texas Health Science Center, Houston, Texas 77030
^b Section of Evolutionary Biology, Department of Biology II, University of Munich, 80333 Munich, Germany
Corresponding author: Hideki Innan, School of Public Health, University of Texas Health Science Center, 1200 Hermann Pressler Dr., Houston, TX 77030. E-mail: hideki.innan@uth.tmc.edu
Communicating editor: M. AGUADÉ
ABSTRACT
A simple method to distinguish hitchhiking and background selection is proposed. It is based on the observation that these models make different predictions about the average level of nucleotide diversity in regions of low recombination. The method is applied to data from Drosophila melanogaster and two highly selfing tomato species.
ONE of the signatures of genome-wide selection is the positive correlation between the amount of polymorphism and recombination rate, which was found in Drosophila (BEGUN and AQUADRO 1992).
Theoretical studies have shown that, in a diploid population (with constant effective size $N_e$) undergoing recurrent hitchhiking (HH) events or background selection (BS), the expected degree of reduction in neutral polymorphism ($f$) is a function of $\rho$, the recombination rate per site per generation. That is, the expectation of the amount of variation in a region with a local recombination rate $\rho$ is given by

$$E(\theta) = \theta_{\mathrm{neu}} f(\rho),$$

where $\theta_{\mathrm{neu}} = 4N_e\mu$ and $\mu$ is the neutral mutation rate per generation. On the basis of earlier theoretical work on hitchhiking, $f$ under the HH model is given by

$$f = \frac{\rho}{\rho + a}, \tag{1}$$
where $a$ is a parameter that depends on the product of the population selection parameter (population size times selection coefficient) and the rate of sweeps per generation. This equation rests on several assumptions. First, $\theta_{\mathrm{neu}}$ and $a$ are constant over the genome. Second, the equation considers only selective sweeps; other types of selection (e.g., negative selection, balancing selection) are neglected. Third, the local recombination rate $\rho$ has a uniform distribution (i.e., recombination hot and cold spots are ignored). Last, multiple concurrent selective sweeps are not allowed, so that (1) does not hold when the recombination rate is very small, because of interference among sweeps. KIM and STEPHAN (2003) showed that if the value of $f$ at $\rho = 0$, $f|_{\rho=0}$, is below 0.1, then (1) holds for all $\rho$ values such that $f$ is above 0.1. Furthermore, if $f|_{\rho=0}$ is above 0.1, (1) holds approximately for all $\rho$ values for which $f$ is above $f|_{\rho=0}$.
Under the BS model and the assumption that recombination rate is not extremely low, $f$ is approximately given by

$$f = \exp\left(-\frac{u}{\rho}\right), \tag{2}$$

where $u$ is the deleterious mutation rate per site per generation (HUDSON and KAPLAN 1995). Equation 2 also assumes constant $\theta_{\mathrm{neu}}$ and $u$ values and a uniform distribution of $\rho$. It is known that Equations 1 and 2 produce similar functional relationships for a wide range of $\rho$. In Fig 1A, the solid curve represents $f$ for the HH model obtained from Equation 1 with $a = 5 \times 10^{-9}$, which is an estimate for Drosophila melanogaster. For HH, $f$ converges to $1 - a/\rho$ when $\rho$ gets large, while for BS $f$ converges to $1 - u/\rho$ for large $\rho$. That means for $a = u$ the two functions are asymptotically identical. Therefore, given a data set of levels of DNA polymorphism and recombination rates, we can fit both Equation 1 and Equation 2 to the data. For this reason, it is impossible to distinguish the two selection models on the basis of these types of data for large $\rho$ values.
However, a close examination reveals a difference between the two functions in regions of low (but not too low) recombination rates such that both equations are still valid (see above). Fig 1B shows $f$ under the two models with $a = 5 \times 10^{-9}$ for the HH model and $u = 2.4 \times 10^{-9}$ for the BS model. Although the two parameters are chosen to give the same average level of $f$ for $\rho$ between $1 \times 10^{-9}$ and $3 \times 10^{-9}$, it is evident that the shapes of the two curves are different. The curve of the HH model is convex in this parameter range while that of the BS model is concave, suggesting that polymorphism data from regions of low recombination might be useful to distinguish the two selection models. Focusing on this difference between the two functions, we propose a simple method with which to distinguish the two selection models. The idea, which follows earlier work on the HH model, is that the expectation under Equation 1, $\theta = \theta_{\mathrm{neu}}\rho/(\rho + a)$, can be rearranged as

$$\theta = \theta_{\mathrm{neu}} - a\,\frac{\theta}{\rho}. \tag{3}$$
Suppose now that polymorphism data from $n$ independent loci (DNA regions) are available from a single species. Let $\theta_i$ be the estimated amount of DNA polymorphism ($\theta$) at the $i$th locus ($i = 1, 2, 3, \ldots, n$). Let $\rho_i$ be the recombination rate ($\rho$) at the $i$th locus. We assume that recombination rates are known. Equation 3 indicates that $\theta$ has a negative linear correlation with $\theta/\rho$. That is, the correlation coefficient ($r$) between $\theta$ and $\theta/\rho$ is $-1$ if $\theta_i = \theta_{\mathrm{HH}i}$ at all the loci, where $\theta_{\mathrm{HH}i}$ is the expectation of $\theta_i$ under the HH model given $\rho_i$ (obtained from Equation 1). On the other hand, we expect that $r$ is relatively close to $+1$ under the BS model because of the concave behavior of $f$ in regions of low recombination.
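The contrast between the two expectations can be checked numerically. The sketch below is only illustrative: it assumes that Equation 1 has the form $f = \rho/(\rho + a)$ and Equation 2 the form $f = \exp(-u/\rho)$ (forms consistent with the large-$\rho$ limits $1 - a/\rho$ and $1 - u/\rho$ quoted above), it uses the low-recombination parameter values given for Fig 1B, and it takes $\theta_{\mathrm{neu}} = 0.03$ as an arbitrary scale. All function names are hypothetical.

```python
import math

def f_hh(rho, a):
    """Assumed form of Equation 1 (hitchhiking): f = rho / (rho + a)."""
    return rho / (rho + a)

def f_bs(rho, u):
    """Assumed form of Equation 2 (background selection): f = exp(-u / rho)."""
    return math.exp(-u / rho)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

theta_neu = 0.03                                   # illustrative scale only
rhos = [(i + 10) * 1e-10 for i in range(1, 21)]    # 1.1e-9 ... 3.0e-9

# Noise-free expectations of theta at each locus under the two models.
theta_hh = [theta_neu * f_hh(rho, 5e-9) for rho in rhos]
theta_bs = [theta_neu * f_bs(rho, 2.4e-9) for rho in rhos]

# Correlation between theta and theta / rho, as in Equation 3.
r_hh = pearson_r(theta_hh, [t / rho for t, rho in zip(theta_hh, rhos)])
r_bs = pearson_r(theta_bs, [t / rho for t, rho in zip(theta_bs, rhos)])
print(f"r under HH expectation: {r_hh:.4f}")
print(f"r under BS expectation: {r_bs:.4f}")
```

Under these assumptions the HH expectation gives $r = -1$ up to rounding, because $\theta$ is exactly linear in $\theta/\rho$, while the BS expectation gives a clearly positive $r$.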
In practice, $\theta$ is never exactly the same as the theoretical expectation due to genetic drift. Therefore, we investigate the distribution of the correlation coefficient ($r$) between $\theta$ and $\theta/\rho$, taking the variance of $\theta$ into account. First, the distribution of $r$ is investigated under the HH model assuming $\theta_i$ has a normal distribution with mean $\theta_{\mathrm{HH}i}$ and standard deviation (SD) $k\theta_{\mathrm{HH}i}$, where $k$ is a constant value. A computer simulation is carried out in the following way:
- 1. Determine $\theta_{\mathrm{neu}}$, $a$, and $k$.
- 2. Simulate $\theta$ for the $n$ loci. $\theta_i$ is assumed to be a random variable drawn from a normal distribution with mean $\theta_{\mathrm{HH}i}$ and SD $k\theta_{\mathrm{HH}i}$. If $\theta_i < 0$, $\theta_i = 0$ is set.
- 3. Calculate the correlation coefficient, $r$, between $\theta$ and $\theta/\rho$ using the simulated $\theta$ for the $n$ loci.
Steps 2 and 3 are repeated 10,000 times and the distribution of $r$ is obtained. The procedure to obtain the null distribution of $r$ under the BS model is almost identical. That is, $\theta_i$ is simulated as a random variable drawn from a normal distribution with mean $\theta_{\mathrm{BS}i}$ and SD $k\theta_{\mathrm{BS}i}$, where $\theta_{\mathrm{BS}i}$ is the expectation of $\theta_i$ under the BS model given $\rho_i$ according to Equation 2.
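The simulation steps above can be sketched as follows. This is a minimal illustration rather than the authors' program: it assumes the forms $f = \rho/(\rho + a)$ for Equation 1 and $f = \exp(-u/\rho)$ for Equation 2, and the function names, the fixed random seed, and the choice $\theta_{\mathrm{neu}} = 0.03$ are hypothetical. Fewer replicates than the 10,000 used in the text are run in the example to keep it fast.

```python
import math
import random

def expected_theta(rho, theta_neu, param, model):
    """Expected theta under the assumed forms of Equation 1 (HH) or 2 (BS)."""
    if model == "HH":
        return theta_neu * rho / (rho + param)        # param plays the role of a
    if model == "BS":
        return theta_neu * math.exp(-param / rho)     # param plays the role of u
    raise ValueError("model must be 'HH' or 'BS'")

def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either sequence has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def null_distribution(rhos, theta_neu, param, k, model, reps=10000, seed=1):
    """Repeat steps 2 and 3 `reps` times and collect the simulated r values."""
    rng = random.Random(seed)
    means = [expected_theta(rho, theta_neu, param, model) for rho in rhos]
    r_values = []
    for _ in range(reps):
        # Step 2: normal draw with mean m and SD k*m, truncated at zero.
        thetas = [max(0.0, rng.gauss(m, k * m)) for m in means]
        # Step 3: correlation between theta and theta / rho.
        r = pearson_r(thetas, [t / rho for t, rho in zip(thetas, rhos)])
        if not math.isnan(r):
            r_values.append(r)
    return r_values

def tail_probability(r_values, r_obs):
    """Fraction of simulated r values exceeding an observed value."""
    return sum(r > r_obs for r in r_values) / len(r_values)

# Low-recombination setting with small k, as described for Fig 1D.
rhos = [(i + 10) * 1e-10 for i in range(1, 21)]
r_hh = null_distribution(rhos, 0.03, 5e-9, 0.1, "HH", reps=2000)
r_bs = null_distribution(rhos, 0.03, 2.4e-9, 0.1, "BS", reps=2000)
mean_hh = sum(r_hh) / len(r_hh)
mean_bs = sum(r_bs) / len(r_bs)
print(f"mean r under HH: {mean_hh:.2f}, mean r under BS: {mean_bs:.2f}")
```

The helper `tail_probability` gives the kind of quantity written as $P(r > r_{obs})$ when an observed correlation is compared with a simulated null distribution.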
Fig 1, C-F, shows the resulting distributions of $r$. Fig 1, C and E, shows the distributions when regions of high recombination are studied, and Fig 1, D and F, is for regions of low recombination. $a$ and $u$ in Fig 1, C and E, are the same as in Fig 1A, while in Fig 1, D and F, we use the same $a$ and $u$ as those in Fig 1B. We consider $n = 20$ loci, whose recombination rates are assumed to be $\rho_i = (i + 5) \times 10^{-9}$ so that $\rho$ ranges from $6 \times 10^{-9}$ to $25 \times 10^{-9}$ in Fig 1, C and E. In Fig 1, D and F, $\rho_i$ is assumed to be $(i + 10) \times 10^{-10}$ so that $\rho_i$ ranges from $1.1 \times 10^{-9}$ to $3 \times 10^{-9}$. Fig 1, C and D, studies a case of small $k$ ($k = 0.1$), while $k = 1$ is assumed in Fig 1, E and F.
First, we consider Fig 1D, which shows the distributions of r when regions of low recombination are studied and k is small. The distribution of r under the HH model is nearly symmetrical and the average is -0.33, while r under the BS model has a relatively narrow distribution close to +1. This result indicates that r might be a useful summary statistic to distinguish the HH and BS models since the distributions of r are completely different in the two models. However, it should be noted that this method does not work when applied to regions of high recombination. As shown in Fig 1C, the two distributions of r are very similar as expected (discussed above).
The power of this method to distinguish the two models depends on $k$. Fig 1, E and F, shows the results of the same analysis as in Fig 1, C and D, respectively, but with $k = 1$ instead of $k = 0.1$. The two distributions of $r$ for the case of low recombination are quite similar, although not identical (Fig 1F). Fig 1E shows the distributions of $r$ when regions of high recombination are investigated. The two distributions are very similar again, except that the means have moved to approximately 0.7. These results suggest that the two selection models can be best distinguished under the following two conditions: (1) polymorphism data from regions of low (but not too low) recombination are available; (2) the variances of the estimates of variation are sufficiently small.
Next we discuss how this method may be applied to data. Suppose that we have a data set of estimates of $\theta$ from $n$ independent loci and that we know the local recombination rates for the $n$ loci. First, the correlation coefficient between $\theta$ and $\theta/\rho$, $r_{obs}$, is calculated, and then $r_{obs}$ is compared with the null distributions of $r$ under the HH and BS models. The procedure described above is modified because we need to estimate $a$, $u$, and $k$ from the data. That is, step 1 in the procedure should be replaced by the following two steps:
- 1a. Determine $\theta_{\mathrm{neu}}$.
- 1b. For the HH model, find $a$, which gives the best fit of Equation 1 to the data by a least-squares method, which also gives an estimate of $k$. In a similar way, for the BS model, $u$ is estimated using Equation 2 together with $k$.

Then, we can follow steps 2 and 3.
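Step 1b can be sketched as a one-parameter least-squares fit. The code below is a hedged illustration, not the procedure actually used: the text does not specify the optimizer or the estimator of $k$, so a log-spaced grid search and the root-mean-square relative residual are assumptions made here, as are the assumed forms of Equations 1 and 2 and all function names.

```python
import math

def model_theta(rho, theta_neu, param, model):
    """Expected theta under the assumed forms of Equation 1 (HH) or 2 (BS)."""
    if model == "HH":
        return theta_neu * rho / (rho + param)
    if model == "BS":
        return theta_neu * math.exp(-param / rho)
    raise ValueError("model must be 'HH' or 'BS'")

def fit_model(thetas, rhos, theta_neu, model, lo=1e-11, hi=1e-6, steps=2000):
    """Least-squares estimate of a (HH) or u (BS) on a log-spaced grid.

    Returns (param_hat, k_hat). k_hat is the root-mean-square relative
    residual, an assumed estimator of the SD-to-mean ratio k.
    """
    best_param, best_sse = lo, float("inf")
    for j in range(steps + 1):
        param = lo * (hi / lo) ** (j / steps)
        sse = sum((t - model_theta(rho, theta_neu, param, model)) ** 2
                  for t, rho in zip(thetas, rhos))
        if sse < best_sse:
            best_param, best_sse = param, sse
    fitted = [model_theta(rho, theta_neu, best_param, model) for rho in rhos]
    # Skip loci whose fitted value is zero to avoid division by zero.
    rel = [((t - m) / m) ** 2 for t, m in zip(thetas, fitted) if m > 0.0]
    k_hat = math.sqrt(sum(rel) / len(rel)) if rel else float("nan")
    return best_param, k_hat

# Usage on noise-free synthetic data generated under the HH form.
theta_neu = 0.03
rhos = [(i + 10) * 1e-10 for i in range(1, 21)]
thetas = [model_theta(rho, theta_neu, 5e-9, "HH") for rho in rhos]
a_hat, k_hat = fit_model(thetas, rhos, theta_neu, "HH")
print(f"a_hat = {a_hat:.3e}, k_hat = {k_hat:.4f}")
```

A grid search is used only because it is simple and has no tuning beyond the search bounds; any one-dimensional minimizer would serve the same purpose.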
We apply this method to published data of D. melanogaster, using loci with $\rho < 7 \times 10^{-9}$ [yellow and su(s) are excluded because of too low recombination rates; see Table 2 of the original data source]. The analysis requires a value of $\theta_{\mathrm{neu}}$, which is assumed to be 0.03; $\theta_{\mathrm{neu}} = 0.02$ and 0.04 are also considered. The results indicate that the observed distribution of the amount of variation in the 10 loci is explained better by a convex function than by a concave one, suggesting that it is very difficult to explain the observation by background selection alone. Hitchhiking might be the dominant force creating the pattern of standing polymorphism on the X chromosome of D. melanogaster. This is consistent with conclusions reached in earlier work.
Other interesting species to study are partially selfing plants, in which recombination is "effectively" reduced in the whole genome. We use two tomato species, Lycopersicon pimpinellifolium and L. chmielewskii, whose selfing rate ($S$) is approximately 0.9. The polymorphism data come from a survey of approximately 40 loci of nine tomato species including the two highly selfing species mentioned above (MILLER and TANKSLEY 1990).
In a partially selfing population with inbreeding coefficient $F = S/(2 - S)$, the effective recombination rate is reduced to $\bar{\rho} = \rho(1 - F)$, the effective population size decreases to $\bar{N}_e = N_e/(1 + F)$, and selection intensity increases by a factor $1 + F$ when the effect of selection is additive. Then, $\bar{f}$ is defined as the reduction of the amount of polymorphism in comparison with the rescaled neutral expectation, $\bar{\theta}_{\mathrm{neu}} = 4\bar{N}_e\mu$. Note that the rescaled parameters are represented by a bar. Thus, plugging these rescaling coefficients into (1) and (2), we can study the joint effects of the two mechanisms, selfing and selection, both of which decrease the level of polymorphism.
Then, we apply our method to L. pimpinellifolium and L. chmielewskii. To avoid the problem that Equations 1 and 2 are invalid when the recombination rate is very small, we use only 29 loci of MILLER and TANKSLEY's (1990) data set, excluding loci of very low recombination. $\theta$ and $\rho$ for the 29 loci are according to Table 1 in STEPHAN and LANGLEY (1998). $\theta$ ranges from 0 to 0.0275 in L. pimpinellifolium and from 0 to 0.0143 in L. chmielewskii. The recombination rates are rescaled to per site per generation values by multiplying them by a factor of $12.1 \times 10^{-8}$. This factor results from the fact that the tomato genome size is approximately 950 Mb. The resulting recombination rates reach up to $2.7 \times 10^{-8}$ (or $2 \times 10^{-9}$ to $4.9 \times 10^{-9}$ for the rescaled rate $\bar{\rho}$). Then, we obtain the correlation coefficient $r_{obs} = 0.945$ and 0.949 for L. pimpinellifolium and L. chmielewskii, respectively. These very high values of $r$ seem to favor the BS model.
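As a rough check of the selfing rescaling, the sketch below assumes the standard relations $F = S/(2 - S)$, $\bar{\rho} = \rho(1 - F)$, and $\bar{\theta}_{\mathrm{neu}} = \theta_{\mathrm{neu}}/(1 + F)$; these formulas and the function names are assumptions of this example, and $\theta_{\mathrm{neu}} = 0.02$ is used only because it is one of the values discussed for the tomato data.

```python
def inbreeding_coefficient(selfing_rate):
    """Assumed relation between selfing rate S and inbreeding coefficient F."""
    return selfing_rate / (2.0 - selfing_rate)

def rescale_for_selfing(rho, theta_neu, selfing_rate):
    """Return (rho_bar, theta_neu_bar) under the assumed rescaling.

    rho is reduced by (1 - F); theta_neu is reduced by 1 / (1 + F)
    because the effective population size is assumed to be N_e / (1 + F).
    """
    f_inb = inbreeding_coefficient(selfing_rate)
    return rho * (1.0 - f_inb), theta_neu / (1.0 + f_inb)

# Selfing rate of about 0.9 and the largest recombination rate in the text.
rho_bar, theta_neu_bar = rescale_for_selfing(2.7e-8, 0.02, 0.9)
print(f"F = {inbreeding_coefficient(0.9):.3f}")
print(f"rho_bar = {rho_bar:.2e}, theta_neu_bar = {theta_neu_bar:.4f}")
```

With $S = 0.9$, a recombination rate of $2.7 \times 10^{-8}$ maps to roughly $4.9 \times 10^{-9}$, which agrees with the rescaled upper bound quoted in the text.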
To test this possibility, we investigate null distributions of $r$ under each selection model. We use a relatively wide range of $\theta_{\mathrm{neu}}$ since it is very difficult to estimate $\theta_{\mathrm{neu}}$ for highly selfing species in which the level of polymorphism is reduced in the whole genome. The probabilities that $r$ exceeds the observation ($r_{obs}$) are shown in Table 1. These probabilities are relatively low under the HH model for the two species, suggesting that $r_{obs}$ may be too large to be expected under the HH model, especially for L. pimpinellifolium. $P(r > r_{obs})$ seems to be quite robust to $\theta_{\mathrm{neu}}$. Under the BS model, $P(r > r_{obs})$ is relatively sensitive to $\theta_{\mathrm{neu}}$. This may be because $f$ under the BS model can be either concave or convex depending on $\theta_{\mathrm{neu}}$. The two models can be distinguished well for $\theta_{\mathrm{neu}}$ values that generate a concave shape of $f$. $r_{obs}$ for the two species may be in the acceptable range under the BS model unless a very small $\theta_{\mathrm{neu}}$ is assumed. For example, $r_{obs} = 0.945$ of L. pimpinellifolium could be too large even under the BS model if $\theta_{\mathrm{neu}} = 0.02$ is assumed, but this small value of $\theta_{\mathrm{neu}}$ seems to be quite unrealistic because $\theta$ can be as large as $1.38 \times \theta_{\mathrm{neu}}$ in this highly selfing species. Thus, the results of Table 1 seem to suggest that background selection has played a larger role than hitchhiking in shaping genome-wide patterns of variation in the history of these two tomato species.
In this note, we proposed a method to distinguish the HH and BS models. Since the test looks at whether the level of polymorphism is a convex or concave function of the local recombination rate, we should have data from multiple regions in which the recombination rate is low (but not too low). The test is very powerful when the variance of $\theta$ is low, indicating that $\theta$ should be estimated from sufficiently long regions (such that the variances of $\theta$ are reduced due to intragenic recombination). INNAN et al. (2003) showed that the distribution of $\theta$ from 500-kb fragments on human chromosome 21 is very similar to a normal distribution with a quite small SD. The application of our method to Drosophila and tomatoes led to different results. That is, the HH model is preferred in Drosophila while the BS model could better explain the observation in tomatoes. This might be due to the difference in life style and mating system between animals and plants.
However, there are some potential problems in the application of our method to the currently available data sets:
- 1. It was not possible to obtain correct estimates of $\theta_{\mathrm{neu}}$, especially in the highly selfing tomato species.
- 2. The variance of $\theta$ is relatively large, which decreases the power of the test. Also, our assumption of a normal distribution of $\theta$ may not be adequate. These problems could be fixed for Drosophila and other outcrossing species if data from very long regions of low recombination rates are available, together with data from regions of high recombination to estimate $\theta_{\mathrm{neu}}$. For highly selfing species, even with such data, it is very difficult to estimate $\theta_{\mathrm{neu}}$ because the level of polymorphism is reduced in the whole genome.
- 3. The theory assumes an unstructured population of constant size. The relationship between levels of variation and recombination rate should also be studied in other population models (see also ANDOLFATTO and PRZEWORSKI 2001).
- 4. The theory assumes constant values of $a$ and $u$ in Equations 1 and 2 across the genome. Variation in these parameters could increase the variance of the observed amounts of polymorphism, reducing the power of the test. In such a case, problem 2 becomes more serious.
ACKNOWLEDGMENTS
We thank Yuseob Kim for his stimulating study of the hitchhiking process with interference among adaptive fixations. We also thank M. Aguadé and two anonymous reviewers for comments and suggestions. H.I. is supported by a fund from University of Texas. W.S. is grateful to the Erwin Schroedinger International Institute for Mathematical Physics in Vienna for support during his stay in winter 2002/2003 and to the Deutsche Forschungsgemeinschaft for funding (STE 325/5-1; Schwerpunktprogramm 1127).
Manuscript received April 29, 2003; Accepted for publication August 1, 2003.
LITERATURE CITED
ANDOLFATTO, P., 2001 Adaptive hitchhiking effects on genome variability. Curr. Opin. Genet. Dev. 11:635-641.
ANDOLFATTO, P. and M. PRZEWORSKI, 2001 Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics 158:657-665.
BAUDRY, E., C. KERDELHUÉ, H. INNAN, and W. STEPHAN, 2001 Species and recombination effects on DNA variability in the tomato genus. Genetics 158:1725-1735.
BEGUN, D. J. and C. F. AQUADRO, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519-520.
CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.
CHERRY, J. L. and J. WAKELEY, 2003 A diffusion approximation for selection and drift in a subdivided population. Genetics 163:421-428.
CUTTER, A. D. and B. A. PAYSEUR, 2003 Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol. Biol. Evol. 20:665-673.
HUDSON, R. R. and N. L. KAPLAN, 1995 Deleterious background selection with recombination. Genetics 141:1605-1617.
INNAN, H., B. PADHUKASAHASRAM, and M. NORDBORG, 2003 The pattern of polymorphism on human chromosome 21. Genome Res. 13:1158-1168.
KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The "hitchhiking" effect revisited. Genetics 123:887-899.
KIM, Y. and W. STEPHAN, 2000 Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415-1427.
KIM, Y. and W. STEPHAN, 2003 Selective sweeps in the presence of interference among partially linked loci. Genetics 164:389-398.
MAYNARD SMITH, J. and J. HAIGH, 1974 The hitchhiking effect of a favourable gene. Genet. Res. 23:23-35.
MILLER, J. C. and S. D. TANKSLEY, 1990 RFLP analysis of phylogenetic relationships and genetic variation in the genus Lycopersicon. Theor. Appl. Genet. 80:437-448.
NACHMAN, M. W., 2001 Single nucleotide polymorphism and recombination rate in humans. Trends Genet. 17:481-485.
NORDBORG, M., 1997 Structured coalescent processes on different time scales. Genetics 146:1501-1514.
NORDBORG, M. and H. INNAN, 2002 Molecular population genetics. Curr. Opin. Plant Biol. 5:69-73.
PILLEN, K., O. PINEDA, C. B. LEWIS, and S. D. TANKSLEY, 1996 Status of genome mapping tools in the taxon Solanaceae, pp. 281-308 in Genome Mapping in Plants, edited by A. H. PATERSON and R. G. LANDES. Academic Press, Austin, TX.
RICK, C. M., 1966 Some plant-animal relations on the Galapagos islands, pp. 215-224 in The Galapagos, edited by R. I. BOWMAN. University of California Press, Berkeley, CA.
RICK, C. M., 1983 Evolution of mating systems: evidence from allozyme variation, pp. 215-221 in Genetics: New Frontiers (Proceedings of the XV International Congress on Genetics, New Delhi, December 1983), edited by V. L. CHOPRA, B. C. JOSHI, R. P. SHARMA, and H. C. BANSAL. Oxford & IBH Publishing, New Delhi.
SHERMAN, J. D. and S. M. STACK, 1995 Two-dimensional spreads of synaptonemal complexes from solanaceous plants. VI. High-resolution recombination nodule map for tomato (Lycopersicon esculentum). Genetics 141:683-708.
STEPHAN, W., 1995 An improved method for estimating the rate of fixation of favorable mutations based on DNA polymorphism data. Mol. Biol. Evol. 12:959-962.
STEPHAN, W. and C. H. LANGLEY, 1998 DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150:1585-1593.
STEPHAN, W., T. H. E. WIEHE, and M. W. LENZ, 1992 The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41:237-254.
WHITLOCK, M. C., 2003 Fixation probability and time in subdivided populations. Genetics 164:767-779.
WIEHE, T. H. E. and W. STEPHAN, 1993 Analysis of a genetic hitchhiking model and its application to DNA polymorphism data. Mol. Biol. Evol. 10:842-854.