Recombination in Human Mitochondrial DNA?
Carsten Wiuf
Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Corresponding author: Carsten Wiuf, 1 S. Parks Rd., Oxford OX1 3TG, England. E-mail: wiuf{at}stats.ox.ac.uk
| ABSTRACT |
|---|
The possibility of recombination in human mitochondrial DNA (mtDNA) has been hotly debated over the last few years. In this study, a general model of recombination in circular molecules is developed and applied to a recently published African sample (n = 21) of complete mtDNA sequences. It is shown that the power of correlation measures to detect recombination in circular molecules can be vanishingly small and that the data are consistent with the given model and no recombination only if the overall heterogeneity in mutation rate is <0.09.
Research based on phylogenetic analyses of complete human mtDNA sequences has suggested that mtDNA undergoes recombination.
A decay of linkage disequilibrium (LD) with physical distance is expected in recombining sequences. Putative evidence of decay with physical distance has been the core argument in favor of recombination in human mtDNA. In all published analyses of LD in mtDNA known to the author, it is assumed that genetic distance increases with increasing physical distance, or that genetic distance is (roughly) proportional to physical distance. However, this is not necessarily the case. Depending on the nature of the recombination process, the relation between the two measures of distance can be more complex than that. The aim of this article is to demonstrate how modeling of mtDNA evolution can play a role in understanding the potential effects of recombination. First, I describe a general model of recombination in circular molecules and, second, this model is applied to the data of African origin.
Imagine that a molecule, M1, in a generation recombines with probability r. In that case, a second molecule, M2, is chosen and a new molecule, M, is formed consisting of a part, P1, copied from M1 and a part, P2, copied from M2. This process is called the mixing process. In general, the possible forms of P1 and P2 are restricted only by $P_1 \cap P_2 = \emptyset$ and $P_1 \cup P_2 = M$; that is, the parts are disjoint and together they constitute a whole molecule. The interpretation of r varies with the type of genetic system in question. For example, in the context of mtDNA, r could be the probability that two mtDNA molecules meet and recombine, or r could be the probability that a part of a nuclear pseudogene is copied onto an mtDNA molecule. For now the exact definition of "generation" is left out, as well as any attempt to model aspects of the reproductive structure other than recombination, such as the number of offspring, possible paternal leakage (i.e., paternal mitochondria that enter the egg), etc. These aspects are of course of considerable importance in describing the genealogical structure of a sample of mtDNA sequences and are required in the analysis of data. First, I am concerned with the effect of mixing M1 and M2.
Several copies of the mitochondrial genome are present in each mitochondrion (generally 5-10 copies).
Consider the molecule M. Viewed from the present toward the past, the nucleotides in M have different ancestors: those in P1 have ancestor M1 and those in P2 have ancestor M2 (Fig 1). Let y and z be two nucleotides in M separated by a physical distance d = d(y, z), the smaller of the two arcs from y to z, with the length of the whole molecule scaled to 1 (so that d is at most 1/2). The genetic distance, R(d), between y and z is here defined as the probability that y and z have different ancestors, given that M is created by recombination in the present generation. This definition is convenient because it relies only on the way nucleotides copied from M1 and M2 are joined into M; in fact, R(d) is independent of the probability r of a recombination. It is then easy to compare the effect of mixing in different models without reference to how frequently recombination events occur. In the standard model of recombination for nuclear sequences (i.e., no heterogeneity in recombination rate), a linear relation is obtained between the two distance measures.
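As a small illustration of the physical distance just defined, the following sketch computes d(y, z) for nucleotide positions on a circular molecule. The function name and the integer position convention are assumptions made for this example, and the length of 16,500 is only the approximate molecule length quoted later in the text.

```python
def circular_distance(pos_y, pos_z, length):
    """Smaller of the two arcs between two positions on a circle.

    Positions are nucleotide indices; the result is expressed as a
    fraction of the whole molecule, so it always lies in [0, 1/2].
    """
    gap = abs(pos_y - pos_z) % length
    return min(gap, length - gap) / length


# Two sites that are far apart in linear coordinates can be close on the circle.
d_close = circular_distance(100, 16400, 16500)
d_opposite = circular_distance(0, 8250, 16500)
```

Any analysis that treats mtDNA coordinates as positions on a line would assign the first pair a distance close to 1, whereas on the circle the two sites are only 200 nucleotides apart.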
| MATERIALS AND METHODS |
|---|
Genetic distance, R(d):
Consider two nucleotides y and z at distance d = d(y, z). In cases A and B, R(d) is given by
[Equation 1]
and
[Equation 2]
respectively, where X1, X2, X3, and X4 denote the breakpoints ordered clockwise from y (Fig 2). In Equations 1 and 2, R(d) is not necessarily increasing in d, and R(d) might depend on y because of rate heterogeneity or hotspots. To proceed further, some restrictions are put on the distribution of the breakpoints. Choose one breakpoint, X, randomly among all nucleotides and let Zi, i = 1, 2 (model A) or i = 1, 2, 3, 4 (model B), be the arc lengths numbered consecutively clockwise from X. Thus, rate homogeneity is assumed, and no region (e.g., the control region) behaves differently from the rest of the molecule with respect to recombination. Without loss of generality it can be assumed that in model A, $Z_1 \le 1/2$. Equation 1 becomes
[Equation 3]
It can be shown that R(d) is increasing in d with a gradually decreasing slope (Appendix). Equation 2 does not take a similarly simple form, but R(d) can be bounded from above,
[Equation 4]
(Appendix).
The examples of model A shown in Fig 3 are as follows: A1, $Z_1 = L \le 0.5$, $Z_2 = 1 - L$, $R(d) = \min\{2d, 2L\}$; and A2, $Z_1 \sim U(0, 0.5)$ (Z1 is uniform on the interval from 0 to 0.5), $Z_2 = 1 - Z_1$, $R(d) = 2d(1 - d)$. Example A2 is equivalent to choosing two points at random on the circle. This provides two extreme models: one with high interference (the relative positions of the two breakpoints are correlated) and no variation in the length of the exchanged segment (A1); and one with no interference (the two breakpoints are uncorrelated) and relatively high variance, Var(Z1) = 1/48 (A2). The maximal variance is 1/16, which is obtained only if $P(Z_1 = 0) + P(Z_1 = 1/2) = 1$.
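The closed forms quoted for examples A1 and A2 can be checked by simulating the mixing process directly. The sketch below is an illustration under stated assumptions, not the author's simulation code: y is placed at position 0 and z at position d on a circle of length 1, the first breakpoint X is uniform, the exchanged arc runs clockwise from X with length Z1, and all function names are hypothetical.

```python
import random


def simulate_R_model_A(d, draw_z1, trials=100_000, seed=1):
    """Monte Carlo estimate of R(d) when a single arc is exchanged."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(trials):
        x = rng.random()      # first breakpoint, uniform on the circle
        z1 = draw_z1(rng)     # length of the exchanged arc (part P2)
        # A point p lies in the exchanged arc if its clockwise offset from x is < z1.
        y_in = (0.0 - x) % 1.0 < z1
        z_in = (d - x) % 1.0 < z1
        differ += (y_in != z_in)   # different ancestors: exactly one point in P2
    return differ / trials


def R_A1(d, L):
    """Closed form quoted in the text for a fixed arc length L <= 1/2."""
    return min(2 * d, 2 * L)


def R_A2(d):
    """Closed form quoted in the text for Z1 uniform on (0, 1/2)."""
    return 2 * d * (1 - d)


est_a1 = simulate_R_model_A(0.10, lambda rng: 0.15)
est_a2 = simulate_R_model_A(0.30, lambda rng: rng.uniform(0.0, 0.5))
```

With the default number of trials, the estimates should agree with R_A1(0.10, 0.15) and R_A2(0.30) to within ordinary Monte Carlo error.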
Examples of model B shown in Fig 3 are as follows. In B1, $Z_1 = Z_2 = Z_3 = L \le 1/4$ and $Z_4 = 1 - 3L$. The form of R(d) depends in a complicated way on L. For $0 \le L \le 1/6$,

$$R(d) = \begin{cases} 4d, & 0 \le d < L, \\ -2d + 6L, & L \le d < 2L, \\ 2d - 2L, & 2L \le d < 3L, \\ 4L, & 3L \le d < 0.5. \end{cases}$$

For $1/6 \le L \le 1/5$,

$$R(d) = \begin{cases} 4d, & 0 \le d < L, \\ -2d + 6L, & L \le d < 2L, \\ 2d - 2L, & 2L \le d < 1 - 3L, \\ 2(1 - 4L), & 1 - 3L \le d < 0.5. \end{cases}$$

For $1/5 \le L \le 1/4$,

$$R(d) = \begin{cases} 4d, & 0 \le d < L, \\ -2d + 6L, & L \le d < 1 - 3L, \\ -4d + 2, & 1 - 3L \le d < 2L, \\ 2(1 - 4L), & 2L \le d < 0.5. \end{cases}$$

In the second example, B2, $(Z_1, Z_2, Z_3) \sim D_3(1, 1, 1)$ (Dirichlet with parameters 1, 1, and 1), $Z_4 = 1 - (Z_1 + Z_2 + Z_3)$, and $R(d) = 4d(1 - d)\{d^2 + (1 - d)^2\}$. Example B2 is equivalent to choosing four points at random on the circle. Similar to A1 and A2, example B1 shows high interference whereas B2 does not.
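The model B expressions can be checked in the same spirit. The following sketch assumes that the four arcs alternate between the two parental molecules, that y sits at 0 and z at d on a circle of length 1, and that the first breakpoint is uniform; the function names are hypothetical and the code is only an illustration of the quoted formulas.

```python
import random


def parent_of(p, x, arcs):
    """Return 0 or 1: the parent supplying point p (arcs alternate, starting at x)."""
    offset = (p - x) % 1.0
    edge = 0.0
    for i, z in enumerate(arcs):
        edge += z
        if offset < edge:
            return i % 2
    return (len(arcs) - 1) % 2   # guard against rounding at the end of the circle


def simulate_R(d, draw_arcs, trials=100_000, seed=1):
    """Monte Carlo estimate of R(d) for arbitrary arc lengths summing to 1."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(trials):
        x = rng.random()
        arcs = draw_arcs(rng)
        differ += (parent_of(0.0, x, arcs) != parent_of(d, x, arcs))
    return differ / trials


def R_B1(d, L):
    """Piecewise form quoted in the text for Z1 = Z2 = Z3 = L <= 1/4."""
    if d < L:
        return 4 * d
    if L <= 1 / 6:
        if d < 2 * L:
            return 6 * L - 2 * d
        return 2 * d - 2 * L if d < 3 * L else 4 * L
    if L <= 1 / 5:
        if d < 2 * L:
            return 6 * L - 2 * d
        return 2 * d - 2 * L if d < 1 - 3 * L else 2 * (1 - 4 * L)
    if d < 1 - 3 * L:
        return 6 * L - 2 * d
    return 2 - 4 * d if d < 2 * L else 2 * (1 - 4 * L)


def R_B2(d):
    """Form quoted in the text for four breakpoints chosen at random."""
    return 4 * d * (1 - d) * (d ** 2 + (1 - d) ** 2)


def arcs_B1(L):
    return lambda rng: [L, L, L, 1 - 3 * L]


def arcs_B2(rng):
    # Spacings of three sorted uniforms give a flat Dirichlet on four arcs.
    c = sorted(rng.random() for _ in range(3))
    return [c[0], c[1] - c[0], c[2] - c[1], 1 - c[2]]
```

For example, `simulate_R(0.30, arcs_B1(0.18))` can be compared with `R_B1(0.30, 0.18)`, and `simulate_R(0.25, arcs_B2)` with `R_B2(0.25)`. Note that the B1 form is not monotone in d, which is the "valley" discussed in RESULTS.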
The genealogy of a sample:
It is assumed that recombination happens relatively rarely, such that a single molecule is not likely to experience more than one recombination when passed on from parent to child. One human generation is counted as one generation. During one generation the molecule might be copied several times. The next generation of females and males is formed by randomly choosing N females and N males from the offspring of the N reproductive females. Assume a molecule M1 is drawn at random from the female pool. With probability r a recombination event occurs and a second parent, M2, from the male pool is chosen, and M1 and M2 are joined according to the mixing process. Thus, heteroplasmy is introduced through recombination of paternally leaked mtDNA, and r encompasses the probability of leakage, the probability that two molecules meet (one female and one male), and the probability that the two molecules mix. The inbreeding effective population size, Ne, is Ne = N.
Model specifications:
The mutation process is assumed to be a two-state Jukes-Cantor model with rate heterogeneity. The observed average pairwise sequence divergence is $\hat{\pi} = 4.63 \times 10^{-3}$. The parameter $\theta = 2N_e u$ (where u is the probability of a mutation per molecule per generation) is estimated from $\hat{\pi}$ and the coalescent model for two values of the rate heterogeneity parameter, $\alpha$: $\alpha = 0.2$ and $\alpha = \infty$ (all sites evolve with the same rate; Appendix). For $\alpha = 0.2$, $\theta = 4.88 \times 10^{-3}$, and for $\alpha = \infty$, $\theta = 4.63 \times 10^{-3}$. This estimate of $\theta$ does not presuppose that recombination does not occur (Appendix). If the sequence divergence, $\pi$, is changed, then $\theta$ changes accordingly, but the time depth in the genealogy of the sequences does not. The recombination rate is varied over R = 0.1, 1, and 10, which on average gives $\approx 3.6R$ recombination events in the sample's history. [...] U(0, 1/2).
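The figure of roughly 3.6R recombination events can be reproduced with a short calculation. The sketch below assumes the harmonic-sum form R(1 + 1/2 + ... + 1/(n - 1)) for the expected number of events, which matches the quoted value for n = 21, and it assumes the scaling R = 2Ne r, which is consistent with the per-generation probability of 10^-4 quoted in RESULTS for Ne = 5000 and R = 1. Both function names are hypothetical.

```python
def expected_recombinations(n, R):
    """Expected number of recombination events, assuming the form R * sum_{i<n} 1/i."""
    return R * sum(1.0 / i for i in range(1, n))


def per_generation_probability(R, effective_size):
    """Recombination probability per molecule per generation, assuming R = 2 * Ne * r."""
    return R / (2.0 * effective_size)


events_per_unit_R = expected_recombinations(21, 1.0)   # close to 3.6 for n = 21
r_for_R_equal_1 = per_generation_probability(1.0, 5000)
```

Because the harmonic sum grows only logarithmically in n, larger samples add recombination events slowly, which is in line with the slow gain in power with sample size reported in RESULTS.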
| RESULTS |
|---|
The mixing process:
Fig 3 displays a number of examples of the cases A and B; a general mathematical treatment of both cases is given in MATERIALS AND METHODS. A linear relation between physical and genetic distance is achieved only if one-half of a molecule is exchanged. This has been proposed as a model of recombination in bacteria. [...] $R(d) \approx 0.5$. Such pairs provide information about variation in LD for distant pairs of nucleotides, rather than information about the decay of LD. Examples A1 and A2 differ in the amount of interference (in A2 there is no interference), but the shape of R(d) is very similar in the two examples.
If there are many breakpoints, there is no straightforward relation between genetic and physical distance. In the examples displayed in Fig 3B, two show an increase followed by a decrease in genetic distance, whereas the last example shows a rapid increase toward the maximal genetic distance (here 0.5). In the latter, most pairs of nucleotides are expected to show the same level of LD. These examples are by no means exhaustive, but they are sufficient to establish that the relation between physical and genetic distance is not necessarily simple. In general, there are two things in play:
- Interference: If the end points of the two exchanged segments are correlated (as in Fig 3B, example 1, but not in example 2), the genetic distance does not increase overall with physical distance. In Fig 3B, the "valley" is caused by interference.
- Circularity: The genetic distance between two points is affected by the molecule bending back onto itself. In general, the maximal genetic distance between two sites is limited to 1/2, but this limit might be lower if two (or more) segments are exchanged. As an example, consider Fig 3B, example 1, where the distance levels off at d = 0.46. The two exchanged segments plus the shortest segment between them take up $3L \le 3/4$ of the molecule. If the distance, d, between two points exceeds 1 - 3L, an effect similar to interference appears and the genetic distance levels off. This should not be confused with the phenomenon that appears in Fig 3A and Fig 3B, example 1, L = 0.07, which is due to a small segment being exchanged.
African mtDNA data:
How do different models affect the conclusions drawn in recent studies?
A brief summary of the African sample: it consists of n = 21 complete mtDNA sequences of approximately 16,500 nucleotides each. There are 367 polymorphic sites, and of these 198 are informative (from Figure 4 of the source publication). Correlation coefficients were calculated between physical distance and the level of LD, measured by the quantities $r^2$ and D', respectively. The correlation coefficients were found to be consistent with a hypothesis of no recombination ($1 \times 10^{-3}$ for D' and $2.23 \times 10^{-6}$ for $r^2$). If only three out of four possible haplotypes are present in two sites, the quantity |D'| is 1. In this sample the proportion of pairs where |D'| < 1, f(D'), is 0.093 (obtained from Figures 1 and 4 of the source publication).
Because of the possibly complicated (and unknown) relation between physical and genetic distance, the correlation coefficient of physical distance with D' (or $r^2$) is not necessarily a good indicator of the presence of recombination. The fraction f(D') might prove better; if recombination is frequent, many pairs of sites will show four distinct haplotypes (irrespective of the mixing process), and in each of these cases |D'| < 1. Thus, f(D') relates to the number of homoplasies in the sample and therefore to the amount of recombination. Unfortunately (in this context), homoplasies might also appear as a product of the mutation process.
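To make the quantities D', r^2, and f(D') concrete, the sketch below computes them for biallelic sites coded as 0/1 vectors, one entry per sequence. It uses the standard textbook definitions of D' and r^2, which are assumed here to match those used in the study; the function names and the toy data are hypothetical. Exact rational arithmetic is used so that |D'| = 1 is detected without rounding problems.

```python
from fractions import Fraction
from itertools import combinations


def ld_pair(site_a, site_b):
    """Return (D', r^2) for two polymorphic biallelic sites coded as 0/1."""
    n = len(site_a)
    p_a = Fraction(sum(site_a), n)
    p_b = Fraction(sum(site_b), n)
    p_ab = Fraction(sum(1 for a, b in zip(site_a, site_b) if a and b), n)
    d = p_ab - p_a * p_b
    # D is normalised by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


def fraction_incomplete_ld(sites):
    """f(D'): proportion of site pairs with |D'| < 1 (all four haplotypes present)."""
    pairs = list(combinations(sites, 2))
    below = sum(1 for a, b in pairs if abs(ld_pair(a, b)[0]) < 1)
    return Fraction(below, len(pairs))


# Toy data: three sites observed in four sequences.
toy_sites = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0]]
toy_f = fraction_incomplete_ld(toy_sites)
```

In the toy data only the first and third sites show all four haplotypes, so one pair out of three has |D'| < 1. The correlation-based statistics discussed in the text would additionally pair each D' or r^2 value with the circular distance between the two sites.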
Table 1 shows the power of the correlation of $r^2$ with physical distance, the correlation of D' with physical distance, and f(D') for various choices of mixing process, amount of recombination, and mutation process (further details of parameters and the simulation method are given in MATERIALS AND METHODS). The fraction f(D') seems superior to the two correlation measures when the exchanged segment is small, and the correlation based on D' has in most cases higher power than the one based on $r^2$. (That D' is a better measure of LD than $r^2$ has repeatedly been stressed in the literature.) In the first example, approximately 250 nucleotides (A1, L = 0.015) are exchanged at a recombination event, while in the second (A1, L = 0.15), approximately 2500 nucleotides are exchanged. In example A2, approximately 4000 nucleotides are exchanged on average. If more than one segment is exchanged (e.g., as in example B), the power of the correlation measures can be vanishingly small whereas the power of f(D') is relatively high (results not shown). This is expected because the correlation of D' (or $r^2$) with physical distance is designed to detect recombination only if genetic and physical distance are monotonically related, whereas f(D') is designed to detect recombination from an excess of homoplasies.
Other mutation processes, e.g., processes that allow for different rates in different regions or for transition/transversion bias, have reduced power compared to the mutation process applied in the simulations in Table 1, simply because more variable sequence patterns are expected.
The probability of no recombination events in a sample's history is
[Equation]
(MATERIALS AND METHODS). If R = 0.1, $P(\text{no recombination}) \approx 0.70$, and most samples have not experienced recombination in their history. Consequently, recombination is hard to detect, irrespective of the mixing process. If R = 1, $P(\text{no recombination}) \approx 0.04$, and most samples have experienced at least one recombination event. With an effective population size of 5000 and R = 1, the probability of a recombination (i.e., the probability of leakage, that two molecules meet, one sperm-derived and one egg-derived, and that they mix) is $10^{-4}$ per molecule per generation. This could easily turn out to be orders of magnitude too high, but it seems to provide a lower bound on the rate at which recombination could be inferred from phylogenetic analyses (provided the exchanged segment in a recombination event is fairly large; Table 1). In general, the power increases with increasing sample size, but only slowly. Table 2 shows the power for three sample sizes, n = 21, 50, and 100, assuming model A2.
But are the data consistent with any model of recombination? In Tables 3, 4, and 5, this is investigated using three statistics: the correlation of D' with physical distance, f(D'), and I = no. inf/no. polym, the ratio of informative to polymorphic sites (informative sites are polymorphic sites where two different nucleotides are present in at least two sequences each). In general, the expectation of I does not vary much with R, but the variance decreases with increasing R (results not shown). Therefore, I indicates whether the data fit a constant population size model and can be thought of as a complementary test to Tajima's D. The observed value of I (0.53) is consistent with all the investigated models. In contrast, the observed value of f(D') (0.093) does not conform to a model with no recombination (P < 0.002 if the rate heterogeneity parameter $\alpha = 0.2$ and P < 0.0002 if $\alpha = \infty$). If $\alpha \approx 0.09$, $P \approx 0.05$ (results not shown). That is, more homoplasies are observed than can be explained by a model without recombination, unless $\alpha < 0.09$. The observed value of the correlation of D' with physical distance (0.001) is consistent, in some cases, with high levels of recombination. Thus, even though the correlation coefficients can be explained by a model without recombination, other aspects of the data cannot. Similar results were obtained using a sequence divergence higher and lower than the one applied here (results not shown).
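The statistic I can be computed directly from an alignment using the definition of informative sites given above. The sketch below is a minimal illustration; the function name and the four-sequence toy alignment are hypothetical and are not the African data.

```python
from collections import Counter


def site_classes(alignment):
    """Count polymorphic and informative columns in an alignment.

    A column is polymorphic if it contains more than one nucleotide, and
    informative if at least two different nucleotides each occur in at
    least two sequences.
    """
    polymorphic = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            polymorphic += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return polymorphic, informative


toy_alignment = ["AAGT", "AAGC", "ATAC", "ATAC"]
n_polym, n_inf = site_classes(toy_alignment)
ratio_I = n_inf / n_polym   # I = no. inf / no. polym
```

In the toy alignment the last column is polymorphic but not informative, because one of its two nucleotides occurs in a single sequence only.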
What is the true value of the rate heterogeneity parameter? Estimates of $\alpha$ vary and depend on the region(s) under scrutiny. The parameter can also be estimated from data. Using tree-puzzle, the estimate is $\alpha = 0.10 \pm 0.05$ for the African data set. This is not surprising: If R = 0, the data are consistent with the given model only if $\alpha < 0.09$, and $\alpha = 0.10 \pm 0.05$ allows for this to be true. If R > 0, any estimate of $\alpha$ is likely to be downward biased, predicting more rate heterogeneity than is actually there. In that case, the true value is expected to be higher than the estimate.
| DISCUSSION |
|---|
These findings suggest that human mtDNA might be recombining. A number of comments should be made at this stage.
The mutation process:
The excess of homoplasies observed in the African sample could be generated by a complicated mutation process. Strong rate heterogeneity (rate heterogeneity parameter $\alpha > 0.09$) in itself is not sufficient. Rate heterogeneity does not lead to a decay of LD with physical or genetic distance but to higher variance in the number of mutations in the sample. Linked mutations (one mutation happens as a result of another mutation) could possibly explain an excess of homoplasies. Estimates of the rate heterogeneity in the hypervariable regions vary; one study reports 0.26 in hypervariable region (HVR) I and 0.13 in HVRII. [...] $\alpha$ is estimated to be 0.15 in the African data (V. Macaulay, personal communication). At the present stage it is difficult to judge whether skewness in rate heterogeneity could explain the data.
The recombination (mixing) process:
The models applied to analyze data are mathematically simple and assume that all sites potentially are sites for crossing over. In reality, this might be a very crude assumption (e.g., some bacteria have just a few crossing-over sites) and might complicate detection of recombination from DNA sequences data even further.
Demography:
In the analysis a population of constant size is assumed. If the population has been expanding, recombination is in general more difficult to detect, because the genealogy of the sample becomes star-like. Also the expected number of homoplasies in the data is fewer relative to a population of constant size (assuming the same number of mutations), because each mutation is more likely to affect only a single sequence.
mtDNA replication:
It is not known how mtDNA replicates.
Paternal leakage:
Paternal leakage has been reported in inbred lines of mice.
Concluding remarks:
Even small levels of recombination that may not be immediately detectable in the data can have pronounced effects if recombination is ignored in an analysis of the data.
In conclusion, it seems vital that assessment of recombination in mtDNA is based on proper modeling. Significant correlation of LD with physical distance might be a sign of recombination, but recombination cannot be ruled out on the basis of a nonsignificant correlation. Phylogenetic and population genetic analyses might prove insufficient to judge whether human mtDNA is recombining, partly because different candidate models vary considerably in what they predict and partly because the power to detect recombination from decay in LD might be vanishingly small. It is therefore not surprising that the original results in [...]
| ACKNOWLEDGMENTS |
|---|
P. Donnelly, V. Macaulay, G. McVean, M. Przeworski, and K. Strimmer are thanked for commenting on the manuscript. The author was supported by Biotechnology and Biological Sciences Research Council grant 43/MMI9788 and by the Carlsberg Foundation, Denmark.
Manuscript received April 10, 2001; Accepted for publication July 17, 2001.
| APPENDIX |
|---|
Genetic distance, R(d):
Equation 3 follows from
[Equation]
using that X1 is uniformly distributed. By differentiation with respect to d,
[Equation]
and it follows that R(d) is increasing with a gradually decreasing slope. Equation 4 follows from reasoning similar to that above and $P(X_3 < d \le X_4) \le P(X_1 < d \le X_4) \le 2d$.
Model specifications:
Assume the standard model of heterogeneity in mutation rates; that is, the mutation rate of column i in the alignment is proportional to $u_i$, where $u_i$ is gamma distributed, $u_i \sim \Gamma(\alpha, \alpha)$. The probability that two nucleotides sharing an ancestor t generations ago are different becomes
[Equation]
[...] the probability, $\pi$, that two nucleotides are different is
[Equation A1]
If $\alpha = \infty$ then $\pi = \theta$, and if $\alpha = 0$ then $\pi = 0$. In (A1), $\pi$ is also the expected pairwise sequence divergence in a random sample, irrespective of whether R = 0 or R > 0. If $\pi$ is estimated by $\hat{\pi}$, the observed average pairwise sequence divergence in the sample, and $\alpha$ is assumed known, then an estimate of $\theta$ can be produced from (A1). If R > 0, the average $\hat{\pi}$ has less variance than if R = 0.
| LITERATURE CITED |
|---|
AWADALLA, P., A. EYRE-WALKER, and J. MAYNARD SMITH, 1999 Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286:2524-2525.
DEVLIN, B. and N. RISCH, 1995 A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29:311-322.
ELSON, J. L., R. M. ANDREWS, P. F. CHINNERY, R. N. LIGHTOWLERS, and D. M. TURNBULL et al., 2001 Analysis of European mtDNAs for recombination. Am. J. Hum. Genet. 68:145-153.
EWENS, W., 1979 Mathematical Population Genetics. Springer-Verlag, New York.
EYRE-WALKER, A., N. H. SMITH, and J. MAYNARD SMITH, 1999 How clonal are human mitochondria? Proc. R. Soc. Lond. Ser. B 266:477-483.
GRIFFITHS, A. J. F., J. H. MILLER, D. T. SUZUKI, R. C. LEWONTIN, and W. M. GELBART, 1996 An Introduction to Genetic Analysis. W. H. Freeman and Company, New York.
GU, X. and W.-H. LI, 1998 Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution. Proc. Natl. Acad. Sci. USA 95:5899-5905.
GUO, S.-W., 1997 Linkage disequilibrium measures for fine-scale mapping: a comparison. Hum. Hered. 47:310-314.
GYLLENSTEN, U., D. WHARTON, A. JOSEFSSON, and A. C. WILSON, 1991 Paternal inheritance of mitochondrial DNA in mice. Nature 352:255-257.
HOFFMANN-JØRGENSEN, J., 1994 Probability with a View Towards Statistics. Chapman & Hall, New York.
HOWELL, N., 1997 mtDNA recombination: what do in vitro data mean? Am. J. Hum. Genet. 61:18-22.
HUDSON, R. R., 1983 Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.
HUDSON, R. R., 1994 Analytical results concerning linkage disequilibrium in models with genetic transformation and conjugation. J. Evol. Biol. 7:535-548.
INGMAN, M., H. KAESSMANN, S. PÄÄBO, and U. GYLLENSTEN, 2000 Mitochondrial genome variation and the origin of modern humans. Nature 408:708-713.
JORDE, L. B. and M. BAMSHAD, 2000 Questioning evidence for recombination in human mitochondrial DNA. Science 288:1931.
KAJANDER, O. A., A. T. ROVIO, K. MAJAMAA, J. POULTON, and J. N. SPELBRINK et al., 2000 Human mtDNA sublimons resemble rearranged mitochondrial genomes found in pathological states. Hum. Mol. Genet. 9:2821-2835.
KING, R. C. and W. D. STANSFIELD, 1990 A Dictionary of Genetics. Oxford University Press, Oxford.
KIVISILD, T. and R. VILLEMS, 2000 Questioning evidence for recombination in human mitochondrial DNA. Science 288:1931.
KUMAR, S., P. HEDRICK, T. DOWLING, and M. STONEKING, 2000 Questioning evidence for recombination in human mitochondrial DNA. Science 288:1931.
MACAULAY, V., M. RICHARDS, and B. SYKES, 1999 Mitochondrial DNA recombination - no need to panic. Proc. R. Soc. Lond. Ser. B 266:2037-2039.
MEYER, S., G. WEISS, and A. VON HAESELER, 1999 Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. Genetics 152:1103-1110.
PARSONS, T. J. and J. A. IRWIN, 2000 Questioning evidence for recombination in human mitochondrial DNA. Science 288:1931.
SAVILLE, B. J., Y. KOHLI, and J. B. ANDERSON, 1998 mtDNA recombination in a natural population. Proc. Natl. Acad. Sci. USA 95:1331-1335.
SCHIERUP, M. and J. HEIN, 2000 Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879-891.
SHITARA, H., J.-I. HAYASHI, S. TAKAHAMA, H. KANEDA, and H. YONEKAWA, 1998 Maternal inheritance of mouse mtDNA in interspecific hybrids: segregation of the leaked paternal mtDNA followed by the prevention of subsequent paternal leakage. Genetics 148:851-857.
STRIMMER, K. and A. VON HAESELER, 1996 Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
THYAGARAJAN, B., R. A. PADUA, and C. CAMPBELL, 1996 Mammalian mitochondria possess homologous DNA recombination activity. J. Biol. Chem. 271:27536-27543.