Origins of the Coalescent: 1974–1982
J. F. C. Kingman, University of Bristol, Bristol BS8 1TH, United Kingdom
Corresponding author: J. F. C. Kingman, Senate House, Tyndall Ave., Bristol, BS8 1TH, United Kingdom.
THE circle of ideas that has come to be known as the coalescent has proved to be a useful tool in a range of genetical problems, both in modeling biological phenomena and in making statistical sense of the rich data now available. In the 20 years since the concept crystallized, it has been extended in a number of directions. It is not the purpose of this note to document recent developments or to record the way in which others later arrived at similar conclusions by different routes; for that, consult, for instance, …
The technique has been widely applied in recent studies of evolution, thanks to advances in molecular biology and computer technology. Because it is sample based, it has provided rigorous statistical analyses of population data and a rationale for designing simulations. It has led to two different estimators of the key parameter $\theta = 4N\mu$, where $N$ is the effective population number and $\mu$ is the mutation rate, and, therefore, to a test of neutrality. It has provided estimates of the time to a common ancestor; in particular, a very long estimated time provided strong evidence for balancing selection in the ancestry of the HLA and other loci. This technique has also provided estimates of recombination rates and selfing rates. It has been helpful in assessing migration patterns in human ancestry, in particular, sex differences as revealed by comparison of within- and between-group variability of Y chromosome and mitochondrial DNA. For a recent review, see …
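The article does not name the two estimators of $\theta$. Assuming it means the classical pair, there is Watterson's estimator, built from the number $S$ of segregating sites, and the mean number $\pi$ of pairwise differences; their standardized difference is Tajima's $D$, the usual neutrality test. The minimal Python sketch below computes all three under the infinite-sites assumption. The function names and the toy alignment are hypothetical illustrations, not taken from the coalescent literature.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """S: number of alignment columns at which the sample is polymorphic."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """pi: average number of differing sites over all unordered pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)


def theta_estimates(seqs):
    """Return (S, theta_W, theta_pi, Tajima's D) for aligned sequences.

    Assumes the infinite-sites model, so each segregating site reflects
    exactly one mutation in the sample's genealogy."""
    n = len(seqs)
    s = segregating_sites(seqs)
    theta_pi = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1  # Watterson: E[S] = theta * a1 under neutrality
    # Standard normalizing constants for Tajima's D.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    sd = sqrt(e1 * s + e2 * s * (s - 1))
    tajima_d = (theta_pi - theta_w) / sd if sd > 0 else float("nan")
    return s, theta_w, theta_pi, tajima_d


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical aligned sample
    print(theta_estimates(toy))
```

For the toy sample, $S = 3$ and there are 10 differences spread over 6 pairs. That gives $\theta_W = 3/(1 + 1/2 + 1/3) = 18/11 \approx 1.64$ and $\theta_\pi = 5/3 \approx 1.67$. Under neutrality both estimate the same $\theta$, so a large standardized gap is taken as evidence against the neutral model.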
I shall not discuss these applications either. Mine is the much simpler aim of describing the way in which the ideas first came together, in the period leading to my 1982 articles. This is inevitably a personal account, but one that I hope is accurate, being based on records from these years. I have had the benefit of comments from Warren Ewens and Peter Donnelly, for which I am most grateful, but the interpretations and emphases are mine.
Three insights, in combination, constitute the essential basis of the coalescent. The first is the idea of tracing the ancestry of a gene backward in time and building up the family tree of the genes (at a particular locus) in a population sample back to the point at which they have a single common ancestor. This is just a generalization of Malécot's "identity by descent" (…)
What is surprising is that these rather simple ideas took so long to emerge. For me the story begins in 1974, when I was traveling in Australia meeting mathematicians who shared my interests in random processes and their applications. I had not worked on genetics since, as a Cambridge undergraduate, I had published juvenilia on polymorphisms maintained by single-locus selection. But in Melbourne I encountered Warren Ewens, who was exploring some ideas of …
They considered a locus at which the different alleles can be labeled by a single numerical quantity and in which mutation causes a random addition or subtraction. Thus, a single line of descent will have genes that perform a random walk on the line. There was little biological credibility in such a description, but it accorded with the experimental techniques of gel electrophoresis, which were then the best way of distinguishing alleles (e.g., …)
Thus, the genes of one generation are represented by $N$ points on the line. As we observe from generation to generation, the $N$ points perform a "coherent random walk." The group strays farther and farther from its starting point, but the extent of the group remains relatively limited, and the distribution of the relative positions of the points converges to a proper limit. In my 1976 article I quoted Ewens as explaining this phenomenon by noting that "the probability that two points of $G_t$ (the $t$th generation) have a common ancestor in $G_s$ is $1 - (1 - N^{-1})^{t-s}$, which is near unity when $(t - s)$ is large compared with $N$. Thus, the whole of $G_t$ is descended from a common ancestor in $G_{t-\tau}$, where the random integer $\tau = \tau(t)$ remains stochastically bounded as $t \to \infty$. The relative distances are the result, therefore, only of displacements in these $\tau$ generations."
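The coherent random walk is easy to watch in simulation. The Python sketch below is a minimal version under stated assumptions. It uses a haploid Wright-Fisher population of $N$ genes with integer allele labels, and each gene, with probability `mu`, moves one step up or down after copying its parent. The parameter `mu` and the function names are illustrative assumptions, not the 1976 article's notation.

```python
import random


def coherent_random_walk(n_points, generations, mu=1.0, seed=None):
    """Hypothetical sketch of the stepwise-mutation model described above.

    Each generation, every gene picks a parent uniformly at random from the
    previous generation (haploid Wright-Fisher sampling) and, with
    probability mu, its allele label moves one step up or down.
    Returns the allele labels of the final generation.
    """
    rng = random.Random(seed)
    positions = [0] * n_points  # all genes start at the same label
    for _ in range(generations):
        parents = [rng.randrange(n_points) for _ in range(n_points)]
        positions = [
            positions[p] + (rng.choice((-1, 1)) if rng.random() < mu else 0)
            for p in parents
        ]
    return positions


def center_and_spread(positions):
    """Centroid of the group and its extent (max minus min)."""
    center = sum(positions) / len(positions)
    return center, max(positions) - min(positions)


if __name__ == "__main__":
    for t in (100, 1_000, 10_000):
        center, spread = center_and_spread(coherent_random_walk(20, t, seed=1))
        print(f"t={t:>6}  center={center:8.1f}  spread={spread}")
```

In runs like this one would typically see the center wander further as $t$ grows, while the spread stays of the same order. That matches the quoted explanation: the relative positions depend only on the displacements since the most recent common ancestor.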
This is tantalizingly close to the idea of deducing the genetic structure of the population from the genealogy back to the common ancestor, but the article then goes off into complicated mathematical analysis, which adds little to our understanding of the model. There are, however, two pointers to later work. First, the algebra is such that it forces consideration of samples of size $n$ from the population and produces a recursion between $n$ and $n + 1$. This led me to an interest in the Ewens sampling formula (…)
The second pointer was the use of Fourier transforms, which made it easy to generalize from a gene described by a single number to one described by a family of $d$ numbers. This led …
It would not have been difficult to use the machinery of this article (…)
Thus, by the end of 1978, the nature of the Ewens sampling formula and its links with, on the one hand, nonrecurrent mutation and, on the other, the classical Kimura diffusion approach to neutral evolution were well understood. Moreover, I had noted in …
As a result not only of this work but also of research into deterministic selective models, I was invited to be the speaker at a conference held at Iowa State University in June 1979 under the auspices of the Conference Board of the Mathematical Sciences and the National Science Foundation. The proceedings of that conference were published as …
… $\tau(t)$ above. It shows, in fact, that the probability that $\tau(t)$ is greater than any integer $r$ is at most $3(1 - N^{-1})^r$, the constant 3 being the best possible. But despite the title, there is no exploration of the family tree beyond the number of generations back to the common ancestor.
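The bound $P(\tau(t) > r) \le 3(1 - N^{-1})^r$ can be compared with a Monte Carlo estimate. The Python sketch below assumes a haploid Wright-Fisher population of constant size $N$ and traces the whole current generation back one generation at a time until a single ancestor remains. The function names are hypothetical. Very small tail probabilities are dominated by sampling noise, so only moderate $r$ gives a meaningful comparison.

```python
import random


def generations_to_common_ancestor(n, rng):
    """Generations back until all n genes of the current generation share one
    ancestor, in a haploid Wright-Fisher population of constant size n."""
    lineages, tau = n, 0
    while lineages > 1:
        # Each distinct ancestor chooses its own parent independently and
        # uniformly; ancestors that choose the same parent merge.
        lineages = len({rng.randrange(n) for _ in range(lineages)})
        tau += 1
    return tau


def tail_versus_bound(n, trials, r_values, seed=0):
    """Compare the empirical P(tau > r) with the bound 3(1 - 1/n)^r."""
    rng = random.Random(seed)
    taus = [generations_to_common_ancestor(n, rng) for _ in range(trials)]
    rows = []
    for r in r_values:
        empirical = sum(t > r for t in taus) / trials
        rows.append((r, empirical, 3 * (1 - 1 / n) ** r))
    return rows


if __name__ == "__main__":
    for r, empirical, bound in tail_versus_bound(30, 1_000, (30, 60, 120)):
        print(f"r={r:>4}  P(tau > r) ~ {empirical:.4f}  bound = {bound:.4f}")
```

For $N = 2$ the two genes find a common parent with probability 1/2 in each generation. So $\tau$ is geometric with $P(\tau > r) = 2^{-r}$, which is a convenient sanity check on the simulation.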
Our host at Iowa State, Oscar Kempthorne, had gathered an impressive group of participants, both mathematicians and biologists, and we discussed the problems of population genetics far into the night. It does not appear that the structure of the family tree entered into these debates, but it must have been there that the crucial idea was conceived, because the first account of the coalescent appears in …
A footnote on the first page of that article observes that "genealogy means the whole family tree structure," so the cat is out of the bag. The argument starts from the observation that the Wright-Fisher multinomial model is equivalent to the rule that each member of a generation chooses its mother at random from the previous generation, the choices of different members being independent. This means that two members of the same generation have a probability $(1 - N^{-1})^r$ of having different ancestors $r$ generations back. If time is measured in units of $N$ generations and $N \to \infty$, the time to a common ancestor for the two has a negative exponential distribution with probability density $e^{-t}$. Now consider $n$ members of a particular generation, and trace their family tree backward through time. For some time there will be $n$ ancestors, but at some instant two of the lines come together. The probability density of this coalescence time (in the limit as $N \to \infty$) is $k e^{-kt}$, where $k = \binom{n}{2}$ is the number of pairs that might coalesce. Now trace back the $n - 1$ lines until they coalesce; the argument is the same with $n$ replaced by $n - 1$, and so on, until the number of lines is reduced to 1. The article sets this up formally, by means of a Markov chain whose states are equivalence relations on $\{1, 2, \ldots, n\}$, and relates it to a representation of the Ewens sampling formula in terms of a certain "random paintbox." Thus the circle of ideas is complete.
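The construction just described translates almost line by line into a simulation. In the Python sketch below, the state is a partition of $\{1, \ldots, n\}$, that is, an equivalence relation stored as a list of frozensets. While $m$ lines remain, the waiting time is exponential with rate equal to the number of pairs $\binom{m}{2}$, and the merging pair is chosen uniformly. The function names are illustrative and not taken from the 1982 articles. This is a sketch of the limiting ($N \to \infty$) process only, not of any finite-population model.

```python
import random


def kingman_coalescent(n, rng=None):
    """Simulate the n-coalescent, with time measured in units of N generations.

    The state is a partition of {1, ..., n} (an equivalence relation), stored
    as a list of frozensets. Returns the list of (time, partition) pairs, from
    the n singletons at time 0 down to the single block at the root."""
    if rng is None:
        rng = random.Random()
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    history = [(t, list(blocks))]
    while len(blocks) > 1:
        m = len(blocks)
        pairs = m * (m - 1) // 2        # number of pairs that might coalesce
        t += rng.expovariate(pairs)     # waiting time with density pairs * exp(-pairs * s)
        i, j = rng.sample(range(m), 2)  # every pair is equally likely to merge
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        history.append((t, list(blocks)))
    return history


def expected_tmrca(n):
    """Sum of the mean waiting times 1 / C(m, 2) for m = n, ..., 2."""
    return sum(2 / (m * (m - 1)) for m in range(2, n + 1))


def mean_tmrca(n, trials, seed=0):
    """Monte Carlo average of the time back to the common ancestor."""
    rng = random.Random(seed)
    return sum(kingman_coalescent(n, rng)[-1][0] for _ in range(trials)) / trials


if __name__ == "__main__":
    for time, partition in kingman_coalescent(5, random.Random(42)):
        print(f"{time:.3f}", sorted(sorted(b) for b in partition))
    print(mean_tmrca(10, 10_000), expected_tmrca(10))  # the two should be close
```

Because $2/(m(m-1)) = 2\,(1/(m-1) - 1/m)$, the expected waiting times telescope to $2(1 - 1/n)$. For $n = 10$ that is 1.8 units of $N$ generations, which the Monte Carlo average should approach.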
There is a moral to this tale. The first articles on coherent random walks (…)
LITERATURE CITED
CROW, J. F., 1989 Twenty five years ago in genetics: the infinite allele model. Genetics 121:93-96.
DONNELLY, P. J. and S. TAVARÉ, 1995 Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29:401-421.
EWENS, W. J., 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.
EWENS, W. J., 1974 A note on the sampling theory for infinite alleles and infinite sites models. Theor. Popul. Biol. 6:143-148.
FU, Y.-X. and W.-H. LI, 1999 Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor. Popul. Biol. 56:1-10.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.
KINGMAN, J. F. C., 1975 Random discrete distributions. J. R. Stat. Soc. B 37:1-22.
KINGMAN, J. F. C., 1976 Coherent random walks arising in some genetical models. Proc. R. Soc. Lond. Ser. A 351:19-31.
KINGMAN, J. F. C., 1977a A note on multi-dimensional models of neutral mutation. Theor. Popul. Biol. 11:285-290.
KINGMAN, J. F. C., 1977b The population structure associated with the Ewens sampling formula. Theor. Popul. Biol. 11:274-283.
KINGMAN, J. F. C., 1978 Random partitions in population genetics. Proc. R. Soc. Lond. Ser. A 361:1-20.
KINGMAN, J. F. C., 1980 Mathematics of Genetic Diversity. SIAM, Philadelphia.
KINGMAN, J. F. C., 1982a On the genealogy of large populations. J. Appl. Probab. 19A:27-43.
KINGMAN, J. F. C., 1982b The coalescent. Stochastic Process. Appl. 13:235-248.
KINGMAN, J. F. C., 1982c Exchangeability and the evolution of large populations, pp. 97-112 in Exchangeability in Probability and Statistics, edited by G. KOCH and F. SPIZZICHINO. North-Holland, Amsterdam.
MORAN, P. A. P., 1975 Wandering distributions and the electrophoretic profile. Theor. Popul. Biol. 8:318-330.
NAGYLAKI, T., 1989 Gustave Malécot and the transition from classical to modern population genetics. Genetics 121:103-118.
OHTA, T. and M. KIMURA, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22:201-204.
SINGH, R. S., R. C. LEWONTIN, and A. A. FELTON, 1976 Genetic heterogeneity within electrophoretic "alleles" of xanthine dehydrogenase in Drosophila pseudoobscura. Genetics 84:609-629.
STEPHENS, M. and P. J. DONNELLY, 2000 Inference in molecular population genetics. J. R. Stat. Soc. B 62:1-31.
WATTERSON, G. A., 1974 The sampling theory of selectively neutral alleles. Adv. Appl. Probab. 6:463-488.
WATTERSON, G. A., 1976 The stationary distribution of the infinitely-many neutral alleles diffusion model. J. Appl. Probab. 13:639-651.