New Explicit Expressions for Relative Frequencies of Single-Nucleotide Polymorphisms With Application to Statistical Inference on Population Growth
A. Polanski (a, b) and M. Kimmel (a)
(a) Department of Statistics, Rice University, Houston, Texas 77005
(b) Institute of Automation, Silesian Technical University, 44-100 Gliwice, Poland
Corresponding author: M. Kimmel, Rice University, M.S. 138, 6100 Main St., Houston, TX 77005. E-mail: kimmel{at}rice.edu
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
We present new methodology for calculating sampling distributions of single-nucleotide polymorphism (SNP) frequencies in populations with time-varying size. Our approach is based on deriving analytical expressions for frequencies of SNPs. Analytical expressions allow for computations that are faster and more accurate than Monte Carlo simulations. In contrast to other articles showing analytical formulas for frequencies of SNPs, we derive expressions that contain coefficients that do not explode when the genealogy size increases. We also provide analytical formulas to describe the way in which the ascertainment procedure modifies SNP distributions. Using our methods, we study the power to test the hypothesis of exponential population expansion vs. the hypothesis of evolution with constant population size. We also analyze some of the available SNP data and we compare our results of demographic parameters estimation to those obtained in previous studies in population genetics. The analyzed data seem consistent with the hypothesis of past population growth of modern humans. The analysis of the data also shows a very strong sensitivity of estimated demographic parameters to changes of the model of the ascertainment procedure.
Considerable research has been devoted to developing methods for the discovery of single-nucleotide polymorphisms (SNPs) and to characterizing the distribution of SNPs across the genome [...]
Using population genetics methods to model and analyze SNPs opens an area for investigating problems such as predicting frequencies of SNPs under various demographic scenarios, inferring demographic parameters and history from sampling frequencies of SNPs, comparing estimates obtained on the basis of SNP data to those obtained with other methods, and evaluating the efficiency of using SNP data for estimation of population parameters. Several interesting studies were carried out in this area. Studies by [...] 4Neµ of the effective population size Ne and mutation rate µ, under the assumption of constant population size, was studied by [...]
Sampling distributions of SNP frequencies in populations with time-varying size can be calculated with the use of analytical expressions for the expected lengths of branches in the coalescence tree derived in the articles by [...]
| METHODS |
|---|
We consider the process of coalescence with time-changing effective population size. Notation for the coalescence tree, for a sample of n = 5 DNA sequences, is shown in Fig 1. Time t is measured, in number of generations, from the present to the past. Random times between coalescence events are denoted by Sn, Sn-1, ... , S2 and their realizations by sn, sn-1, ... , s2. Cumulative times to coalescence, from sample of size n to sample of size k - 1, are denoted by Tk, k = 2, 3, ... , n, and their realizations by corresponding lowercase letters tn, tn-1, ... , t2, with 0 < tn < tn-1 < ... < t2.
We assume that an observed SNP was produced by a single, neutral mutation, like the one denoted in Fig 1 by an open circle. In Fig 1 sequences 4 and 5 have mutant alleles (bases), while sequences 1, 2, and 3 have ancestral ones. In the situation where it is not known which allele is mutant and which is ancestral, we use the terms rare and frequent allele. In other words, the SNP in Fig 1 has configuration b = 2 mutant vs. n - b = 3 ancestral, or b = 2 rare vs. n - b = 3 frequent alleles. We assume that mutation intensity for SNPs is very low; i.e., they follow the infinite-sites mutation model.
Probability that a SNP has b mutant bases:
The probability qnb that a SNP site in a sample of n chromosomes has b mutant bases, under the infinite-sites mutation model, is given by [...]
[Equation 1]
where 0 < b < n, Sk = Tk - Tk+1, and Tn+1 = 0.
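As a rough illustration of the kind of quantity that qnb is, the sketch below evaluates SNP configuration probabilities from expected intercoalescence times E(Sk). It assumes the standard infinite-sites identity in which a branch present while there are k lineages subtends exactly b of the n sampled sequences with probability C(n-b-1, k-2)/C(n-1, k-1); this form is an assumption and may be arranged differently from the article's expression. The function name `snp_probabilities` and the constant-size input are hypothetical.

```python
from math import comb


def snp_probabilities(expected_s):
    """Return {b: q_nb} given expected_s[k] = E(S_k) for k = 2..n.

    Assumed form (standard infinite-sites result, not copied from the article):
    q_nb is the expected length of branches with b descendants divided by
    the expected total branch length of the genealogy.
    """
    n = max(expected_s)
    # Expected total branch length: k lineages coexist for time S_k.
    total = sum(k * expected_s[k] for k in range(2, n + 1))
    q = {}
    for b in range(1, n):
        # Only levels k <= n - b + 1 can carry a branch with b descendants.
        length_b = sum(
            comb(n - b - 1, k - 2) / comb(n - 1, k - 1) * k * expected_s[k]
            for k in range(2, n - b + 2)
        )
        q[b] = length_b / total
    return q


# Constant population size: E(S_k) is proportional to 1 / C(k, 2).
n = 5
constant = {k: 1.0 / comb(k, 2) for k in range(2, n + 1)}
q = snp_probabilities(constant)
```

Under constant population size the result should reduce to the familiar neutral spectrum, q_nb = (1/b) / (1 + 1/2 + ... + 1/(n-1)), which gives a quick sanity check of the code.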
The above expression can be written as
[Equation 2]
[...]
[Equation 3]
are expectations of times distributed as
[Equation 4]
with the effective population size history described by a function of reverse time,
[Equation 5]
Coefficients Ankj are defined by the expression
[Equation 6]
Equation 2 is an analytic expression for probabilities qnb. [...]
Methods for computation of qnb for large genealogies:
To avoid large numerical errors in summations in (2) for genealogies n > 50, one needs to apply computations with precision of hundreds, or even thousands, of decimal digits [...]
Below, we present a method for computing qnb for large genealogies, which is more general than the one developed by [...]
[Equation 7]
[Equation 8]
In the above, we introduced coefficients
[Equation 9]
and
[Equation 10]
For k > n - b + 1, the elements in the sum (10) become zero, so the upper limit j can be replaced by min(j, n - b + 1). Coefficients Vnj and Wnbj remain the same for all histories of effective population size Ne(t). Once calculated, they can be stored in computer memory or tabulated and reused when we wish to analyze different histories Ne(t), e.g., when maximizing the likelihood function with respect to population growth parameters. Their important property is that their growth, when genealogy size n increases, is very moderate; e.g., for [...]; for [...]; and for [...]. Fig 2 shows growth plots of maxb,j|Wnbj| and maxj(Vnj) vs. n. Both plots are, asymptotically, of the power type with an exponent less than one.
Expressions in (10) and (9) are sums of hypergeometric series, which can be seen by factoring the denominators in (6), [...], and then expressing coefficients Ankj in (6) as follows:
[Equation 11]
Substituting (11) in (9) and using the Chu-Vandermonde identity [...]
[Equation 12]
Coefficients Wnbj in expression (10) can be efficiently computed with the use of recursive procedures [...]
[Equation 13]
[Equation 14]
[Equation 15]
The above recursions are numerically stable and very fast. We used them, implemented in standard double-precision arithmetic, for genealogies consisting of thousands of DNA sequences (the largest value of n tested was n = 5000). We did not perform precise measurements, but typically, when calculating probabilities qnb according to (8), computing coefficients Wnbj and Vnj takes only a small fraction of the time; most of the computing effort goes into evaluating expectations ej.
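The following sketch shows how such a computation might look in double precision. The closed form for Vnj and the three-term recursion for Wnbj used here are the forms in which this result is commonly quoted; they are assumed, not confirmed, to correspond to expressions (12) through (15). The function names are hypothetical, and the expectations ej are taken for a constant-size population (ej proportional to 1/C(j, 2)) purely so that the output can be checked.

```python
from math import comb, exp, lgamma


def v_coefficients(n):
    """Assumed closed form: V_j = (2j-1) n!(n-1)! / ((n+j-1)!(n-j)!) * (1 + (-1)^j)."""
    v = {}
    for j in range(2, n + 1):
        if j % 2:
            v[j] = 0.0  # odd-j terms vanish
            continue
        # Work with log-factorials so large n does not overflow.
        log_ratio = lgamma(n + 1) + lgamma(n) - lgamma(n + j) - lgamma(n - j + 1)
        v[j] = 2.0 * (2 * j - 1) * exp(log_ratio)
    return v


def w_coefficients(n, b):
    """Assumed three-term recursion in j for W_{b,j}, j = 2..n."""
    w = {2: 6.0 / (n + 1)}
    if n >= 3:
        w[3] = 30.0 * (n - 2 * b) / ((n + 1) * (n + 2))
    for j in range(2, n - 1):
        w[j + 2] = (
            -(1 + j) * (3 + 2 * j) * (n - j) / (j * (2 * j - 1) * (n + j + 1)) * w[j]
            + (3 + 2 * j) * (n - 2 * b) / (j * (n + j + 1)) * w[j + 1]
        )
    return w


def q_from_expectations(n, e):
    """q_nb = sum_j e_j W_{b,j} / sum_j e_j V_j, with e[j] given for j = 2..n."""
    v = v_coefficients(n)
    denom = sum(e[j] * v[j] for j in range(2, n + 1))
    q = {}
    for b in range(1, n):
        w = w_coefficients(n, b)
        q[b] = sum(e[j] * w[j] for j in range(2, n + 1)) / denom
    return q


n = 20
e_const = {j: 1.0 / comb(j, 2) for j in range(2, n + 1)}
q = q_from_expectations(n, e_const)
```

Because the coefficients do not depend on the population history, a caller could compute them once per (n, b) and reuse them while varying the expectations ej, which is the reuse pattern the text describes.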
Influence of the ascertainment procedure on SNP sampling frequencies:
Most of the published data on SNP sampling frequencies are obtained in a two-step process, where the first step involves discovering chromosomal locations of a number of SNPs, and the second one involves DNA sequencing of a sample of n chromosomes restricted to locations discovered in the first step. The first step is called SNP ascertainment and is based on a number of chromosomes smaller than n. As described in previous studies, taking the ascertainment scheme into account is a very important aspect of SNP data analysis. Below we derive expressions for modeling the way in which ascertainment modifies SNP sampling frequencies.
We use the following notation introduced by [...]
To determine how ascertainment modifies probability distribution (22), we merge ascertainment and data sets to obtain the joint set of size nJ = nD + nO + nA. We treat the ascertainment procedure as sampling SNP alleles, without replacement, from the joint set. A SNP is discovered if (a) both alleles are present in the ascertainment sample and (b) neither allele has fewer than G copies in the ascertainment sample, where G is a predetermined threshold. Since the joint set contains elements of two types (two alleles), the number of copies of alleles in the ascertainment sample follows a hypergeometric distribution. We analyze two cases: (i) no overlap, which means nO = 0, n = nD, nJ = nD + nA; and (ii) overlap only, which means nA = 0, n = nJ = nD + nO. The case where both overlap and ascertainment-only samples are present is obtained by combining i and ii. We compute frequencies of discovered SNPs in the data set, which follow from conditions a and b above.
We first analyze case i. If a SNP in the joint set has b mutant and nJ - b ancestral bases, then the probability that a sample of size nA from the joint set has β mutant and nA - β ancestral bases is
$$\frac{\binom{b}{\beta}\binom{n_J-b}{n_A-\beta}}{\binom{n_J}{n_A}} \tag{16}$$
For a SNP to be discovered, β must satisfy G ≤ β ≤ nA - G, with G defined as above. Moreover, the following inequalities must hold: β ≤ b, nA - β ≤ nJ - b. Consequently, the probability that a discovered SNP in the data-only set (case i) has b - β mutant and nD - (b - β) ancestral alleles is
[Equation 17]
for b - β = 0, 1, ... , nD. Probabilities qnb are given by (8). The count b - β arises because β chromosomes with mutant bases are removed from the joint set. The numerator in (17) is a sum of contributions to this probability over the possible values of β, while the denominator is a normalizing factor.
For case ii assume again that a SNP in the joint set has b mutant and nJ - b ancestral bases. The probability that a sample of nO has β mutant and nO - β ancestral bases is given by (16) with nA replaced by nO. For this SNP to be discovered, β must satisfy G ≤ β ≤ nO - G. Consequently, the probability that a discovered SNP in the joint set (case ii) has b mutant and nJ - b ancestral alleles is
[Equation 18]
for b = G, ... , nJ - G.
If it is not known which of the alleles is mutant and which one is ancestral, we need to symmetrize the probabilities in (17) and (18) to get the probability of the data configuration. For case i we have the expression
[Equation 19]
for the probability that the rare allele has a given number of copies. For case ii the probability that there are b copies of the rare allele is
[Equation 20]
In the above, [n/2] denotes the greatest integer ≤ n/2.
In the sequel, we refer to the models described above as type i and type ii ascertainment, respectively.
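A minimal sketch of the type i (no overlap) calculation, following the verbal description above: weight each joint-set configuration by the hypergeometric probability that the ascertainment sample passes the threshold G, then normalize. The function name `ascertain_type_i` and the constant-size input distribution are hypothetical, and the code reflects one reading of the procedure rather than the article's exact expression.

```python
from math import comb


def ascertain_type_i(q_joint, n_d, n_a, g):
    """Distribution of mutant counts in the data-only set after ascertainment.

    q_joint[b] is the probability that a SNP has b mutant bases in the joint
    set of size n_d + n_a (b = 1..n_d + n_a - 1). Returns a list p where p[d]
    is the probability of d mutant bases among the n_d data chromosomes.
    """
    n_j = n_d + n_a
    weights = []
    for d in range(n_d + 1):
        total = 0.0
        # beta = mutant copies in the ascertainment sample; both alleles
        # must appear at least g times there for the SNP to be discovered.
        for beta in range(g, n_a - g + 1):
            b = d + beta  # mutant copies in the joint set
            if not 1 <= b <= n_j - 1:
                continue
            hyper = comb(b, beta) * comb(n_j - b, n_a - beta) / comb(n_j, n_a)
            total += q_joint[b] * hyper
        weights.append(total)
    norm = sum(weights)  # probability that the SNP is discovered at all
    return [w / norm for w in weights]


# Hypothetical input: neutral constant-size spectrum for a joint set of 12.
n_d, n_a, g = 10, 2, 1
n_j = n_d + n_a
h = sum(1.0 / i for i in range(1, n_j))
q_joint = {b: (1.0 / b) / h for b in range(1, n_j)}
p = ascertain_type_i(q_joint, n_d, n_a, g)
```

With two ascertainment chromosomes and G = 1, the weight of each data count d works out to be proportional to 11 - d in this example, which is noticeably flatter than the 1/b shape of the input spectrum.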
Likelihood function of the sample:
Data studied are derived from a number of SNP sites. Let us denote the number of SNP loci by M and random variables defined by diallelic data by
[Equation 21]
where XRm is the number of copies of the less frequent (rare) allele and XFm is the number of copies of the more frequent one, in the sample of nm chromosomes. It is possible that XRm = XFm for some indices m, in which case both alleles are equally frequent. We assume that the ancestral state is not known. Then, for an SNP (XRm, XFm), the probability that we observe configuration bm, nm - bm, bm ≤ [nm/2], is
[Equation 22]
where δ(·) is the Kronecker delta function and qnb are the probabilities defined and evaluated in the previous section.
When SNP sites are located far from one another, random variables {X1, X2, ... , XM} in (21) are independent. If the observed numbers of copies of rare alleles are b1, b2, ... , bM, then the log likelihood of the sample (21) is
[Equation 23]
[...]
[Equation 24]
where cb denotes the number of SNP loci in the sample that have the configuration of b copies of the rare allele vs. n - b copies of the frequent allele. Subsequently, we use expressions (23) and (24) to compute likelihoods of SNP samples with different ascertainment models. To specify the ascertainment model, we substitute for pnb in (23) or (24) the probabilities from expression (22) (no ascertainment step), expression (19) (ascertainment model type i), or expression (20) (ascertainment model type ii).
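The likelihood computation can be sketched as follows for the case without ascertainment and a common sample size n. The folding rule used here, adding the probabilities of b and n - b mutant copies and not double counting when b = n - b, is the usual way to handle an unknown ancestral state and is assumed to match expression (22). The function names and the counts are hypothetical.

```python
from math import log


def fold(q, n):
    """Rare-allele probabilities p[b], b = 1..n//2, from q[b], b = 1..n-1."""
    p = {}
    for b in range(1, n // 2 + 1):
        # When b == n - b both labelings describe the same configuration.
        p[b] = q[b] if b == n - b else q[b] + q[n - b]
    return p


def log_likelihood(counts, p):
    """Multinomial log likelihood: sum over b of c_b * log p_nb."""
    return sum(c * log(p[b]) for b, c in counts.items() if c > 0)


# Neutral constant-size spectrum for n = 4: q_b proportional to 1/b.
n = 4
q = {1: 6.0 / 11.0, 2: 3.0 / 11.0, 3: 2.0 / 11.0}
p = fold(q, n)

# Hypothetical data: 7 loci with one rare copy, 3 loci with two.
counts = {1: 7, 2: 3}
ll = log_likelihood(counts, p)
```

Maximizing `ll` over the parameters that determine q (for example with a one-dimensional search over a growth parameter) would give a maximum-likelihood estimate in this setting.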
| RESULTS |
|---|
Exponential history of population size:
In our computations we assume an exponential history of effective population size. In previous publications devoted to SNP and demography, [...]
For an exponential scenario of population growth
[Equation 25]
expectations in (3) become
[Equation 26]
[...] Re (µ) > 0 [...] (l choose 2)(rNe0)^-1 in (26) becomes large, computing el(Ne0, r) involves evaluating a product of the type ∞ · 0. For (l choose 2)(rNe0)^-1 > 50, we used the expansion
[Equation 27]
[...]
[Equation 28]
which allowed canceling exp[(l choose 2)(rNe0)^-1] in (26).
It turns out that sampling frequencies of SNPs depend only on the product parameter rNe0 of the initial effective population size and the exponential factor.
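The dependence on the product alone can be illustrated numerically. The sketch below assumes Ne(t) = Ne0 exp(-rt) in reverse time and a coalescence hazard of C(j, 2)/Ne(t) for j lineages, so that the survival function of the corresponding time is exp(-a (exp(rt) - 1)) with a = C(j, 2)/(r Ne0). Both the scaling and the function name `expected_time` are assumptions. The expectation is obtained by Simpson integration of the survival function rather than by the closed form in the text.

```python
from math import comb, exp, expm1, log1p


def expected_time(j, n_e0, r, steps=2000):
    """Expected time of an event with hazard C(j,2)/Ne(t), Ne(t) = n_e0*exp(-r*t)."""
    a = comb(j, 2) / (r * n_e0)
    # Substituting x = r*t, E(T) = (1/r) * integral of exp(-a*(e^x - 1)) dx.
    # Beyond `upper` the integrand is below exp(-50) and is ignored.
    upper = log1p(50.0 / a)
    h = upper / steps  # steps must be even for Simpson's rule

    def survival(x):
        return exp(-a * expm1(x))

    total = survival(0.0) + survival(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * survival(i * h)
    return total * h / 3.0 / r


# Two histories with the same product r * Ne0 = 10.
ratio_a = expected_time(2, 1000.0, 0.01) / expected_time(3, 1000.0, 0.01)
ratio_b = expected_time(2, 500.0, 0.02) / expected_time(3, 500.0, 0.02)

# Very slow growth should approach the constant-size value Ne0 / C(j, 2).
near_constant = expected_time(2, 1000.0, 1e-6)
```

Under these assumptions each expectation equals 1/r times a function of a alone, so ratios of expectations, and hence the probabilities qnb built from them, are unchanged when r and Ne0 vary with their product held fixed.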
Distributions of SNP frequencies:
Fig 3 provides examples of probabilities of different configurations of SNP sites, for sample size n = 30 and different values of the parameter rNe0 (0, 1, 10), under the assumption that data collection did not include an ascertainment step [expression (22)] or under the ascertainment model of type ii [expression (20)] with nO = 10 and G = 1 or G = 2. As already reported in many articles, increasing rNe0 results in higher proportions of rare alleles in the sample. Plots in Fig 3 also show how ascertainment modifies the distribution of SNP frequencies. Increasing the threshold value G flattens the distribution of frequencies. Both types of ascertainment (i and ii) have similar effects on SNP frequency distributions (results not shown).
Likelihood-ratio tests to detect signatures of population growth:
An interesting issue is our power to test hypothesis H0 of evolution with constant population size, rNe0 = 0, against the alternative hypothesis H1 of population expansion, rNe0 > 0, on the basis of SNP data. It is also of interest to determine how this power is affected by the ascertainment step of data collection. From previous computations it follows that SNP data can be seen as samples from multinomial distributions given by expressions (22), (19), or (20). Assuming that the number of SNP sites is always large enough to allow asymptotic approximation (BICKEL and DOKSUM 2001, p. 227), we computed powers of single-value vs. single-value likelihood-ratio tests of the null hypothesis H0 (constant population size, rNe0 = 0) vs. the alternative H1 (population expansion with a fixed rNe0 > 0). We assumed significance level 0.05 and the values rNe0 = 0.1, 1, 10, and 100 under H1. Tables 1A and 1B give powers of likelihood-ratio tests for sample size n = 50, for different models of ascertainment: no ascertainment [probabilities given by expression (22)] or ascertainment model type ii [expression (20)] with parameters nO and G. Table 1A is for the number of SNP loci M = 30, and Table 1B is for M = 100. From the powers in Tables 1A and 1B, one can see that rNe0 = 0 and rNe0 = 0.1 are practically indistinguishable; rNe0 = 0 and rNe0 = 1 may be distinguished only for a large enough number of SNP sites; while rNe0 = 0 and rNe0 = 10 or 100 are rather easily distinguishable even for small numbers of SNPs. The ascertainment step in data collection can reduce the power to detect signatures of population growth. Increasing the threshold value of G, which typically aims at filtering out sequencing errors in the data, also progressively lowers the probability of rejecting the hypothesis H0 of constant population size when H1 is true; i.e., it increases the probability of committing a type II error. This results from the flattening effect of increasing G observed in Fig 3.
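One way such powers could be approximated is sketched below. It treats the per-locus log likelihood ratio as a random variable, applies a normal approximation to its sum over M independent loci under each hypothesis, and reads the power off the normal distribution. This is a generic large-sample construction and is only assumed to resemble the asymptotic approximation used in the text; the function name `lrt_power` and the two multinomial distributions are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def lrt_power(p0, p1, m, alpha=0.05):
    """Approximate power of the simple-vs-simple likelihood-ratio test.

    p0, p1: category probabilities under H0 and H1 (all strictly positive).
    m: number of independent loci. Uses a normal approximation to the
    summed log likelihood ratio under each hypothesis.
    """
    z = [log(b / a) for a, b in zip(p0, p1)]  # per-locus log ratio

    def moments(p):
        mean = sum(pi * zi for pi, zi in zip(p, z))
        var = sum(pi * (zi - mean) ** 2 for pi, zi in zip(p, z))
        return mean, sqrt(var)

    mu0, sd0 = moments(p0)
    mu1, sd1 = moments(p1)
    std = NormalDist()
    # Reject H0 when the summed log ratio exceeds this critical value.
    critical = m * mu0 + std.inv_cdf(1.0 - alpha) * sd0 * sqrt(m)
    return 1.0 - std.cdf((critical - m * mu1) / (sd1 * sqrt(m)))


# Hypothetical three-category spectra, for illustration only.
p0 = [0.5, 0.3, 0.2]
p1 = [0.6, 0.25, 0.15]
power_30 = lrt_power(p0, p1, 30)
power_100 = lrt_power(p0, p1, 100)
```

In this construction the power grows with the number of loci M, and a flatter pair of distributions (as produced by a larger threshold G) shrinks the gap between the two means and so lowers the power.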
Data analysis:
Data on segregating sites in mitochondrial DNA from Cann et al. (1987):
First, we apply our method to the data on segregating sites in mitochondrial DNA from the article by Cann et al. (1987) [...] rNe0 in (25).
Data in [...] rNe0 = 80. The 95% confidence interval for this estimate, obtained with the use of likelihood-ratio statistics (BICKEL and DOKSUM 2001), is [40, 166].
Segregating sites collected by [...], we have performed 100 coalescent simulations of genealogies representing ancestries for 148 mtDNA sequences. We added mutations along branches of coalescence trees according to the infinite-sites model with intensity µ. In the simulations we measured time on the mutational scale, 2µt, and assumed an exponential change of the scaled population size 2Neµ, with initial value 400 and exponential rate 0.2 on that time scale. So, the true value of the product parameter rNe0 was 400 × 0.2 = 80. For each of these 100 simulation experiments we treated segregating sites as independent SNPs and estimated rNe0 by maximizing likelihood (24). We obtained a mean of the estimates equal to 86.8 and a standard deviation equal to 29.7. This confirms that our approach, at least for these specific values, yields a reasonable estimate of rNe0. These simulations also give an estimate of the 95% confidence interval for rNe0 when the sample consists of 148 mtDNA sequences and the demography is as described above. This estimate, [mean - 2 standard deviations, mean + 2 standard deviations], equals [27, 147]. It is quite consistent with the 95% confidence interval obtained from likelihood-ratio statistics, [40, 166]. The shift toward the left of the confidence interval based on simulations results from the asymmetric shape of the distribution of the estimate of rNe0. By applying a logarithmic transformation to simulation results (estimates of rNe0) we were able to obtain almost perfect agreement of the two confidence intervals, [40, 166] and [45, 175].
SNP data from Picoult-Newberg et al. (1999) and Trikka et al. (2002):
There are several population studies in the literature where relative frequencies of SNP alleles are shown. We have chosen data from the research by Picoult-Newberg et al. (1999) and Trikka et al. (2002) [...]
When analyzing SNP data we followed remarks given in the source articles [...] rNe0 = 3.9, with the 95% confidence interval, obtained with the use of likelihood-ratio statistics, [0, 105.3]. Log-likelihood function for the data on Caucasians from [...] rNe0 = 0.78, with the 95% confidence interval, obtained with the use of likelihood-ratio statistics, [0, 6.1].
Sensitivity of estimates to ascertainment model parameters:
A question arises: How sensitive are the estimates of the parameter rNe0 to changes of the model of the ascertainment? We studied this question by increasing or decreasing the value of the threshold G in expressions (17) and (18). Indexing the estimated parameter with nA, nO, and G, we can denote our estimates from the previous section as
[Equation 29]
and
[Equation 30]
Here we compute estimates [...], [...] on the basis of data from [...] [...], [...] on the basis of data from [...] assumes that no ascertainment procedure is taken into account. The model to estimate [...] is inconsistent with complete data of [...] we have removed this one locus.
The results of computations show an extreme sensitivity of estimates to the ascertainment model. Notably,
[Equation 31]
and
[Equation 32]
In (31), by [...] we meant that the likelihood function was increasing for values of rNe0 up to 10^8.
The fact that the ascertainment model strongly affects estimates of parameters is also confirmed in the previous articles on SNPs. [...] = 0. The need for careful modeling of ascertainment is also stressed by [...]
| DISCUSSION |
|---|
The methods developed in this article allow us to analyze large data sets and carry out computations for different parameter values, which helps us draw more conclusions from data. We have shown examples of applying our methodology to the study of several issues arising in SNP data analysis.
We are particularly interested in the problem of what are reasonable values of the exponential growth product parameter rNe0 obtained on the basis of DNA data. Insight into this problem can be gained by comparing estimates obtained using different approaches.
Our aim when estimating rNe0 from relative frequencies of segregating sites in the article by [...] rNe0, for both the worldwide population and Caucasians, fit into the interval from 50 to 500. Our estimate of 80 is consistent with the above ranges.
Mutation intensity (per site) at autosomal loci is approximately one order of magnitude lower than that in mtDNA [...] rNe0 is invariant with respect to timescale changes and therefore does not depend on the value of the mutation intensity. We can assume that mutation intensity is used only to scale the time axis. The effective population size for autosomal loci is four times the effective population size for loci at mtDNA. So, the estimate of rNe0 from mtDNA should be one-fourth the estimate of rNe0 from nuclear DNA. Taking into account the large stochastic variation, the estimates of rNe0 coming from SNP data should then be comparable (of the same order of magnitude) to those obtained from mtDNA.
However, our estimates of rNe0 based on SNP data, 3.9 and 0.78, are markedly smaller than values coming from mtDNA, which runs counter to the expected tendency. The differences between our estimates and the above ranges can probably be attributed to two factors. The first one, mentioned by [...] toward lower values. The second factor, which comes from our analysis, is the sensitivity to the parameters of the ascertainment model, shown in (31) and (32). With this high sensitivity, even a small unmodeled factor, such as the elimination of some low-frequency SNPs on the assumption that they were sequencing errors, can lead to estimates substantially lower than the true value of rNe0.
| ACKNOWLEDGMENTS |
|---|
The authors are grateful to Peter Paule and Markus Schorn for making their program, implementing Zeilberger's algorithm in Mathematica, available to the scientific community. The authors were supported by National Institutes of Health grants GM58545 and CA75432, Polish Scientific Committee (KBN) research projects PBZ/KBN/040/P04/2001 and 4T11F 01824, and NATO collaborative linkage grant LST.CLG.977845.
Manuscript received January 29, 2003; Accepted for publication May 30, 2003.
| LITERATURE CITED |
|---|
ALTSHULER, D., V. J. POLLARA, C. R. COWLES, W. J. VAN ETTEN, and J. BALDWIN et al., 2000 A SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407:582-589.
BICKEL, P. J., and K. A. DOKSUM, 2001 Mathematical Statistics: Basic Ideas and Selected Topics. Prentice Hall, Upper Saddle River, NJ.
BOERWINKLE, E., D. L. ELLSWORTH, D. M. HALLMAN, and A. BIDDINGER, 1996 Genetic analysis of atherosclerosis: a research paradigm for the common chronic diseases. Hum. Mol. Genet. 5:1405-1410.
BONNEN, P. E., M. D. STORY, C. L. ASHORN, T. A. BUCHHOLZ, and M. M. WEIL et al., 2000 Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. Am. J. Hum. Genet. 67:1437-1451.
CANN, R. L., M. STONEKING, and A. C. WILSON, 1987 Mitochondrial DNA and human evolution. Nature 325:31-36.
CARGILL, M., D. ALTSHULER, J. IRELAND, P. SKLAR, and K. ARDLIE et al., 1999 Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231-238.
COLLINS, F. S., M. S. GUYER, and A. CHAKRAVARTI, 1997 Variations on a theme: cataloging human DNA sequence variation. Science 278:1580-1581.
DURRETT, R. and V. LIMIC, 2001 On the quantity and quality of single nucleotide polymorphisms in the human genome. Stoch. Proc. Appl. 93:1-24.
EBERLE, M. A. and L. KRUGLYAK, 2000 An analysis of strategies for discovery of single nucleotide polymorphisms. Genet. Epidemiol. 19(Suppl 1):S29-S35.
GRAHAM, R. L., D. E. KNUTH and O. PATASHNIK, 1998 Concrete Mathematics. A Foundation for Computer Science, Ed. 2. Addison-Wesley, Reading, MA.
GRADSHTEYN, I. S., and I. M. RYZHIK, 1980 Table of Integrals, Series and Products, Ed. 2. Academic Press, San Diego.
GRIFFITHS, R. C. and S. TAVARE, 1998 The age of a mutation in the general coalescent tree. Stoch. Models 14:273-295.
FU, Y.-X., 1995 Statistical properties of segregating sites. Theor. Popul. Biol. 48:172-197.
HALUSHKA, M. K., J. B. FAN, K. BENTLEY, L. HSIE, and N. SHEN et al., 1999 Patterns of single-nucleotide polymorphisms in candidate genes for blood pressure homeostasis. Nat. Genet. 22:239-247.
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.
KUHNER, M. K., P. BEERLI, J. YAMAMOTO, and J. FELSENSTEIN, 2000 Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156:439-447.
LI, W. H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
MARTH, G. T., I. KORF, M. D. YANDELL, R. T. YEH, and Z. GU et al., 1999 A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23:452-456.
NIELSEN, R., 2000 Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931-942.
PAULE, P. and M. SCHORN, 1994 A Mathematica version of Zeilberger's algorithm for proving binomial coefficients identities. J. Symbol. Comput. 11:1-25.
PETKOVSEK, M., H. S. WILF and D. ZEILBERGER, 1996 A=B. A. K. Peters, Wellesley, MA (http://www.cis.upenn.edu/~wilf/AeqB.html).
PICOULT-NEWBERG, L., T. E. IDEKER, M. G. POHL, S. L. TAYLOR, and M. A. DONALDSON et al., 1999 Mining SNPs from EST databases. Genome Res. 9:167-174.
POLANSKI, A., M. KIMMEL, and R. CHAKRABORTY, 1998 Application of a time-dependent coalescent process for inferring the history of population changes from DNA sequence data. Proc. Natl. Acad. Sci. USA 95:5456-5461.
POLANSKI, A., A. BOBROWSKI, and M. KIMMEL, 2003 A note on distributions of times to coalescence under time-dependent population size. Theor. Popul. Biol. 63:33-40.
RENWICK, A., P. BONNEN, D. TRIKKA, D. NELSON, and R. CHAKRABORTY et al., 2003 Sampling properties of estimators of nucleotide diversity at discovered SNP sites. Appl. Math. Comp. Sci. in press.
RISCH, N. J., 2000 Searching for genetic determinants in the new millennium. Nature 405:847-856.
ROGERS, A. R. and H. HARPENDING, 1992 Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9:552-569.
SHERRY, S. T., H. C. HARPENDING, M. A. BATZER, and M. STONEKING, 1997 Alu evolution in human populations: using the coalescent to estimate effective population size. Genetics 147:1977-1982.
SLATKIN, M. and R. R. HUDSON, 1991 Pairwise comparisons of mitochondrial DNA in stable and exponentially growing populations. Genetics 129:555-562.
TRIKKA, D., Z. FANG, A. RENWICK, S. H. JONES, and R. CHAKRABORTY et al., 2002 Complex SNP-based haplotypes in three human helicases: implication for cancer association studies. Genome Res. 12:627-639.
WAKELEY, J., 2001 The coalescent in an island model of population subdivision with variation among demes. Theor. Popul. Biol. 59:133-144.
WAKELEY, J., R. NIELSEN, S. N. LIU-CORDERO, and K. ARDLIE, 2001 The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69:1332-1347.
WANG, D. G., J. B. FAN, C. J. SIAO, A. BERNO, and P. YOUNG et al., 1998 Large scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082.
WEISS, G. and A. VON HAESELER, 1998 Inference on population history using a likelihood approach. Genetics 149:1539-1546.
WOODING, S. and A. ROGERS, 2002 The matrix coalescent and an application to human single-nucleotide polymorphisms. Genetics 161:1641-1650.
YANG, Z., G. WONG, M. A. EBERLE, M. KIBUKAWA, and D. A. PASSEY et al., 2000 Sampling SNPs. Nat. Genet. 26:13-14.