Some statistical properties of samples of DNA sequences are studied under an infinite-site neutral model with recombination. The two quantities of interest are R, the number of recombination events in the history of a sample of sequences, and RM, the number of recombination events that can be parsimoniously inferred from a sample of sequences. Formulas are derived for the mean and variance of R. In contrast to R, RM can be determined from the sample. Since no formulas are known for the mean and variance of RM, they are estimated with Monte Carlo simulations. It is found that RM is often much less than R; the number of recombination events may therefore be greatly underestimated in a parsimonious reconstruction of the history of a sample. The statistic RM can be used to estimate the product of the recombination rate and the population size or, if the recombination rate is known, to estimate the population size. To illustrate this, DNA sequences from the Adh region of Drosophila melanogaster are used to estimate the effective population size of this species.
- Received November 8, 1984.
- Accepted April 30, 1985.
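
To illustrate how a quantity like RM can be determined directly from a sample, the following is a minimal sketch rather than the authors' own procedure. It assumes biallelic sites coded as '0' and '1' under the infinite-site model and uses the four-gamete test: if all four gametes (00, 01, 10, 11) appear at a pair of sites, at least one recombination event must have occurred between them. The function names `four_gamete_pairs` and `min_recombination_events` are hypothetical, and counting the largest set of non-overlapping incompatible intervals is assumed here to correspond to the parsimonious count described in the abstract.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return site pairs (i, j), i < j, at which all four gametes occur.

    Assumes every haplotype is a string of '0'/'1' of equal length
    (biallelic sites, as under the infinite-site model).
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # 00, 01, 10 and 11 are all present
            pairs.append((i, j))
    return pairs


def min_recombination_events(haplotypes):
    """Count the largest set of non-overlapping incompatible intervals.

    Each incompatible pair (i, j) implies at least one recombination
    event somewhere between sites i and j. Intervals that share only an
    endpoint are treated as non-overlapping. A greedy scan ordered by
    right endpoint finds the maximum number of such disjoint intervals.
    """
    intervals = sorted(four_gamete_pairs(haplotypes), key=lambda p: (p[1], p[0]))
    count = 0
    last_end = 0
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count


if __name__ == "__main__":
    sample = ["000", "011", "101", "110"]
    print(four_gamete_pairs(sample))
    print(min_recombination_events(sample))
```

Because this count only registers events that leave an incompatibility between sites, it is a lower bound on the true number of events, which is consistent with the observation that RM is often much less than R.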