Originally published as Genetics Published Articles Ahead of Print on September 30, 2004.

Genetics, Vol. 170, 365-374, May 2005, Copyright © 2005
doi:10.1534/genetics.103.022061

Using Molecular Sizes of Simple Sequence Repeats vs. Discrete Binned Data in Assessing Probability of Ancestry

Application to Maize Hybrids

* University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030
† Pioneer Hi-Bred International, Johnston, Iowa 50131
‡ Medtronic, Minneapolis, Minnesota 55432

2 Corresponding author: Department of Biostatistics, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Unit 447, Houston, TX 77030-4009.
E-mail: dberry@mdanderson.org

Most inferential methods for profiling genotypes from DNA fragments rely on molecular-size data transcribed into discrete bins, which are intervals of DNA fragment sizes. Categorizing into bins is labor intensive and involves inevitable arbitrariness that may vary between laboratories. We describe and evaluate an algorithm for determining probabilities of parentage based on raw molecular-size data without establishing bins. We determine the standard deviation of DNA fragment size and assess its association with fragment size. We consider a pool of potential ancestors for an index line that is a hybrid with unknown pedigree. We evaluate the identification of inbred parents of maize hybrids with simple sequence repeat data in the form of actual molecular sizes received from two laboratories. We find the standard deviation to be essentially constant across the range of molecular sizes. We compare these results with those of parallel analyses based on the same data that had been transcribed into discrete bins by the respective laboratories. The conclusions were quite similar in the two cases, with excellent performance using either binned or molecular-size data. We demonstrate the algorithm's utility and robustness through simulations with varying levels of missing and misscored molecular-size data.
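
To make the idea of scoring parentage from raw sizes concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each inbred candidate contributes one allele per locus, that measured fragment sizes carry Gaussian error with a single constant standard deviation, that candidate sizes can be treated as reference values, and that all candidate pairs are equally likely a priori. The function names, the value of `sd`, and the example sizes are hypothetical.

```python
import math
from itertools import combinations


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pair_log_likelihood(hybrid, parent_a, parent_b, sd):
    """Log-likelihood that parent_a and parent_b produced the hybrid.

    hybrid   : list of (size1, size2) tuples per locus, or None if missing
    parent_* : list of one fragment size per locus, or None if missing
    sd       : assumed constant measurement standard deviation
    """
    total = 0.0
    for obs, a, b in zip(hybrid, parent_a, parent_b):
        if obs is None or a is None or b is None:
            continue  # a missing locus contributes no evidence
        h1, h2 = obs
        # Either hybrid fragment may have come from either parent.
        like = 0.5 * (
            normal_pdf(h1, a, sd) * normal_pdf(h2, b, sd)
            + normal_pdf(h1, b, sd) * normal_pdf(h2, a, sd)
        )
        total += math.log(like + 1e-300)  # guard against log(0)
    return total


def parent_pair_posteriors(hybrid, candidates, sd=0.5):
    """Posterior probability of each candidate pair under a uniform prior."""
    pairs = list(combinations(sorted(candidates), 2))
    logs = [
        pair_log_likelihood(hybrid, candidates[p], candidates[q], sd)
        for p, q in pairs
    ]
    top = max(logs)
    weights = [math.exp(value - top) for value in logs]  # stable normalization
    norm = sum(weights)
    return {pair: w / norm for pair, w in zip(pairs, weights)}


# Hypothetical fragment sizes (base pairs) at three loci.
candidates = {
    "A": [100.2, 150.1, 200.0],
    "B": [104.1, 150.3, 210.2],
    "C": [100.1, 156.0, 205.9],
}
hybrid = [(100.0, 104.3), (150.0, 150.4), (200.3, 210.0)]

posteriors = parent_pair_posteriors(hybrid, candidates, sd=0.5)
best_pair = max(posteriors, key=posteriors.get)
```

In this toy example the pair of candidates A and B receives nearly all of the posterior mass, because every hybrid fragment lies within a fraction of a base pair of one of their alleles. No bin boundaries are needed: closeness is judged directly on the measured sizes through the assumed error distribution.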