The vast amount of sequence data generated to analyze complex traits is posing new challenges in terms of the analysis and interpretation of the results. Although simulation is a fundamental tool to investigate the reliability of genomic analyses and to optimize experimental design, existing software cannot realistically simulate complete genomes. To remedy this, we have developed a new strategy (Sequence-Based Virtual Breeding, SBVB) that uses real sequence data and simulates new offspring genomes and phenotypes in a very efficient and flexible manner. Using this tool, we studied the efficiency of full sequence in genomic prediction compared to SNP arrays. We used real porcine sequences from three breeds as founder genomes of a 2500-animal pedigree and two genetic architectures: “neutral” and “selective.” In the neutral architecture, frequencies and allele effects were sampled independently whereas, in the selective case, SNPs were sites putatively under selection after domestication and a negative correlation between effect and frequency was induced. We compared the effectiveness of different genotyping strategies for genomic selection, including the use of full sequence commercial arrays or randomly chosen SNP sets in both outbred and crossbred experimental designs. We found that accuracy increases using sequence instead of commercial chips but modestly, perhaps by ≤ 4%. This result was robust to extreme genetic architectures. We conclude that full sequence is unlikely to offset commercial arrays for predicting genetic value when the number of loci is relatively large and the prior given to each SNP is uniform. Using sequence to improve selection thus requires optimized prior information and, likely, increased population sizes. The code and manual for SBVB are available at https://github.com/mperezenciso/sbvb0.
- Received August 17, 2016.
- Accepted November 23, 2016.
- Copyright © 2017 by the Genetics Society of America