Abstract
SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene structure, and user-defined recombination maps.
RECENT studies suggest that linkage effects such as genetic draft and background selection can profoundly alter the patterns of genetic variation in many species (Sella et al. 2009; Lohmueller et al. 2011; Weissman and Barton 2012; Messer and Petrov 2013). Understanding the potential impact of these linkage effects on population genetic methods requires efficient forward simulations that can model the evolution of whole chromosomes with realistic gene structure.
Forward simulations have a long-standing tradition in population genetics and many programs have been developed (Carvajal-Rodriguez 2010; Hoban et al. 2011). For any such program, there is typically a trade-off between efficiency and flexibility. Simulations based on combined forward-backward approaches, such as MSMS (Ewing and Hermisson 2010), can be very fast but remain limited to scenarios with only a single selected locus. Current programs that can model scenarios with multiple linked selected polymorphisms, such as FREGENE (Chadeau-Hyam et al. 2008), GENOMEPOP (Carvajal-Rodriguez 2008), simuPOP (Peng and Kimmel, 2005), forwsim (Padhukasahasram et al. 2008), or SFS_CODE (Hernandez 2008), either lack the ability to model realistic gene structure or are not efficient enough to allow for simulations on the scale of a whole chromosome in reasonably large populations.
Here I present SLiM, a population genetic simulation targeted at bridging the gap between efficiency and flexibility for the simulation of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance, realistic gene structure, and user-defined recombination maps. Special emphasis was further placed on the ability to model and track individual selective sweepsβboth complete and partial. While retaining all capabilities of a forward simulation, SLiM utilizes sophisticated algorithms and optimized data structures that enable simulations in reasonably large populations. All these features are implemented in an easy-to-use C++ command line program. The source code is freely available under the GNU General Public License and can be downloaded from http://www.stanford.edu/~messer/software.
SLiM simulates the evolution of diploid genomes in a population of hermaphrodites under an extended WrightβFisher model with selection (Figure 1A). In each generation, a new set of offspring is created, descending from the individuals in the previous generation. The probability of becoming a parent is proportional to an individualβs fitness, which is determined by the selection and dominance effects of the mutations present in its diploid genome. Gametes are generated by recombining parental chromosomes (both crossing over and gene conversion can be modeled) and adding new mutations.
(A) Illustration of SLiMβs core algorithm for a scenario with two subpopulations. In each generation, a new set of offspring is created from the individuals in the previous generation. Parents are drawn with probabilities proportional to their fitness. Gametes are generated by recombining parental chromosomes and adding new mutations. Migration is modeled by choosing parents from the source subpopulation. (B) Comparison of runtimes and peak resident set size memory requirements between SLiM and SFS_CODE for different evolutionary scenarios. In each column, one of the four parameters L, N, u, and r was varied, while the others were kept constant at their respective base values (L = 5 Mbp, N = 500, u = 10β9, and r = 10β8). Simulations were run for 10N generations. Data points show the observed runtimes and memory requirements averaged over 10 simulation runs.
Mutations can be of different user-defined mutation types, specified by their dominance coefficients and distributions of fitness effects (DFE); examples could be synonymous, adaptive, and lethal mutations. The possibility to specify arbitrary dominance effects allows for the simulation of a variety of evolutionary scenarios, including balancing selection and recessive deleterious mutations. Genomic regions can be of different user-defined genomic element types, specified by the particular mutation types that can occur in such elements and their relative proportions; examples could be exons and introns.
Each mutation has a specified position along the chromosome but remains abstract in the sense that the simulation does not specify the particular nucleotide states of ancestral and derived alleles. Note, however, that the user has the freedom to associate abstract mutation types with specific classes of events. There are no back mutations in the simulations. Fixed mutations are removed from the population and recorded as substitutions.
SLiM is capable of modeling complex scenarios of demography and population substructure. The simulation can include arbitrary numbers of subpopulations that can be added at user-defined times, initialized either with new individuals or with individuals drawn from another subpopulation to model a population split or colonization event. Subpopulation sizes, migration rates, and selfing rates can be changed over time.
To establish genetic diversity, simulations first have to go through a burn-in. Alternatively, simulations can be initialized with a set of predefined genomes provided by the user or with the output of a previous simulation run. The user can further specify predetermined mutations to be introduced at certain time points. Such mutations can be used, for example, to investigate individual selective sweeps or to track the frequency trajectories of particular mutations in the population. Predetermined adaptive mutations can also be limited to partial selective sweeps, where positive selection ceases once the mutation has reached a particular population frequency.
SLiM provides a variety of output options, including (i) the complete state of the population at specified time points, in terms of all mutations and genomes present in the population; (ii) random samples of specific sizes drawn from a subpopulation at given time points; (iii) lists of all mutations that have become fixed, together with the times when each mutation became fixed; and (iv) frequency trajectories of particular mutations over time.
I ran SLiM under various test scenarios to check whether its output agrees with theoretical predictions. Specifically, I analyzed levels of neutral heterozygosity under different scenarios of demography, population substructure, and selfing, the fixation probabilities of new mutations of different selective effects, and the reduction in neutral diversity around adaptive substitutions. Simulation results conformed with the respective theoretical predictions in all tests (Supporting Information, File S1, section 6).
SLiM utilizes sophisticated algorithms and optimized data structures to achieve its high computational efficiency (File S1, section 7.1). The simulation is based on a hierarchical data architecture that minimizes the amount of information stored redundantly. At many stages of the program, large quantities of random numbers have to be drawn from general probability distributions. To do this efficiently, SLiM uses lookup tables that are precomputed only once per generation and then allow one to draw random numbers in O(1) time. Most algorithms are implemented using fast routines provided in the GNU scientific library (Galassi et al. 2009).
To evaluate SLiMβs performance I compared its runtime and memory requirements with those of SFS_CODE, a popular forward simulation of similar scope (Hernandez 2008). For these tests, a chromosome of length L was simulated with uniform mutation rate u and recombination rate r in a population of size N over the course of 10N generations, assuming an exponential DFE with
Its computational performance enables SLiM to simulate entire eukaryotic chromosomes in reasonably large populations. For instance, simulating the functional regions in a typical human chromosome of length L = 100 Mbp over 105 generations in a population of size N = 104 with u = 10β8 and r = 10β9 per site per generation, assuming a functional density of 5% and
SLiM has already been successfully applied in several projects that required efficient forward simulations on large genomic scales. For example, Kousathanas and Keightley (2013) used the program to examine how linked selection can affect their method for inferring the DFE from polymorphism data in fruit flies and mice. In Messer and Petrov (2013), SLiM was used to investigate the effects of linked selection on the MK test, and it was shown that such effects can severely bias the test. These studies highlight the need for efficient forward simulations that can model chromosomes with realistic gene structure.
Most of the current machinery of population genetics is still deeply rooted in the mindset of neutral theory, which assumes that adaptation is rare and that linkage effects from recurrent selective sweeps can thus be neglected. However, this assumption may be violated in many species. It is hence essential to verify with forward simulations under realistic scenarios of selection and linkage whether population genetics methods, and our estimates of key evolutionary parameters obtained from them, are robust to linkage effects. SLiM is specifically designed for this purpose and I believe that it will become an important tool for future population genetic studies.
Acknowledgments
The author thanks Dmitri Petrov for continuous support throughout the project; members of the Petrov lab, especially Zoe Assaf, David Enard, and Nandita Garud for testing the program; and three anonymous reviewers for their valuable comments on program and documentation. Part of this research was funded by the National Institutes of Health (grants GM089926 and HG002568 to Dmitri Petrov).
Footnotes
Communicating editor: J. Wall
- Received April 11, 2013.
- Accepted May 20, 2013.
- Copyright Β© 2013 by the Genetics Society of America