# Signatures of Dobzhansky–Muller Incompatibilities in the Genomes of Recombinant Inbred Lines

^{*}European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, NL-9713 AV Groningen, The Netherlands^{†}Population Epigenetics and Epigenomics, Department of Plant Sciences, Technical University Munich, 85354 Freising, Germany^{‡}Institute for Advanced Study, Technical University Munich, 85748 Garching, Germany

- 1Corresponding authors: European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, A. Deusinglaan 1, NL-9713 AV Groningen, The Netherlands. E-mail: m.colome.tatche{at}umcg.nl; and Population Epigenetics and Epigenomics, Department of Plant Sciences, Technical University Munich, Liesel-Beckmann-Str. 2, 85354 Freising, Germany and Institute for Advanced Study, Technical University Munich, Lichtenbergstr. 2a, 85748 Garching, Germany. E-mail: frank{at}johanneslab.org

## Abstract

In the construction of recombinant inbred lines (RILs) from two divergent inbred parents certain genotype (or epigenotype) combinations may be functionally “incompatible” when brought together in the genomes of the progeny, thus resulting in sterility or lower fertility. Natural selection against these epistatic combinations during inbreeding can change haplotype frequencies and distort linkage disequilibrium (LD) relations between loci on the same or on different chromosomes. These LD distortions have received increased experimental attention, because they point to genomic regions that may drive a Dobzhansky–Muller type of reproductive isolation and, ultimately, speciation in the wild. Here we study the selection signatures of two-locus epistatic incompatibility models and quantify their impact on the genetic composition of the genomes of two-way RILs obtained by selfing. We also consider the biases introduced by breeders when trying to counteract the loss of lines by selectively propagating only viable seeds. Building on our theoretical results, we develop model-based maximum-likelihood (ML) tests that can be applied to multilocus RIL genotype data to infer the precise mode of incompatibility as well as the relative fitness of incompatible loci. We illustrate this ML approach in the context of two published *Arabidopsis thaliana* RIL panels. Our work lays the theoretical foundation for studying more complex systems such as RILs obtained by sibling mating and/or from multiparental crosses.

- genetic incompatibility
- RIL
- long-range LD
- selection
- epistasis
- recombination
- complex traits
- Dobzhansky–Muller
- inbreeding

HYBRIDS from crosses between two divergent parental lines sometimes display low fertility and phenotypic abnormalities (Presgraves 2010). These effects are often attributable to combinations of parental genotypes (or epigenotypes) at two or more loci that are functionally incompatible when brought together into a single genome. This form of negative epistasis was originally invoked by Dobzhansky (1937) and Muller (1942) as a model for speciation. In the classical Dobzhansky–Muller (DM) model, a population splits into two subpopulations that become reproductively isolated through geographic or temporal mechanisms (*i.e.*, prezygotically). Once separated, the two subpopulations acquire independent mutations that are incompatible upon hybridization, thus resulting in sterility or reduced fertility among offspring. This process prevents further mixing and reinforces the existing (prezygotic) reproductive isolation genetically (*i.e.*, postzygotically). Additional independent mutations accumulate over time, causing further divergence between subpopulations and ultimately speciation. Empirical examples of interspecific genetic incompatibilities are well documented in the literature (Presgraves 2010) and have motivated extensive theoretical work in evolutionary genetics (*e.g.*, Nei *et al.* 1983; Orr and Orr 1996; Turelli and Orr 2000; Barton 2001; Orr and Turelli 2001; Turelli *et al.* 2001; Welch 2004; Fierst and Hansen 2010; Bank *et al.* 2012). Interestingly, genetic incompatibilities with varying degrees of penetrance are often already visible in intraspecific experimental crosses of inbred laboratory strains (Corbett-Detig *et al.* 2013; Chae *et al.* 2014). The detection and functional analysis of such intraspecific incompatibilities could provide fundamental insights into the mechanisms that drive postzygotic reproductive isolation in the wild and thus represent a useful model for understanding the molecular basis of speciation (Bomblies and Weigel 2007).

In plants, the clearest examples of intraspecific genetic incompatibilities come from experimental crosses of *Arabidopsis thaliana* (*e.g.*, Bomblies *et al.* 2007; Bikard *et al.* 2009; Durand *et al.* 2012; Chae *et al.* 2014). Arguably the best-studied case is the work of Bikard *et al.* (2009), who examined F_{2} progeny of selfed hybrids derived from the Columbia (Col) and the Cape Verde (Cvi) accessions. The authors found that a subset of the F_{2}’s had severely compromised fitness and demonstrated that this fitness loss is caused by a genetic incompatibility involving a reciprocal loss of duplicate genes on chromosome (chr) 1 and chr 5 (Figure 1A). Specifically, it was shown that Cvi carries a deletion of the gene on chr 5 and Col a nonfunctional version on chr 1, both of which act recessively. Hence, F_{2} individuals with the recessive epistatic combination Col|Col (chr 1) and Cvi|Cvi (chr 5) are (nearly) embryonic lethal. Interestingly, the genomic regions that are implicated in this epistatic incompatibility were first identified in a densely genotyped population of recombinant inbred lines (RILs) derived from the Col and Cvi accessions: at generation F_{6} of inbreeding, the authors noted strong long-range linkage disequilibrium (LD) between markers on chrs 1 and 5 (Supporting Information, Figure S1A) (Simon *et al.* 2008). Combinations of the Col|Col marker genotype on chr 1 and the Cvi|Cvi marker genotype on chr 5 were completely absent, suggesting that these epistatic combinations were subject to intense selection during inbreeding. Similar long-range LD patterns were identified in another RIL population originating from Shahdara (Sha) and Col accessions and involved epistatic interactions between a locus on chr 4 and chr 5 (Figure S1B). The two loci were subsequently fine-mapped, and functional studies revealed that this epistatic incompatibility is due to stable epigenetic silencing of a paralogue (Figure 1B) (Durand *et al.* 2012). This latter finding illustrates that—besides genetic factors—also epigenetic factors can cause intraspecific incompatibilities in plants, although it remains to be seen how common such epigenetic phenomena are.

Short- or long-range LD distortions between loci on the same or on different chromosomes are a common feature of RILs genomes. From the standpoint of complex trait analysis, such distortions are typically undesirable because they affect the resolution and power of quantitative trait locus (QTL) mapping. On the other hand, a systematic analysis of LD distortion patterns can provide insights into the epistatic architecture underlying genetic incompatibilities and yield targets for experimental follow-up. A decisive contribution to such efforts is a theoretical analysis of different incompatibility models and their selection signatures in the genomes of RILs. Most of the theoretical work devoted to understanding the genomes of RILs has ignored the role of selection (*e.g.*, Haldane and Waddington 1931; Broman 2005; Martin 2006; Teuscher and Broman 2007; Johannes and Colomé-Tatché 2011; Martin and Hospital 2011; Broman 2012; Zheng *et al.* 2015). The exception is the early work by Haldane (1956), Reeve (1955), and Hayman and Mather (1953), who examined cases of selection against homozygotes at a single locus and described the changes in genotype frequencies as a function of inbreeding and selection. However, these earlier theoretical results are of limited use for understanding the selection signatures of DM-type genetic incompatibilities as the latter require multilocus models.

Here we provide the first theoretical analysis of two-locus incompatibility models in the context of RIL construction. We consider three variants of the classical DM model (the dominance epistasis, the recessive epistasis, and the dominance–recessive epistasis models) and quantify their respective effects on short- and long-range LD patterns as a function of inbreeding, fitness, and recombination. We also give theoretical expressions for the total number of lines that are expected to be lost under different incompatibility scenarios. Building on these results, we present model-based maximum-likelihood (ML) tests that can be used for the detection of incompatible loci from multilocus genotype data collected at any inbreeding generation. We apply this ML method to two published *A. thaliana* RIL panels. Our work lays the theoretical foundation for studying more complex systems such as RILs obtained by sibling mating and/or from multiparental crosses.

## Overview of Genetic Incompatibility Models

The simplest form of epistatic incompatibility involves the interaction between only two loci, say and Consider two divergent inbred lines with diplotypes (*i.e.*, two-point genotypes) and where the “dot” superscript denotes a nonfunctional (*i.e.*, mutant) allele. We use the notation to distinguish genotypes and at the first () and the second () locus, respectively, from haplotypes and on each of the two homologous chromosomes (Table 1). Hence, inbred line is homozygous for two mutant alleles at the first locus and homozygous wild type at the second locus, while inbred line is homozygous mutant at the second locus and homozygous wild type at the first locus. There are three basic models of two-locus epistatic incompatibility, the dominance epistasis model (), the recessive epistasis model (), and the dominance–recessive epistasis model (). These models are summarized in Table 2 and are further detailed below.

### Dominance epistasis model ()

In the classical DM model, individuals with diplotypes and are fully viable, but their F_{1} hybrid progeny is sterile or shows reduced fertility. The reduced fitness of the hybrid is the result of dominance interactions of loci and meaning that allele is dominant over *B* at while allele is dominant over *A* at When the loss of fertility is not fully penetrant, F_{1} hybrids can be crossed (or selfed) to obtain an F_{2} population. Due to recombination and/or independent segregation of alleles at loci and there are 16 possible diplotypes in the F_{2} (Table 1). One can assume that double-heterozygote F_{2} individuals () experience the same loss of fitness as in the F_{1}. However, due to the dominance interactions, there are additional diplotypes in the F_{2} or in subsequent generations that are phenotypically equivalent to and will therefore be subject to the same, or similar, fitness loss. These diplotypes, with their corresponding fitness parameters are summarized in Table 2.

### Recessive epistasis model ()

A basic requirement of the classical DM model is that the incompatibility first appears in F_{1} hybrids. This may not always be the case. A less stringent version of the DM model is the recessive epistasis model. In this model allele is recessive to *B* at the first locus and allele is recessive to *A* at the second locus. This leads to selection against individuals, which do not appear in the F_{1} population but only at later breeding generations at low frequency (Table 2).

### Dominance–recessive epistasis model ()

A combination of the dominance and the recessive epistasis models is the dominance–recessive epistasis model. In this model, allele is dominant over *B* at the first locus and is recessive to *A* at the second locus. Selection is against individuals with diplotypes and (Table 1 and Table 2). Similar to the recessive epistasis case, this model implies that incompatibility does not appear in F_{1} individuals but only at later breeding generations. The reciprocal model where allele is recessive to *B* at the first locus and is dominant over *A* at the second locus is equivalent and can be obtained by considering the symmetries and

Of course, the above three incompatibility models are just as valid had we assumed that the two inbred lines are instead and meaning that the mutant allele is at the second locus and mutant allele is at the first locus. Various degrees of partial dominance are taken into account by attributing different fitness parameters to deleterious diplotypes (Table 2). In the following section we develop the necessary analytical framework to quantify the population-level consequences of these three incompatibility models during RIL construction. Readers who are primarily interested in the biological insights may skip directly to *Results*.

### Data availability

See Simon *et al.* 2008 for original data used in this paper.

## Theory

### Markov chain model

Consider the construction of a two-way RIL by selfing, starting from an F_{2} base population. There are 16 possible diplotypes in the F_{2}. Ignoring haplotype order, these can be grouped into 10 diplotype classes (Table 1). Individuals from the F_{2} (time ) are chosen to initiate an inbreeding process by repeated selfing for many generations to obtain a final population of RILs. The inbreeding process can be modeled as an absorbing finite Markov chain, where the states of the chain are the different diplotypes (Table 1). Assume that denotes the diplotype state of an individual at generation *t*. Then forms a Markov chain; *i.e.*, is independent of given We define the transition probability as a function of both *r* and where *r* () is the recombination rate at meiosis, and ( is the fitness corresponding to diplotype *j*. The transition matrix *T* is the collection of transition probabilities from one diplotype to another in one generation of inbreeding. For notational simplicity we omit the dot superscript in the following and implicitly keep track of the origin of the nonfunctional alleles. The general form of *T* is shown in *Appendix A*. Following Reeve (1955), we augment the Markov chain with a pseudostate “lost,” which accounts for the loss of diplotypes as a result of differential survival. The column corresponding to the lost state in the new transition matrix is given by for each line and by for line 11. This addition ensures that the rows of the new transition matrix sum to unity. The initial row vector of state probabilities is (1)which corresponds to the hybrid diplotype at F_{1} (time ) where there are no other diplotypes and no selection unless Hence, the state probabilities at any generation *t* of inbreeding can be obtained from the general formula (2)where (3)Note that the elements of are functions of *r* and the fitness Since only the surviving lines are of interest, one may drop the lost state and work instead with the reduced submatrix of survivors, *T*, and the reduced state vector (Reeve 1955). This leads to the recursion (4)where *P* is the eigenvector of *T* and *V* is a diagonal matrix of the distinct eigenvalues of *T*. We obtain the relative diplotype proportions of surviving lines by normalizing the diplotype proportions at any generation *t* of inbreeding by the mean fitness in the population at time *t*, Let us define the normalized diplotype frequencies byUsing Equation 4 we derive analytical expressions for the diplotype probabilities at any inbreeding generation (Wolfram Research 2015). For models and and for the case without selection (model ), we list the nonnormalized diplotype probabilities at in *Appendices B* and *D* and those for intermediate generations in *Appendices C* and *E*. The expected proportion of lost lines () can be easily calculated from these nonnormalized diplotype probabilities, usingwhich shows that the proportion of lost lines depends on the inbreeding generation *t*, the fitness and the meiotic recombination rate *r* between the two incompatible loci.

### Breeder bias

In practical situations, the breeder would want to keep as many lines as possible and therefore tries to counteract the loss of lines by implementing what may be called “biased single-seed descent” (BSSD) (Figure A1). That is, rather than selecting only one seed at random to propagate a given line to the next generation, the breeder plants many seeds from one line and chooses one that appears viable (Figure A1). This is equivalent to arguing that the breeder will not propagate a lost line. This correction process can be modeled by normalizing each row element () of *T* by the row total,which has the effect that no lines are actually lost at intermediate generations or at The only exception is when there is complete lethality (*i.e.*, ). In this case, lines that have become fixed for a given incompatible homozygous diplotype will not produce any viable seed at all, thus leaving no alternative seeds to choose from. Although it is possible to find closed-form solutions for these renormalized diplotype probabilities, these expressions have no easy form and are therefore omitted.

### Time-dependent LD

Changes in diplotype frequencies alter haplotype proportions in the population. As we will see, all incompatibility models result in a relative gain in nonrecombinant diplotypes or, stated alternatively, in a loss of diplotypes carrying recombinant haplotypes. These haplotype distortions lead to increased LD within chromosomes (*i.e.*, short-range LD) and also between chromosomes (*i.e.*, long-range LD). To calculate LD between loci and we first obtain the haplotype probabilities for any time *t* aswhere *k* denotes the haplotype (*i.e.*, ) and − is any haplotype but *k* [*e.g.*, with ]. Explicit analytical expression for these haplotype probabilities for models and at generation can be found in *Appendix D* and those for intermediate generations in *Appendix E*. As a reference we also provide the results for the case without selection () in *Appendices B* and *C* (at generations and at intermediate generations, respectively). For the case of breeder bias analytical solutions are possible but have no easy form and are therefore omitted. Using these haplotype probabilities, we define the random variables and which take values 1 or according to whether locus 1 or 2 on haplotype *k*, respectively, carries allele *A* or *B*. A time-dependent measure of LD can be obtained by calculating (5)where and and are the means and variances of and respectively.

### Maximum-likelihood estimation

The analytical expressions for the diplotype probabilities (*Appendix E*) can be employed in a maximum-likelihood procedure for the analysis of multilocus RIL genotype data at any generation of inbreeding. This procedure provides a method for estimating the most likely incompatibility model to have generated the data as well as the fitness coefficients corresponding to the different diplotypes.

Consider a sample of *N* RILs collected at any inbreeding generation *t*, with one random sibling representing each line. Let () be a random variable denoting the number of lines with diplotype (or its equivalent class) at loci and Since the lines are independent, the probability mass function of the observations is given by a multinomial distribution (6)where Ignoring constant terms, we write the log-likelihood function () for a given incompatibility model and a fixed recombination fraction *r* as (7)where are the unknown fitness values. Maximization of (7) yields estimates of the fitness as well as the likelihood value of a given incompatibility model. Competing incompatibility models can be compared using standard model comparison criteria. We note that inferences regarding incompatible loci on the same chromosome are difficult, because the parameters *r* and are partly confounded in the likelihood equations (*i.e.*, *r* and often multiply each other; see *Appendix E*). This is particularly problematic when and are in tight linkage and selection is weak (see Table S3).

## Results

The following section highlights several important biological insights that may be of practical relevance for experimentalists working on genetic incompatibility or with populations of RILs in general. Throughout we present results for generation (as this is a typical reference generation in the construction of RILs by selfing) and generation (as this is the theoretical limit) and for the three incompatibility models ( and ) and the model without selection (). Results for any other inbreeding generation can be directly extracted from the analytical formulas presented in *Theory* and *Appendices C and E*. To simplify discussion we consider the special case (*i.e.*, no partial dominance). We note that the value is a discontinuity point in all models, as for they all reduce to (Table 2). For visual purposes this discontinuity point is not shown in the results in Figure 2 and Figure 3.

### Genetic incompatibility leads to a loss of lines during inbreeding

The most obvious consequence of genetic incompatibilities is that selection against certain diplotypes leads to the eventual loss of lines during inbreeding. The magnitude and rate of this loss depend on the mode of incompatibility (*i.e.*, models and ), the meiotic recombination rate (*r*), and the fitness *w*. To illustrate this, we plot the expected proportion of lost lines for two different values of *r* (0.05, 0.5) against *w* at generations and (Figure 2A).

The loss of lines is most severe for the dominance-epistasis model (). This is because the number of different diplotypes that are selected against is largest under this model (Table 2). As the fitness of the incompatible diplotype approaches zero () more than of the lines are expected to be lost by generation and this percentage is not much influenced by *r*.

It is perhaps not surprising that the recessive-epistasis model () is the most benign, with the loss of lines never exceeding 25% as selection acts exclusively against the genetically fixed recombinant diplotype Hence, the loss of lines at generation depends only on *r* but not *w*. With larger *r* more lines acquire recombinant haplotype during inbreeding and this haplotype can go on to fixation. By generation all lines will have been purged from the population. Hence, given sufficient time this process does not depend on the selection intensity, but does require that

For low fitness () selection is generally quite efficient such that the proportion of lost lines converges rapidly to its limiting value at However, for the proportion of lost lines at generation differs from what is expected at generation and this feature is common to all models. Another common feature of all three models is that the loss of lines is positively related to the recombination rate between the two incompatible loci. This is because selection acts primarily against recombinant diplotypes in all models (Table 2), so that the loss of lines is expected to be most severe when the incompatibility is due to unlinked loci.

Differential survival of lines during inbreeding has other, less obvious, population-level consequences: It affects genotype and haplotype frequencies, which in turn can distort LD patterns in the genomes of RILs. We discuss these effects in subsequent sections.

### Genetic incompatibility changes genotype frequencies beyond fixation

In the construction of RILs by selfing inbreeding is usually not taken farther than generation as the lines are considered nearly fixed at that point. Indeed, in the absence of selection () the diplotype frequencies are close to their theoretical limit (), with only of the lines still awaiting fixation. This situation is drastically different when genetic incompatibilities are present in the form of models and In this case, certain diplotypes, many of which are already genetically fixed such as are under persistent selection and thus continue to change the relative genotype frequencies among RILs, even at very advanced inbreeding generations. To illustrate this we plot the difference between the diplotpye proportions at and at [*i.e.*, ] (Figure 2B). For all three incompatibility models show a higher frequency of changing diplotypes compared to the case without selection. One major reason for this is that selection against specific recombinant diplotypes (*e.g.*, ) persists for much longer than the time it takes to generate them through recombination and fixation. This effect is clearest when the two incompatible loci are unlinked ().

With decreasing fitness the three incompatibility models begin to differ in subtle ways: For the frequency of changing diplotypes after generation is actually smaller for model than it is for the case without selection (); for example for and the frequency of changing diplotypes after generation in is compared to in This can be attributed to the fact that many genetically unfixed diplotypes (*e.g.*, ) are purged at a faster rate than the rate at which they become fixed. A similar, albeit less drastic, situation occurs in model but requires much stronger selection pressures and is dependent on *r* (Figure 2B). By contrast, in model the frequency of changing diplotypes never drops below that without selection. This is due to selection being restricted to the recombinant diplotype so that fixation for this diplotype needs to occur first before it can get purged from the population.

Taken together, these results raise important practical considerations: They imply that RILs that segregate incompatible genotypes cannot be viewed as an “eternal” genetic resource, as their genotype frequencies continue to change upon further propagation, particularly under weak selection. With plants this can be partly bypassed by stocking seeds from a reference generation that is then distributed to the community for phenotyping experiments. However, with RILs derived by sibling mating, lines can only be maintained by continued crossing. Experimental results obtained with genetic material from different inbreeding generations may therefore not be comparable.

### Genetic incompatibility increases LD within and across chromosomes

It is intuitively obvious that selection against certain diplotypes during inbreeding indirectly affects haplotype frequencies. Changes in the relative frequency of recombinant haplotypes distort LD relations between loci within or across RIL chromosomes. To visualize this, we plot LD against *w* for different values of the meiotic recombination rate, *r* (0.05 and 0.5) (Figure 2C). Probably the most important observation is the strong induction of long-range LD between genetically unlinked loci for all incompatibility models. Indeed, at generation the genotypes at the two incompatible loci are expected to be correlated in the order of 0.5, whereas they are expected to be uncorrelated in the absence of selection. For all models show that long-range LD rapidly reaches its maximum with generation time: LD is already near its limiting value at generation However, for long-range LD continues to increase beyond generation as the relative frequency of recombinant haplotypes slowly decreases as a result of differential survival of lines. LD within chromosomes is of course already high due to gametic linkage and scales with the genetic distance between the two incompatible loci. In this case, selection will reinforce LD even further, leading to (local) genetic map contractions in the genomes of RILs.

One counterintuitive observation in the LD patterns for models and is the slight increase in LD at generation as a function of fitness. To understand this it is necessary to discuss the fate of haplotypes during inbreeding under these two models. In both cases the proportion of recombinants depends on the fitness *w*, and both models show that low fitness values will lead to a higher proportion of recombinant diplotypes compared to higher fitness values *Appendices D and E*. However, recall that selection in both incompatibility models is against several diplotypes (Table 2), many of which carry the nonrecombinant parental haplotypes or Hence, with strong selection (low fitness) more lines are lost, but among the survivors there is an overrepresentation of diplotypes carrying recombinant haplotypes. By contrast, with lower selection (higher fitness) there are more surviving lines, but among these there is a higher proportion of parental nonrecombinant haplotypes.

### Preventing the loss of lines introduces additional biases

It seems sensible that many of the adverse effects of genetic incompatibility could be bypassed by preventing the loss of lines in the first place. However, preventing the loss of lines through counterselection (BSSD, Figure A1) does not imply that the diplotype frequencies are also corrected as if no selection had occurred. Selection against incompatible diplotypes persists, but the breeder chooses to propagate a compatible individual instead of losing a line by trying to propagate an incompatible one (Figure A1). In this way the breeder introduces unexpected biases into the inbreeding dynamics, particularly with regard to haplotype frequencies and LD patterns. This is clearly illustrated in Figure 3, where we plot the proportion of recombinant haplotypes among surviving lines. In general, we find that BSSD leads to a higher proportion of recombinant haplotypes than in the case of standard single-seed descent (SSD). However, these proportions are nowhere close to what would be expected in the absence of genetic incompatibility. The most unexpected observation is that for unlinked loci, when BSSD can actually produce a lower proportion of recombinant individuals among surviving lines. This means that even though more lines have been salvaged, the proportion of recombinant haplotypes in the final RILs is even lower than among surviving lines without breeder bias. The trade-off between the number of surviving lines and the proportion of recombinant individuals is important for complex trait mapping analysis where not only the sample size but also the proportion of recombinants are key determinants of mapping resolution. Making informed decisions regarding the use of BSSD is difficult, as the presence and/or severity of genetic incompatibilities are usually unknown prior to RIL construction. Be it as it may, the important observation about BSSD is that it will lead to another set of biases in the genomes of RILs (Figure 3, Figure S2). Breeders should be aware of these biases.

### Application to RIL genotype data

Simon *et al.* (2008) presented genetic maps of two *A. thaliana* RIL populations derived from crosses between Cvi × Col and Sha × Col accessions. Their analysis of the genotype data revealed several cases of long-range LD between pairs of markers on different chromosomes (Figure S1). In the Cvi × Col cross, long-range LD was detected between markers on chrs 1 and 5 and between markers on chrs 1 and 3. In the Sha × Col cross, long-range LD was detected between markers on chrs 4 and 5. The authors suggested that these LD patterns are the results of intense epistatic selection against certain parental genotype combinations during inbreeding. We reanalyzed the genotype data from both RIL crosses, using our ML approach (Equation 7). We focused on the two significant LD patterns originally described by Simon *et al.* (2008) and for which later experimental follow-up work established the precise mode and molecular basis of the incompatibilities (Figure 1). In each case, we performed ML estimation using our three incompatibility models ( and ) with and without breeder bias and, when appropriate, considered the symmetries and (Table S1 and Table S2). Our goal is to infer the most likely incompatibility model to have generated the observed genotype data and to obtain estimates of the fitness values.

#### Cvi × Col cross:

In their analysis of the Cvi × Col cross, Simon *et al.* (2008) noted that individual RILs that carried the Col|Col genotype at a marker on chr 1 were much less likely to carry the Cvi|Cvi genotype at a marker on chr 5, although these loci were physically unlinked. In an impressive follow-up study (Bikard *et al.* 2009) it was later demonstrated that the chr 1 and chr 5 incompatibility was due to a reciprocal loss of a duplicated gene (Figure 1A). Specifically, it was shown that Cvi carried a deletion of the gene on chr 5 and Col a nonfunctional version of it on chr 1, both of which acted recessively. individuals with the recessive epistatic combination Col|Col (chr 1) and Cvi|Cvi (chr 5) were found to be (nearly) embryonic lethal. Consistent with their follow-up results in the F_{2}’s, application of our ML approach to the original RIL genotype data correctly identified the recessive epistatic incompatibility model (model ) as the most likely, with nonfunctional alleles at chr 1 for Col and at chr 5 for Cvi (Table S1). In addition, we estimated that epistatic selection against the double recessive during inbreeding was substantial (fitness ) (Table S1), which is in line with the (near) embryonic lethality observed among the F_{2}’s.

#### Sha × Col cross:

The genetic incompatibility in the Sha × Col cross is more complex: Simon *et al.* (2008) observed that the combination Col|Col on chr 4 and Sha|Sha on chr 5 was nearly absent in the RILs. Molecular analysis of the two interacting genomic regions (Durand *et al.* 2012) revealed that Sha carries a duplicated gene on chr 4, which epigenetically silences its original copy on chr 5 in *trans*. Silencing is most likely achieved via the generation of small interfering RNA (siRNA) that promotes methylation at homologous sequences. Adding to this complexity, the authors showed that Sha has an active copy of the gene on chr 4, where no homologous gene exists for Col, while Col has an active copy of the gene on chr 5, where this copy is epigenetically silenced in Sha. Application of our ML approach to the genotype data revealed that the chrs 4 and 5 incompatibility is most consistent with a partial dominance model (model ), with strong selection against individuals with genotypes Col|Col on chr 4 and Sha|Sha on chr 5 () and weak selection against individuals with genotypes Col|Sha on chr 4 and Sha|Sha on chr 5 () (Table S2). These rather low fitness values underline the authors’ observation that incompatible individuals experienced a reduction in seed yield of Interestingly, our ML analysis also detected evidence for breeder bias in these data. This finding is consistent with the authors having made concerted efforts to counteract the loss of lines during RIL construction (Durand *et al.* 2012). Indeed, we estimate that without counterselection, of the lines would have been lost. However, these conclusions should be interpreted with caution as our simulations show that reliable detection of breeder bias in RIL data requires much larger sample sizes than in the populations considered by Durand *et al.* (2012) (see Table S3).

The predominance of recessive or (partial) dominance epistasis as a source of genetic incompatibilities in the Cvi × Col and the Sha × Col cross makes sense, considering that other incompatibility effects such as those associated with full dominance epistasis would have led to an initial loss of individuals, which may have prevented the construction of these RILs in the first place. We therefore suspect that the detection of long-range LD in multilocus RIL genotype data will most often be traceable to recessive or partial dominance epistasis or else to dominance epistasis in combination with weak selection.

## Discussion

RILs are a popular tool for studying the genetic basis of complex traits. Many populations of RILs have been created in a variety of organisms. The genotypic properties of RILs often diverge drastically from what is expected from theory. Widespread allele frequency distortions and unexpected long-range LD patterns are common. Such features are often the result of differential survival (or fertility) of certain combinations of parental genotypes during inbreeding. This is perhaps nowhere clearer than in the genomes of recently created eight-way RILs in mice, which were derived from eight different inbred parental strains (Collaborative Cross Consortium 2012). The construction of these RILs has been severely hampered by high lethality and infertility rates during inbreeding. Genotyping data at intermediate generations showed that certain parental genotypes were nearly absent in some genomic regions, and surviving lines displayed complex long-range LD patterns. These observations are consistent with selection having acted on entire networks of interacting loci. High-dimensional incompatibility networks can be viewed as multilocus extensions of the classical DM model. While interesting from a data analysis standpoint, theoretical modeling of such multilocus systems in the genomes of RILs is analytically not tractable, which makes this problem much less attractive from a theoretical standpoint. While the classical two-locus DM model represents a limiting case, it does give a plausible mechanistic description of how genetic incompatibilities initially arise in diverging subpopulations. Theory as well as empirical evidence suggests that, once DM-type incompatibilities take hold, additional incompatibilities accumulate exponentially (*i.e.*, they “snowball”) (Orr and Turelli 2001; Matute *et al.* 2010; Moyle and Nakazato 2010). This exponential accumulation suggests that two-locus incompatibilities expand into multilocus incompatibility networks over time, rather than accumulating independently in an additive manner.

In the present work we studied the selection signatures of different variants of the classical DM model in genomes of RILs obtained by selfing. Our analysis showed that DM-type incompatibilities can give rise to complex inbreeding dynamics. In our view, the most troublesome situation is the presence of weak selection as it will continue to change genotype frequencies and LD patterns among RILs, even beyond genetic fixation. Hence, RILs that segregate incompatible genotypes do not, technically, present a reference population for the community, and phenotypic results may not be comparable across studies. Our analysis also showed that counterselection by breeders cannot rescue the adverse effects of genetic incompatibility but introduces additional biases in the form of LD and haplotype distortions. While these issues can be of concern to breeders whose aim is to create these populations for downstream complex trait analysis, many existing RIL genotype data sets may present a largely unexplored resource for studying the basic principles underlying genetic incompatibilities. However, it is important to keep in mind that incompatibilities detected in RILs are not necessarily representative of natural populations, as the interacting alleles may be, individually or jointly, so rare that incompatible hybrids arise at only very low frequencies. Nonetheless, we argue that a deeper understanding of the mechanisms that cause genetic incompatibilities in experimental crosses may help us to establish the sufficient (molecular) conditions for speciation in the wild.

## Acknowledgments

We thank M. Shojaei Arani for discussions and for his contribution to the preparation of the formulas. This work was supported by grants from the Netherlands Organization for Scientific Research (to M.C.-T) and by a University of Groningen Rosalind Franklin Fellowship (to M.C.-T). F.J. acknowledges support from the Technische Universität München–Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement 291763.

## Appendices

### Appendix A: General Transition Matrix

### Appendix B: Model for

The following equations show the diplotype and haplotype probabilities for the model without selection () for

#### Model *M*_{0}: Diplotypes

#### Model *M*_{0}: Haplotypes

### Appendix C: Model at Intermediate Inbreeding Generations

The following equations show the diplotype and haplotype probabilities for the model without selection () at intermediate inbreeding generations, where we used and

#### Model *M*_{0}: Diplotypes

#### Model *M*_{0}: Haplotypes

### Appendix D: Models and for

Nonnormalized diplotype and haplotype probabilities with selection for incompatibility models and for for are as follows:

#### Diplotypes

##### Model *M*_{1}:

##### Model *M*_{2}:

##### Model *M*_{3}:

#### Haplotypes

##### Model *M*_{1}:

##### Model *M*_{2}:

##### Model *M*_{3}:

### Appendix E: Models and at Intermediate Inbreeding Generations

Nonnormalized diplotype and haplotype probabilities with selection for incompatibility models , and at intermediate inbreeding generations, for are shown. Note that and

#### Diplotypes

##### Model *M*_{1}:

##### Model *M*_{2}:

##### Model *M*_{3}:

## Footnotes

*Communicating editor: S. F. Chenoweth*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.179473/-/DC1.

- Received June 15, 2015.
- Accepted December 14, 2015.

- Copyright © 2016 by the Genetics Society of America