## Abstract

Self-incompatibility (SI) is a genetically based recognition system that functions to prevent self-fertilization and mating among related plants. An enduring puzzle in SI is how the high diversity observed in nature arises and is maintained. Based on the underlying recognition mechanism, SI can be classified into two main groups: self-recognition (SR) and nonself-recognition (NSR). Most work has focused on diversification within SR systems despite expected differences between the two groups in the evolutionary pathways and outcomes of diversification. Here, we use a deterministic population genetic model and stochastic simulations to investigate how novel S-haplotypes evolve in a gametophytic NSR [SRNase/S Locus F-box (SLF)] SI system. For this model, the pathways for diversification involve either the maintenance or breakdown of SI and can vary in the order of mutations of the female (SRNase) and male (SLF) components. We show analytically that diversification can occur with high inbreeding depression and self-pollination, but this varies with evolutionary pathway and level of completeness (which determines the number of potential mating partners in the population), and, in general, is more likely for lower haplotype number. The conditions for diversification are broader in stochastic simulations of finite population size. However, the number of haplotypes observed under high inbreeding and moderate-to-high self-pollination is less than that commonly observed in nature. Diversification was observed through pathways that maintain SI as well as through self-compatible intermediates. Yet the lifespan of diversified haplotypes was sensitive to their level of completeness. By examining diversification in a NSR SI system, this model extends our understanding of the evolution and maintenance of haplotype diversity observed in a recognition system common in flowering plants.

THE origin and maintenance of the extraordinary diversity observed at loci involved in genetically based recognition systems such as major histocompatibility complex in animals (Hedrick 1998), mating types in fungi (May *et al.* 1999) and self-incompatibility (SI) in plants (Wright 1939; Lewis 1949) has long fascinated evolutionary biologists. In all these systems, balancing selection maintains genetic variation (Charlesworth 2006a; Delph and Kelly 2014). Plant SI is widespread in flowering plants (Igic *et al.* 2008) and functions to prevent self-fertilization and the consequent deleterious effects of inbreeding depression. Here, negative frequency-dependent selection (NFDS), a form of balancing selection where a rare allele has a selective advantage (Wright 1939), maintains the high diversity observed in nature (Lawrence 2000). Yet one of the most intriguing questions in the evolution of SI is how new alleles (S-haplotypes) evolve (Uyenoyama *et al.* 2001; Chookajorn *et al.* 2004; Gervais *et al.* 2011). This evolutionary puzzle originates from coevolution of the male and female determining components of the incompatibility reaction. A unifying feature of all SI systems (see the list of acronyms in Table 1) is that the incompatibility reaction is controlled by a single polymorphic locus with low recombination between male- and female-specificities (Takayama and Isogai 2005). A mutation in one component may cause the breakdown of SI, with selection against the self-compatible (SC) individual due to inbreeding depression (Charlesworth 2006b). This raises the question of whether self-compatibility is involved in the process of S-haplotype diversification and if the evolutionary pathways involved vary between different SI systems.

An additional complexity in understanding the evolution of SI is that very different evolutionary dynamics are expected for self-recognition (SR) *vs.* nonself-recognition (NSR) SI (Fujii *et al.* 2016). Molecular characterization of the three main systems representing the Brassicaceae (pollen ligand and stigmatic receptor kinase), Papaveraceae (Ca-dependent signaling) and Solanaceae type, including the Plantaginaceae and Rosaceae (Ribonuclease (SRNase) and F-Box (SLF) protein) (for reviews, see Takayama and Isogai 2005; Iwano and Takayama 2012) has revealed striking differences between SR and NSR systems, especially in the evolutionary relationships between the male- and female-determining components (Fujii *et al.* 2016). In the SR systems (Brassicaceae and Papveraceae), it is the recognition of self that prevents fertilization (Takayama and Isogai 2005; Iwano and Takayama 2012). This results in tight coevolution between male- and female-determining components in a single haplotype shown by their congruent genealogies and a common evolutionary history. However, for NSR (Solanaceae type), the inability to recognize self prevents self-fertilization but allows fertilization by pollen from genetically distinct individuals (Takayama and Isogai 2005; Kubo *et al.* 2010). Here, the male-determining components (SLF) have coevolved to recognize and detoxify the female-determining SRNases from all haplotypes other than their own (Figure 1). High polymorphism and sequence divergence among SRNase genes, compared to high sequence conservation of SLF genes from different haplotypes reflects the different patterns of coevolution in NSR systems (Fujii *et al.* 2016). These inherent differences between SR and NSR systems could result in distinct pathways for the evolution of novel S-haplotypes. Moreover, a unique property of NSR systems is the degree of completeness and incompleteness, which will determine fitness and mate availability (see Figure 1). A complete haplotype has all the SLFs required to detoxify all other SRNases in the population, while an incomplete haplotype can vary in the number of missing SLFs (degree of incompleteness). Including this information is essential to fully characterize the NSR system and to understand how incompleteness can influence diversification dynamics.

Recent theory has provided a basis for understanding the evolution of novel incompatibility types in both SR and NSR systems. Based on the evolutionary dynamics of SR systems, Uyenoyama *et al.* (2001) and Gervais *et al.* (2011) present a two-step model for the evolution of novel S-haplotypes where first there is a mutation in the male specificity, followed by a corresponding mutation in the female specificity. Under conditions of strong inbreeding depression and lower selfing rate, Uyenoyama *et al.* (2001) found that new specificities arise via evolutionary pathways that include a loss of SI (*i.e.*, a self compatible intermediate). However, the novel S-haplotype was often found to replace the ancestral haplotype, so that diversification—an increase in haplotype numbers—was limited. Moreover, in this model, diversification through the pathway that involved an initial female (pistil) mutation followed by a mutation in the male (pollen) component was thought to be impossible (Uyenoyama *et al.* 2001). This model was analyzed further by Gervais *et al.* (2011), who found that diversification was possible under conditions of high inbreeding depression and a moderate-to-high rate of self-pollination. Furthermore, the rate of diversification was higher with fewer S-haplotypes in the population (Gervais *et al.* 2011) since there is stronger selection when there are fewer S-haplotypes.

For NSR systems, Fujii *et al.* (2016) recently presented a conceptual two-step model for novel S-haplotype evolution. In this model, a mutation to generate a new SLF occurs first on some haplotype, which, given there is no associated fitness cost, increases in frequency via drift. Once the new SLF is common enough, a novel self-incompatible haplotype can be generated by a corresponding mutation in the SRNase in another haplotype (*i.e.*, the two mutations occur on different haplotypes). Fujii *et al.* (2016) then provide simulations to support this model that suggest diversification via this pathway is possible, but that genetic exchange among S-haplotypes is important for new haplotype evolution (Fujii *et al.* 2016). Although this model provides a useful basis for investigating the evolution of novel S-haplotypes in NSR systems, there are questions regarding its plausibility as the driving force in generating novel SI haplotypes. First, we observe that, for the newly generated SI haplotype to be able to invade, the novel SLF gene created in the first step must increase to a high frequency as otherwise the fertilization of the new SI haplotype will be unlikely. This implies that the new SLF gene should ideally occur in many haplotypes, and therefore either recurrent mutations must happen on different haplotypes, or genetic exchange must be common. Second, even if the new SLF spreads to reach a sufficient frequency, the newly created incomplete haplotype is still rejected by at least two haplotypes—itself and the ancestral type—and is thus at a selective disadvantage due to mate limitation. Finally, their model only considers the evolution of new incomplete haplotypes through a pathway that maintains SI. Consequently, it remains unclear to what extent the path proposed by Fujii *et al.* (2016) is responsible for the diversification of SI haplotypes in NSR systems. Further theory is therefore required to examine all potential evolutionary pathways, including those with SC intermediates, and examine how the evolutionary outcomes vary with parameters such as inbreeding depression, selfing rate, number of haplotypes and with mate availability mediated by incompleteness.

Here, we combine analytical theory with simulations to examine the interplay between completeness, drift, and mutation for diversification in a NSR SI system. First, we propose an explicit population genetic model for the evolution of novel S-haplotypes for NSR systems that includes pathways with a maximum of two selected (non-neutral) mutations on a single haplotype (Figure 2). In this model, there are five potential evolutionary pathways, all starting from an ancestral SI haplotype, which we arbitrarily label The S-haplotype may be in two states, with either the presence () or absence () of the novel SLF, so that a single neutral mutation separates the two states. Following this, the first non-neutral mutation may be a change in the SRNase (female specificity) gene (pathway 1 from or pathway 5 from ), the simultaneous loss and gain of an SLF (male specificity) gene (pathway 2), or gain of an SLF gene (pathway 3 from or pathway 4 from ) (Figure 2, A and B). The second non-neutral mutation then involves either the gain of the novel SRNase (pathways 2, 3, and 4; notice that pathway 3 requires an additional neutral mutation), loss of the novel SLF to restore SI (pathway 1), or gain of the ancestral SLF (pathway 5). Consequently, in this model, a new S-haplotype can evolve via a number of pathways, including four that involve a SC intermediate, and one in which SI is maintained. Genetic and demographic factors such as selfing rate, level of inbreeding depression, mutation rate, number of haplotypes, and population size (Wright 1939; Uyenoyama *et al.* 2001; Gervais *et al.* 2011) may affect the rate and trajectory of diversification at the S locus. Despite their potential importance, to date, no models have examined these conditions and how they interact with S-haplotype diversification in NSR systems. Our model examines these conditions and relates this to the different evolutionary pathways that may generate novel S-haplotypes. Moreover, by considering all pathways, and reducing our model to correspond to previous models (Uyenoyama *et al.* 2001; Gervais *et al.* 2011), we can also directly compare these studies (based on SR) with ours, which considers the dynamics of NSR SI systems.

To investigate the pathways and conditions associated with novel S-haplotype evolution, we first establish a deterministic population genetic model for the NSR system. By examining all potential evolutionary pathways (Figure 2), we ask: what conditions (inbreeding and self-pollination rate) are associated with the evolution of new S-haplotypes and does this vary with evolutionary pathway? Do novel S-haplotypes evolve through SC intermediates? How does completeness (having a full set of SLFs from the population) influence diversification? And, does this vary with haplotype number? Following this, using stochastic simulations with finite population size, we ask, in addition to the effect of inbreeding depression and selfing rate, what is the influence of population size, mutation rate, and number of potential haplotypes on the evolution of new specificities in a NSR system? Finally we ask if certain evolutionary pathways are more common for SI haplotypes with long lifespans and high frequencies. Diversification was possible in our analytical model, but the parameter space for diversification was more limited than in stochastic simulations and varied with haplotype number and the level of completeness. Furthermore, in the analytical model, diversification generally resulted in a short-term increase in S-haplotype numbers, especially in the pathway that maintains SI. This increase was only short-term because incomplete haplotypes slowly go extinct unless they acquire a full set of SLFs. In comparison, in the stochastic simulations, S-haplotype diversification was observed even for the pathway maintaining SI. This difference highlights the potential importance of drift, unconstrained mutational order, and mutation rate for diversification outcomes. By considering all evolutionary pathways, and the conditions associated with S-haplotype diversification, our model combines analytical theory and stochastic simulations to provide new insights into the evolutionary dynamics of NSR SI systems in plants.

## Methods

### The NSR system

The genetic mechanism of our NSR system is based on the model proposed by Kubo *et al.* (2010) and Fujii *et al.* (2016). This is a gametophytic SI system where incompatibility is determined by the haploid genotype of the pollen. We assume that each haplotype consists of two tightly linked loci: an *R*-locus and an *F*-locus. An *R*-locus determines the female specificity and consists of a single RNase whereas, at the *F*-locus, male specificity is determined collaboratively by several SLF genes (F-boxes) In the analytical model (see below) we make no assumptions on the length (*L*) of this sequence, and so the appearance of a new SLF may either keep the length of the haplotype *L* intact (*e.g.*, a change of specificity of an old SLF), or *L* may increase due to a duplication event. However, in the stochastic simulations (see below), we assume that the number of SLFs in each haplotype is constant (*L* is constant), implying that a mutation in the SLF results in the simultaneous appearance (addition) and disappearance (deletion) of an SLF. We assume a one-to-one correspondence between SLFs and RNase, such that each SLF is assumed to detoxify (recognize) exactly one RNase, and each RNase is recognized by exactly one SLF, This implies, for example, that a pollen haplotype needs to have both SLFs and in order to recognize, and thus be able to fertilize, a diploid plant with and RNase (Figure 1). Conversely, incompatibility results from the failure of a pollen grain to recognize both RNase of a diploid female. For a pollen grain to be able to fertilize all individuals with incompatibility types other than its own type in the population it needs to have as many SLFs as there are RNase present in the whole population except the one recognized by its own RNase. If a haplotype has all the required SLFs, we call it *complete* (and otherwise *incomplete*). We emphasize that completeness is a property of a haplotype, but also depends on the composition of the population (Figure 1). The level of incompleteness of a haplotype is described by the completeness deficit, which measures the number of missing SLFs. A *haplotype class* represents all haplotypes that have the same female specificity (RNase) but may have different sets of SLFs. We treat as one SI class all complete and incomplete SI haplotypes with the same RNase (Figure 1).

### Evolutionary pathways for the generation of new SI haplotypes: the cube

In the NSR system, the generation of a new SI haplotype requires mutational changes not only in a focal haplotype but also in other haplotypes in the population. Moreover, some of the required mutations may be under selection, but some are not. We thus need to make a distinction between the number of neutral and non-neutral (selected) mutations in the focal haplotype, as well as the numbers on other haplotypes in the population. In this paper, we consider all evolutionary pathways for the generation of new SI haplotypes that allow up to two selected mutations on a single haplotype. As in the previous work on the evolution of new SI haplotypes (Uyenoyama *et al.* 2001; Gervais *et al.* 2011), we start by considering an initial population of *k* complete SI haplotype classes, and discuss all the mutations that lead to a new complete SI haplotype class.

#### Step 1 (initial condition; neutral mutations):

For every diversification pathway, the first mutation in the population must necessarily give rise to a haplotype with a novel male specificity, *i.e.*, a new SLF; a haplotype with a new female specificity will never be fertilized and thus can never invade (Figure 2). We start our analysis by either assuming that the novel SLF already exists in some SI haplotype classes in the population, or that it fixes due to drift. And so we enter the Cube (Figure 2).

#### Step 2 (1st non-neutral mutation):

Suppose that the novel not-yet utilized SLF, which we label is fixed within *n* different SI classes. We will follow changes on a haplotype, say (or any other of the *k* existing SI classes; note that is not necessarily the class that underwent the previous diversification event), which may or may not have the novel SLF We write if the haplotype class has the novel SLF and if not. Each pathway is described in Figure 2. Three selectively different events may follow: a mutation in the *R*-locus, either (i) on a haplotype (pathway 1), or (ii) on a haplotype (pathway 5), or a mutation in the *F*-locus (iii) such that haplotype ( or ) obtains an SLF that corresponds to its RNase (pathways 2, 3, and 4).

#### Step 3 (2nd non-neutral mutation):

The final step requires either gain of an SRNase (pathways 2, 3, and 4 – notice that path 3 requires an additional neutral mutation), gain of the ancestral SLF (pathway 5), or replacement of the new SLF by the ancestral SLF (pathway 1); see Figure 2.

#### Final state of the population:

All five evolutionary paths lead to a novel complete SI haplotype If all initial *k* SI haplotype classes remain in the population during this process, we say that the paths are diversification pathways. Recall, however, that since we assume that in of the initial *k* SI classes is absent, then, after the diversification process (provided no further mutations have happened in the population), SI classes are incomplete and will not be able to fertilize the new SI class

Two remarks are in order. First, the steps described above give a detailed description of all the changes in the population needed to generate a new complete SI class. This, however, may seem an overly strict requirement, especially for large *k*, since missing one or two SLFs only slightly affects the frequencies of haplotypes compatible with the focal haplotype. If we relax the requirement of completeness, we need, in principle, only two changes (one non-neutral) in the population. First, a single haplotype class gains a “not-yet utilized” SLF after which another haplotype class undergoes a mutation in the *R*-locus, generating an incomplete SI haplotype (step 2, first non-neutral mutation of pathway 5, Figure 2). This path was also identified by Fujii *et al.* (2016). It remains to be seen, however, whether this path is likely (or even possible) under the various levels of incompleteness (parameter *n*) and demographic parameters used in this paper. Second, if we assume that all *k* SI classes initially have the new SLF (*i.e.*, ), and then restrict the subsequent mutations to the diagonal of the cube (highlighted rectangle in Figure 2, *i.e.*, considering only pathways 1 and 2), we recover the SR model in Uyenoyama *et al.* (2001) and Gervais *et al.* (2011). This allows us to directly compare the evolutionary diversification pathways in NSR and SR systems. Finally, we remark that since in the NSR model pathways 1 and 2 imply a simultaneous loss and gain of two specific SLF, we predict that these pathways (if feasible) are less common than the alternative pathways 3, 4, and 5. This should be observed in the stochastic simulations, where the rate of mutations is considered explicitly.

### The life cycle of an individual and the dynamics of the population

In this section, we first give the life cycle of an individual and then give the recurrence equations to study the dynamics of haplotypes for both infinite and finite population models. The infinite population model is the large population limit of the stochastic finite population model.

Consider a well-mixed population with nonoverlapping generations, such that one iteration of the (finite and infinite population) model represents the full life-cycle of an individual (*e.g.*, annual plants). At the beginning of the season, each diploid individual plant produces, and receives, a large number of haploid pollen grains. Of all the pollen received a proportion *α* is assumed to come from the same individual, and from a pollen pool from all the other plants (*i.e.*, global pollen dispersal). The proportion of haplotypes received is proportional to the frequency distribution of the pollen in the whole population. Importantly, we assume that all outcrossed mating events are among unrelated individuals and that self-fertilized offspring undergo inbreeding depression Uyenoyama *et al.* (2001), Gervais *et al.* (2011). Self-fertilized offspring survives until adulthood with a fixed probability relative to outcrossed offspring. After offspring are produced, all adults die. A possible mutation occurs at the time of reproduction. We assume that the S-locus is nonrecombining. In this paper, we also assume that there is no geographic structure. Thus we make no distinction between globally and locally incomplete S-haplotypes, implying that mate limitation due to completeness plays a lesser role in structured populations (see *Discussion*).

#### Infinite population model:

Following the assumptions of the life cycle of an individual (see above), the probability that an individual with genotype *g* is self-fertilized is equal to the number of self-pollen grains received that are SC, divided by the total number of compatible (self and nonself) pollen grain received:(1)where is a diploid fertilization function that gives the fraction or 1 of self-pollen that can self-fertilize, is a haploid fertilization function that gives the probability that a pollen with haplotype *r* can fertilize a diploid individual *g*, and is the frequency of (haploid) pollen *r* in the whole population. Similarly, the probability that an individual with genotype *g* is outcrossed with haploid pollen *h* is(2)If the plant is self-fertilized, the offspring survives until adulthood with probability relative to outcrossed offspring. The frequency of (diploid) genotype *r* in the next generation is(3)where is the probability that a diploid individual *g* that is fertilized by a haploid pollen *h* produces a diploid offspring *r* (according to Mendelian rules), is the average fitness in the population, and where, for clarity, we use *x* to denote (diploid) genotype frequencies.

To study the conditions for the various diversification pathways we suppose, in the infinite population model, that initially the population consists of *k* SI classes, with equal frequencies and no SC haplotypes. Moreover, *n* out of these *k* classes have the novel not-yet utilized SLF fixed within the class. The haplotypes in the remaining classes do not have We note that each haplotype may have a different number of SLFs, *i.e.*, *L* is not fixed in the infinite population model.

#### Finite population model:

Let *N* denote the number of individuals considered in each simulation (*i.e.*, haplotypes). Each haplotype is represented by a sequence(4)denoting the states at a single female determining SRNase and a fixed number *L* of male SLF genes. Keeping *L* fixed implies that the appearance of a new SLF gene is accompanied by loss of another SLF gene. However, the effect on fitness becomes small when *L* is large as it increases the likelihood that the lost SLF gene does not correspond to any of the RNase in the population, and, thus, has no affect on the mate availability of the haplotype. Unlike in the infinite population model, where we assumed initial presence of SI classes, the initial state in the finite population simulations consists of identical SC haplotypes, *i.e.*, all haplotypes carried the same SRNase and the same (random) sequence of SLF-genes. This choice of the initial state allowed us to capture the initial phase of the evolution of SI haplotypes where SI appears and forms a functional system with at least three SI haplotypes. Moreover, the combination of mutation and drift results in intermediate populations, which can be much more diverse in comparison with the infinite population model, *i.e.*, the finite population model allows multiple simultaneous diversification events at the same time. Thus, the finite population model may reveal some of the limitations of the infinite population model. Each generation consisted of the life-cycle described above; while each life cycle consists of mutation, followed by reproduction and viability selection.

##### Mutation:

We assumed a finite space of distinct RNase types and SLFs where the pollen type targets the RNase (). The number of mutations in the RNase and in the SLFs were drawn randomly from a binomial distribution with parameters for RNase mutations and for SLF mutations and randomly placed on the genotypes. Here, and represent the probability of a mutation in a given generation either in the RNase or in a single SLF. The binomial distribution can be replaced by the Poisson distribution with parameters (female) and (male), since the two distributions are the same in the limit of small and large *N*, *L* (valid throughout our simulations). Mutation rate was the same for all RNase-mutations and similarly for all SLF mutations for any

##### Reproduction and viability selection:

The second part of the simulation consisted of randomly generating the mating partners and their offspring, incorporating the compatibility between individuals and selection in terms of selfing *α* and inbreeding depression *δ*.

In the infinite population model, the frequencies of individual genotypes *g* in the next generation can be obtained by solving a system of difference equations (3). However, stochastic simulations use probabilistic rules to determine offspring production for all parent combinations. These rules quantify the probability that female gametes *g* and male pollen *h* produce an (adult) offspring as(5)where and are defined in (1) and (2), and = 1 when and 0 otherwise. The mean population fitness ensures that The first term in (5) captures self-fertilization and is nonzero only when while the second term in (5) reflects outcrossing events between genotypes *g* and *h*. The simulation required two steps: using the current genotypes their frequencies and compatibility relationships we first generated *N* parental pairs (distinguishing between females *g* and males *h*) by stochastic sampling with probabilities . Next, we randomly generated a single offspring for each parental pair using Mendelian inheritance.

We ran the simulation for a fixed number of generations ( generations) with parameters *N*, *α*, *δ*, summarized in Table 2. The structure of the space of possible haplotypes depends on our choice of parameters and *L*. We recorded the list of genotypes in the population at all times, as well as a list of all mutations. This allowed us to trace the key measures and to:

Classify haplotypes based on their haplotype classes. Initially, a single SC class is present in the population (no SI classes were initially present).

Classify haplotypes based on compatibility among SI classes. We distinguish between SI and SC haplotypes and, in combination with the grouping in 1, we plot the frequencies of SI and SC haplotypes within each class.

Classify haplotypes based on completeness deficit,

*i.e.*, the number of nonself SI classes that cannot be fertilized because of missing SLFs. This measure depends on the number of SI classes, as it cannot exceed the number of SI classes −1. The minimal completeness deficit of an event is computed as a minimum deficit through all times during the lifetime of the event and through all SI haplotypes within the class at each time.Identify all birth/death events of the SI classes. Birth of a new SI class of type

*k*is an event that occurs at generation*t*if there is at least one SI haplotype with SRNase at the*t*-th generation while there was none such haplotype in the previous generation. Death of the*k*-th SI class occurs when the last SI haplotype from that SI class vanishes at generation*t*, provided there was at least one such individual at generationTrace the genealogy of the SI classes. We can trace the order of mutations that led from the ancestral haplotype to any other haplotype and record the times at which the mutation occurred. This allows us to trace the pathway leading to a birth (or death) event of any SI class. We chose the representative haplotype of each event as the first haplotype that reached the minimal completeness deficit of this event. The results were almost identical when the last haplotype in the class was chosen as a representative haplotype. We then traced back all its ancestors from the current and previous RNase class and projected it onto a mutation cube in Figure 2.

### Data availability

The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables. Supplemental Material (File S1 and File S2) available at Figshare: https://doi.org/10.25386/genetics.6148304. File S1 contains figures that further clarify the following features: (i) effect of population size on the average number/frequency of SI classes; (ii) changes in the minimal completeness deficit in time for a single class; and (iii) diversification diagrams for all studied pathways, including the summary figure for File S2 contains the code required for a stochastic simulation of the SLF system with an example. This file also includes the output in the form of figures and tables.

## Results

### Theoretical predictions of evolutionary outcomes

Our model examines five potential pathways for the evolution of new SI haplotypes (see Figure 2), which includes four pathways with SC intermediates (pathways 1, 2, 3, and 4) and one where SI is maintained (pathway 5, see Figure 2). These pathways are also defined by the initial state of the population (*i.e.*, before the first functional mutation occurs), where pathways 1, 2, and 3 assume that the novel SLF () is initially present in the population, while the novel SLF is initially absent in pathways 4 and 5 (). The transition between the two initial states is selectively neutral and requires only a single mutation in the SLF of haplotype We divide the parameter space, proportion of self-pollination (*α*), and inbreeding depression (*δ*), into regions that represent; diversification (green; long-term increase in the number of SI haplotypes, loss of the intermediate mutant), short-term diversification (light magenta; short-term increase in the number of SI haplotypes), exclusion of the novel SI haplotype ( gray), novel SI haplotype replaces its ancestral SI haplotype (red; no diversification), SC haplotypes go to fixation (below the thick line; no SI present in the system), see Figure 3.

#### The maintenance of complete and incomplete haplotypes:

To disentangle the complexities involved in the diversification process, it is first useful to compute the effect of (in)completeness (number of missing SLFs) on mate availability and fitness, and determine whether (and when) complete and incomplete SI haplotypes coexist in the population (see Appendix A for the exact calculations). Complete SI haplotypes are maintained in the population due to NFDS, and, when rare, they have a selective advantage and increase in frequency. However, populations with incomplete haplotypes may not maintain all haplotypes. Here, rare incomplete haplotypes are expected to have a fitness advantage over common incomplete haplotypes with an equal, or lower, levels of completeness, but not over haplotypes with higher levels of completeness (*e.g.*, fully complete haplotypes). Therefore, even though incomplete haplotypes are under NFDS, they may not have sufficiently high fitness to increase in frequency when rare. We thus aim to resolve the conditions under which haplotypes with varying levels of completeness are maintained in the population. This is useful for investigating the feasibility of diversification pathways because, in general, diversification will result in some fraction of incomplete haplotypes.

Aligned with our assumptions on the diversification pathways (see *Methods*), we provide the exact analytical calculations for which SI haplotypes, some complete and some incomplete, can coexist in the population. Out of the haplotypes, are assumed incomplete and lack a single but identical SLF. The remaining SI haplotypes are assumed complete, including one that cannot be fertilized by the incomplete haplotypes corresponding to the haplotype in the diversification pathways (Appendix A). We find that when there were, in addition to no complete haplotypes () coexistence is possible only when the initial number of haplotypes (*k*) is three, but not for With one complete haplotype () coexistence is possible only for but never for and for coexistence is not possible for any However, when all haplotypes are complete (), coexistence is possible for all *k*. This implies that diversification is possible only when either a single haplotype is complete () but the total number of haplotypes is small (), or when all haplotypes are complete (). Despite these rather strict requirements for diversification, there are several caveats. First, the frequency of an incomplete haplotype, if initially present in the population, decreases to zero very slowly. This haplotype could therefore be rescued from extinction by gaining the missing SLF, which results in increased mate availability. Second, these results do not explicitly consider diversification events that involve the coexistence of SI and SC types. Third, we considered only the case where all incomplete haplotypes lack a single and identical SLF, thus coexistence and diversification may be possible for haplotypes with varying levels of completeness and different missing SLFs. Finally, our results hold only for global pollen dispersal. When dispersal is local, we expect that globally incomplete haplotypes that are locally complete will be maintained in the population. Next, we complement these results by studying all evolutionary diversification pathways, including intermediate haplotypes, to determine which diversification pathways are the most likely.

#### Pathway 1: SRNase as the first mutation:

Previous models (Uyenoyama *et al.* 2001; Gervais *et al.* 2011) suggested that diversification is not possible through pathway 1 (first mutation in the female-specificity). Our model can be reduced to correspond to these models for pathways 1 and 2 when all SI haplotypes are assumed complete and the dynamics are constrained to the diagonal of the cube (highlighted pathways in Figure 2). Pathway 1 represents a mutation first in the female SRNase leading to a SC intermediate followed by the pollen-part mutation (SLF) to produce a SI haplotype (Figure 2). Our results show that diversification via this pathway is possible for all levels of completeness (*n*) and haplotype number (*k*) (see Figure S3 in the Supplemental Material). The apparent discrepancy with our previous result that diversification is possible only for or originates from the fact that here, in pathway 1, the intermediate SC haplotype is not excluded in the diversification process. However, when considering diversification through all pathways (Figure 2), diversification via pathway 1 occurs in the parameter region where SC haplotypes (produced in pathways 2, 3, and 4) have a fitness advantage, ultimately resulting in the loss of SI (region below the thick black line in Figure 3, see also Figure S3 in the Supplemental Material). Consequently, for both SR and NSR systems, it appears that diversification is unlikely through this evolutionary pathway.

#### Pathways 2–5: the effect of completeness and haplotype number:

For pathways 2–5, completeness interacts with haplotype number to determine the values of self-pollination *α* and inbreeding depression *δ* where diversification is possible. When all haplotypes were incomplete (), diversification was not possible through any of the pathways, independent of the number of haplotypes in the population (Figure 3Ai). In contrast, for example, when long-term diversification occurred under conditions of high *α* and *δ* under two scenarios; first, through pathways 2, 3, and 4, when the number of complete haplotypes was one ( green region in Figure 3Aii), and, second, when all haplotypes were complete, through pathways 2 and 3 (green region in Figure 3Aiv). When the number of complete haplotypes was between 1 and *k*, only short-term diversification was possible (light magenta region in Figure 3Aiii), and this occurred through pathway 5 (SI maintained). Because all intermediate SC haplotypes are excluded after diversification, in the short-term, new SI haplotypes coexist in the system following diversification, but all incomplete haplotype classes then slowly go extinct (see results above). In this case, extinction is only prevented by mutations that result in incomplete haplotype classes becoming complete (obtaining all SLF genes in the population).

The effect of completeness on diversification was also observed at higher haplotype numbers. When there was no diversification when (the condition for possible diversification for is Appendix A), with the novel SI haplotype () unable to invade the population (Figure S6 in File S1). This is in contrast to diversification in the narrow region of self-pollination and inbreeding depression when there are fewer haplotypes ( cf. Figure 3Aii and Figure S6ii). Yet, similar to when short-term diversification was observed when the number of complete haplotypes were between 1 and *k* (Figure 3Aiii and Figure S6iii), and long-term diversification at high self-pollination (*α*) and inbreeding depression (*δ*) when all haplotypes were complete (Figure 3Aiv and Figure S6iv). This suggests that the conditions for diversification are restricted to a narrow region of *α* and *δ*, but that this is more flexible when there are fewer haplotypes (smaller *k*), as diversification can occur at both low and high levels of completeness. Moreover, completeness will determine if there is long- or short-term coexistence of novel haplotypes in the population. The parameter spaces for diversification for each evolutionary pathway across a broader range of haplotype numbers and levels of completeness are outlined in Appendix A–C.

If the selfing rate *α* and inbreeding depression *δ* are constant, the SR model of Uyenoyama *et al.* (2001) and Gervais *et al.* (2011) predicts very low numbers of S-alleles. This is because the diversification regions from *k* to S-alleles generally do not overlap, unless *k* is small and thus, in most cases, only a single diversification event is possible. Interestingly, the diversification regions for the infinite population NSR model are identical to those for the SR model, see Figure 2 in Gervais *et al.* (2011). The similarity between the SR and NSR models is because all haplotypes are initially assumed complete (in both SR and NSR models) before the novel female specificity arrives in the population. We show the diversification regions for S-alleles in Figure 3. Multiple diversifications may occur only when the diversification regions overlap (gray color in Figure 3B). For example, gray region I allows diversification from to since it is the intersection of the regions for and Similarly, region II allows diversification from to and the narrow gray region III allows diversification from to For other parameter combinations, only a single diversification event is possible in the infinite NSR model. Limited diversification is one of the major drawbacks of the deterministic SR and NSR models. However, demographic stochasticity may be the key to, at least partially, resolving this puzzle since stochasticity very often leads to different dynamics compared to deterministic models.

### Stochastic simulations: an introductory example

The deterministic model for infinite population size appears inadequate to explain the generation and maintenance of the large number of haplotypes observed in natural populations. First, for both SR and NSR systems, the deterministic model predicts that the number of SI haplotypes increases by at most two when selfing rate and inbreeding depression are constant (see also Gervais *et al.* 2011). Second, in our deterministic NSR model, diversification often leads to the eventual loss of all incomplete haplotypes. Consequently, we next examine stochastic simulations, which include features such as unconstrained mutational order and drift. We find that stochastic simulations solve some of the problems of the deterministic model in explaining haplotype diversification.

We begin by presenting an introductory example that clarifies the essential concepts and parameters used in the following sections. All simulations begin () with a population of SC individuals, with the same SRNase class and a random set of SLF genes including the SLF that corresponds to its own SRNase (Figure 4A). In this example, there are 50 possible SRNase haplotype classes. Only SI haplotypes are presented in Figure 4B, and these can be classified into two classes: complete haplotypes with all SLF genes corresponding to other SRNase haplotypes in the population, and incomplete SI haplotypes that are missing some SLFs. Simulations are run for 10,000 generations, and the dashed line for each SRNase class represents the emergence of that class; gray, short lines are SI haplotypes that have a lifetime of <100 generations, light green lines are incomplete haplotypes with a lifespan of >100 generations and dark green complete haplotypes with long lifespan (>100 generations). In Figure 4B the length of the line shows lifespan. Complete haplotypes are generally present at higher frequencies, and have longer lifespans than incomplete ones. Completeness (having a full set of SLF genes) determines mate availability. In this example, at generation 500, haplotype class 4 is complete and can therefore mate with all other haplotypes (Figure 4Ci and ii). In comparison, haplotype class 21 is incomplete and has reduced mate availability, both when considering higher frequency ( Figure 4Ci) and rare classes (Figure 4Cii).

### Stochastic simulations: establishment and the number of SI classes

Next, we examine the conditions associated with S-haplotype diversification. The frequency of SI haplotypes was greatest () at intermediate to high values of self-pollen deposition () and high inbreeding depression () (Figure 5A). Here, the average frequency of SI types in the population was highest (closest to 1) with high values of both self-pollen deposition () and inbreeding depression () (Figure 5A). In the parameter space (), the average number of SI haplotypes that evolved was 7–14 for population size 1000 (Figure 5B), although some of these are rare. Population size influenced the relative frequencies of SI to SC haplotypes, so that the overall relative frequency of SI to SC increased with population size (see the results for in File S1).

### Stochastic simulations: evolutionary dynamics and the interplay between SI and SC classes

Our model shows the three stages in the evolution of a NSR SI system: the establishment, diversification and stationary phases (Figure 6). During the establishment phase (duration of which is proportional to the mutation rate) the relative frequency of SC haplotypes decreases as these are replaced by SI haplotypes. Once the SI system is established (defined as ), the population enters the diversification stage, which is characterized by an increase in the number of SI classes. Finally, during the stationary phase, SI classes appear and disappear but their long-term average number remains constant. During the stationary phase, there were many cases of equal frequencies of SI classes, but also some low frequency SI classes. Recurrence of SC haplotypes was observed for all parameter combinations; however, their frequency and occurrence varied based on the mutation rate ( *vs.* ) and on the number of potential haplotypes ( *vs.* ). Higher mutation rate lead to a greater frequency of more transient events, whereas for lower mutation rates events were less frequent, but persisted in the population for longer (Figure 6A *vs.* Figure 6B and Figure 6C *vs.* Figure 6D). Moreover, fewer potential haplotypes () resulted in a lower abundance of SC haplotypes (Figure 6A *vs.* Figure 6C) and their faster elimination from the system (cf. Figure 6, B and D). These patterns were qualitatively consistent for *α* and *δ* combinations within the parameter space where SI invades (see Figure 5). Yet, the frequency of SI types was greater for the parameter space where new SI haplotypes are more likely to invade. Equal frequencies of the most common SI classes were apparent in the stationary phase, and this was most consistent for lower mutation rates (Figure 6, B and D). Low frequency SI haplotype classes were, however, also present at sufficient numbers leading to the occasional replacement of the most common SI classes by the low frequency SI classes.

To track SI haplotypes (ignoring SC haplotypes), we recorded the first occurrence of a given class (Figure 6) that may have arisen from either a SC or SI haplotype (see Figure 2). Each line (row) therefore shows an event that begins with the occurrence of a novel SI haplotype and ends with the extinction of the last SI haplotype from that class; the length of the line is therefore the lifespan of the haplotype class. Short events occur when the haplotype class is lost due to demographic stochasticity, while the long events represent haplotype classes that reach sufficient frequency and are maintained in the population. There were many more short than long events, and the lifespan of SI haplotype classes varied in relation to mutation rate and potential haplotype number. Higher mutation rates led to a greater proportion of short events (Figure 6, A and C). Moreover, for a given mutation rate, there was higher turnover and less stability of the SI classes when there were more potential haplotypes ( Figure 6, A and C). Consequently, the highest turnover of SI classes occurred at higher mutation rates and numbers of potential haplotypes (Figure 6C), compared to the longer lifespans and less turnover observed for lower mutation rates and number of potential haplotypes (Figure 6B).

### Stochastic simulations: most likely evolutionary pathways for SI haplotypes with long lifespans

We now examine how the evolutionary pathway of long lifespan SI haplotypes depends on mutation rate (*μ*) and potential haplotype numbers (). Both mutation rate and potential haplotype number influenced the likelihood of each pathway for SI haplotypes with a lifespan of >100 generations. High mutation rate () and a lower number of possible haplotypes () favored the pathway that maintains SI (Figure 7A). Yet, with a greater number of possible haplotypes (), a higher mutation rate resulted in high transition likelihoods for both pathway 4 with a SC intermediate, and for pathway 5 that maintains SI (Figure 7C). In comparison, for lower mutation rates (), we observed a larger number of transitions through the pathway with an SC intermediate (pathway 4), and this was more consistent for different potential haplotype numbers (Figure 7, B and D). Furthermore, and aligned with our predictions on the unlikeliness of pathways 1 and 2 (they require a simultaneous loss and gain of two specific SLFs), we do not observe these pathways in our simulations (see above for more discussion on pathway 1).

#### The effect of completeness:

Complete haplotypes with no missing SLF genes had the longest lifespan and maintained the highest maximal frequencies compared to haplotypes missing between two and four SLF genes (minimal completeness deficit between two and four; Figure 8). In comparison, missing only one SLF gene had a smaller effect on lifespan and frequency. All completeness deficits were represented in the short lifespan and low frequency classes. However, the complete haplotypes, or those missing only one SLF gene, had the longest lifespans ( generations) and highest () frequencies (Figure 8). We provide further illustration of the importance of completeness for haplotype lifespan in Figure S2 of File S1. In this example, we present a single representative SI class and its completeness deficit throughout its lifetime, showing that extinction of an SI class is associated with the loss of completeness.

## Discussion

We use analytical theory and stochastic simulations to examine the conditions under which novel S-haplotypes evolve for NSR SI. In addition to a pathway that maintains SI, diversification may occur through SC intermediates. Our results also highlight the importance of completeness and haplotype number for diversification in NSR systems, how this varied with evolutionary pathway, and how this interaction determines the long- or short-term coexistence of novel haplotypes. With high inbreeding depression and moderate-to-high self-pollination rates, it is possible for new specificities to evolve in our model, and to produce a population in which almost all haplotypes are functionally incompatible. However, only 14 S-haplotypes, at most, were present in finite populations. We first discuss the conditions that may promote diversification and how these may vary through the evolutionary process. Then, we relate our results to previous models and examine how they extend our current understanding of novel S-haplotype evolution in NSR systems. We conclude by discussing future directions for theoretical models, and how these, when combined with genomic data, can provide insight into the evolution of diversity in NSR systems.

### Evolutionary pathways for diversification: self-pollination, inbreeding depression and haplotype number

Self-pollination and inbreeding depression influenced diversification outcomes: yet this varied for the different evolutionary pathways and with different levels of completeness and haplotype number. We first discuss how these parameters influence diversification for each pathway and then discuss the interaction between selfing rate and inbreeding depression.

#### First mutation in the female-specificity with a SC intermediate:

Diversification was unlikely through a pathway where the first mutation happens in the SRNase resulting in an intermediate SC haplotype (pathway 1); this pathway is identical to the path presented in Uyenoyama *et al.* (2001) and Gervais *et al.* (2011). Although this pathway is, in principle, possible (see Figure 3), it is unlikely to contribute to the observed diversity because, when comparing all pathways simultaneously, this parameter region overlaps with the region where SI is lost from the population via alternative pathways (region below the thick line in Figure 3). Furthermore, this pathway is not observed in the simulations. The discrepancy between our model results likely originates from differences in the order of mutations between the two theoretical approaches. In contrast to the analytical model, where diversification through a pathway occurs in a set sequence, the order of mutations is random in simulations. This means that, like Gervais *et al.* (2011), pathway one was not observed in our simulation results because either an SLF or SRNase mutation can occur first resulting in the loss of the SI system. Consequently, although theoretically possible, pathway 1 can only occur under the restricted conditions observed in the deterministic model and is unlikely to contribute to long-term diversification.

#### First mutation in the male-specificity with a SC intermediate:

Haplotype number influences the strength of balancing selection during S allele diversification. In our analytical model, long-term diversification was more likely with lower haplotype numbers for pathways where the first mutation happened in the male specificity (*e.g.*, Figure 3). This is because, for lower haplotype numbers, the intermediate SC haplotypes have a substantial advantage in mate availability over SI haplotypes, as SI haplotypes cannot fertilize their own haplotype class, which is present in high frequency for low haplotype numbers. Self compatible haplotypes can therefore invade the population, eventually resulting in diversification after the next mutation. This effect gets weaker with increasing numbers of haplotypes as SI haplotypes become less mate-limited, resulting in a smaller parameter region for diversification, and this occurs at lower inbreeding depression. Consequently, there is generally no overlap in the parameter space for diversification for different numbers of haplotypes initially present in the population. Thus, surprisingly, even though in principle the number of S-haplotypes can increase to infinity in our analytical infinite population model, the number of haplotypes increases by at most two for any fixed level of inbreeding depression and self-pollination. However, in nature, variation in selfing rate and inbreeding depression may facilitate traversing nonoverlapping parameter spaces. An equivalent result was found for SR systems in Gervais *et al.* (2011), indicating the importance of frequency-dependent selection for evolutionary outcomes. This has interesting implications for considering the application of these models to empirical data on the distribution and number of S-haplotypes (see *Discussion* below). For example, small populations with fewer S alleles may provide the conditions that promote diversification in a metapopulation.

#### First mutation in the female-specificity with a SI intermediate:

In contrast to the above pathways where diversification was unlikely for higher haplotype numbers, diversification for a pathway where SI is maintained (pathway 5; Figure 3) is possible for any initial haplotype number. Moreover, since all haplotypes outcross, this pathway has no fitness costs associated with self-pollination and inbreeding depression. This implies that, in principle, multiple consecutive diversification events are possible for this pathway for any level of inbreeding depression and self pollination. However, in our deterministic model the diversification was only short term as all incomplete SI haplotypes slowly go extinct.

#### Purging of deleterious alleles:

Theory predicts a negative relationship between inbreeding depression and selfing rate, such that the purging of deleterious recessive alleles through selfing reduces inbreeding depression (Lande and Schemske 1985). However, the combination of high inbreeding depression and selfing rate is not unrealistic, given that studies have found inbreeding depression in populations and species with high self-fertilization rates (Byers and Waller 1999; Winn *et al.* 2011). Moreover, Gervais *et al.* (2014) found that purging had little effect on the spread of SC individuals if most deleterious alleles had weak fitness effects. In our study, the combination of self-pollination and inbreeding depression where diversification was observed varied with evolutionary pathway, indicating the potential for different conditions to favor diversification through alternative pathways. In our model, inbreeding depression was fixed through time (*i.e.*, purging was not considered), even though its strength may vary with population size (Bataillon and Kirkpatrick 2000) and the degree of biparental inbreeding (Porcher and Lande 2016). Sheltered genetic load may also influence the dynamics of deleterious alleles and inbreeding depression (Porcher and Lande 2005; Llaurens *et al.* 2009), although this is more likely to be important for sporophytic SI systems with dominance hierarchies among alleles (Llaurens *et al.* 2009). By considering the importance of dynamic inbreeding and genetic linkage, future models could further examine how these apply to the evolution of novel S-haplotypes in an NSR SI system.

### Evolutionary pathways for diversification: the effect of completeness

The relative importance of completeness for diversification may vary for self- *vs.* NSR systems. Studying an SR model, Sakai (2016) found that an incomplete SI system was essential for diversification, and that this occurred during the initial evolution of the SI system. Here the pollen genes (male component) for a given specificity were not fully rejected by the female-determining genes for that specificity: leading to incomplete rejection following the matching of SI haplotypes. In this model of diversification, S alleles evolve before the species split and then are maintained in different species after diversification (Sakai 2016). In our model, incompleteness (missing SLF genes related to SRNases in the population) reduces mate availability and fitness. One of the key results of this study on NSR SI is the importance of completeness for long-term diversification. Long-term diversification was observed only when one haplotype is complete () or all haplotypes are complete (). For all other completeness levels only short-term diversification is possible because incomplete haplotypes slowly go extinct unless rescued by mutations that restore completeness (a full set of SLFs). This is analogous to evolutionary rescue (Gomulkiewicz and Holt 1995; Gonzalez *et al.* 2013), where mutation can prevent extinction, enabling the haplotype to persist in the population. Our simulation data also show the potential importance of completeness for haplotype lifespan and turnover. Given that mate availability scales with completeness deficit, incomplete haplotypes with more missing SLFs are likely to be selected against, reducing their lifespan and contribution to long-term diversification outcomes.

#### Completeness and the pathway that maintains SI:

Our results also raise a number of questions about diversification via the pathway that maintains SI (pathway 5). Diversification is not possible for this pathway when the number of complete haplotypes is zero or one (*i.e.*, or 1). When the number of complete haplotypes is >1 diversification is possible, but only if all incomplete haplotypes rapidly become complete, otherwise they go extinct. This challenges the feasibility of the evolutionary pathway for diversification proposed by Fujii *et al.* (2016). Yet, when we consider finite populations we do see diversification through this pathway. This implies that conditions in the simulations such as a random order of mutation events, higher mutation rates, finite population size, and having a more flexible SLF template (*i.e.*, more SLFs to begin with) may facilitate diversification through this pathway. In conclusion, comparing the results of our deterministic and stochastic models highlights the potential importance of completeness for long-term diversification in the NSR SI system. These results, however, are based on the assumption of global dispersal of pollen. It is possible that the importance of incompleteness may decrease if pollen has a limited range, as this may reduce the effects of missing SLFs on mate availability. Future models that extend our results to nonglobal dispersal may therefore assist our understanding of the role of incompleteness in S-haplotype diversification.

### Congruence of theoretical models and empirical data: estimates of haplotype number

The extremely high level of S-haplotype diversity observed in natural populations is well established (Lawrence 2000; Castric and Vekemans 2004), raising interesting questions about the congruence of theoretical models with empirical data. The number of S-haplotypes derived from our model ( for population size 1000) were far fewer than the diversity commonly observed in natural populations of species with SR and NSR SI (20–40 SI haplotypes; Lawrence (2000)). These results are similar to the simulation results of Gervais *et al.* (2011) who found that diversification peaked at between 7 and 18 alleles. It has been suggested that population structure may provide the conditions for diversification (Uyenoyama *et al.* 2001; Gervais *et al.* 2011). Incomplete reproductive barriers and hybridization among species may also create the population structure required to enhance diversification (Castric *et al.* 2008). In this case, novel S-haplotypes may evolve in separate species, which are then exchanged among species via introgression. These novel S alleles may spread through the population once they are decoupled from the hybrid genetic background. These ideas are intriguing given the transpecific nature of S alleles and the maintenance of SI during speciation (Igic *et al.* 2004). Further models that include population structure, as well as introgression, are therefore required to assess the potential importance of this for diversification at the S locus. The total number of S-haplotypes predicted by Sakai (2016) was higher (40–50 alleles), and more in line with population estimates. The mechanism of diversification in this model, however, suggests that novel S alleles evolve during the initial evolution of the SI system from self-compatibility. It is also based on a SR SI, and involves variation in the strength of the incompatibility reaction, suggesting that the mechanism may be less applicable to S-haplotype diversification in NSR systems.

There are a number of assumptions required to simplify the inherent complexities of modeling the evolution of new S-haplotypes in NSR systems. These assumptions may have contributed to the lower haplotype estimates compared to that observed in natural populations. We first assume that there is no recombination within the S-locus. However, Kubo *et al.* (2015) provide evidence of genetic exchange at the S-locus for petunia, and suggest that SLFs may be shared among S-haplotypes via this process. Inclusion of genetic exchange may therefore facilitate novel S-haplotype evolution, as suggested by Fujii *et al.* (2016), who found that this had a important role in evolution of novel S-haplotypes in their NSR model. Second, in our model, we assume a one-to-one relationship between each SLF and SRNase. Yet recent evidence of SRNase recognition by multiple SLFs in the collaborative NSR system of petunia Kubo *et al.* (2015) suggests the need for integrating these dynamics into models of novel S allele evolution in NSR systems. Consequently, extension of our model to include a collaborative network with redundant SRNase recognition by multiple SLFs may facilitate the evolution of novel S-haplotypes. Finally, we assume equal mutation rates for male- and female-determining components of the S locus. Given the larger size of the SLF compared to SRNase genomic region, and some evidence of greater variation and turnover of SLFs (Kubo *et al.* 2015), the potential influence of higher mutation rates for SLF genes could be tested. Consequently, extending our model to include genetic exchange, the collaborative nature of the NSR system, unequal crossing over and variation in mutation rates for male- and female-determining components may result in different evolutionary outcomes and equilibrium number of S-haplotypes.

### Haplotype lifespan and frequency: mate availability and negative frequency-dependence

The interaction between haplotype completeness and mate availability can influence the lifespan and frequency of novel SI haplotypes. In the NSR SI system modeled here, mate availability is inversely related to the deficit in SLF genes (minimal completeness deficit), so that complete haplotypes, with no missing SLFs, have the highest mate availability and are able to mate with all other haplotypes in the population. Our results of longer lifespan and high frequency for complete haplotypes (minimal completeness deficit of zero), reflects its importance to mate availability and fitness. Interestingly, and in contrast to our infinite population model, the moderate to high longevity and frequency of haplotypes missing one SLF (minimal completeness deficit of one) suggests that these haplotypes can still maintain high fitness. The deficit in SLF genes may also affect the strength of NFDS, so that NFDS weakens when there are fewer mating partners. This may contribute to the stochastic loss of haplotypes with a greater deficit in SLF genes, which is reflected in their shorter lifespans and lower frequencies. Taken together, these results highlight the importance of NFDS and suggest that both evolutionary pathway and mate availability contribute to the outcomes and success of novel SI haplotypes. It would be interesting to see if these results are still apparent with extensions to the model to include local pollen dispersal, since this may lessen restrictions in mate availability.

### Conclusions

Our model demonstrates that novel S-haplotypes can evolve across a range of parameter values (inbreeding depression and self-pollination), but that this varies with evolutionary pathway. This result generates intriguing questions about the role of SC intermediates in S-allele diversification and how different conditions may favor alternative pathways. For example, when considering empirical data, does the presence of low frequency SC individuals in populations (Raduski *et al.* 2012) represent points in the diversification process? This also raises questions regarding the viability of SC individuals as intermediates for the evolution of new specificities under different models of inbreeding depression. Extensions of this model to include population structure may also help to reconcile differences between theoretical models and the number of alleles commonly observed in plant populations. Combining genomic data with model predictions could provide insight into the evolutionary dynamics of NSR SI. For example, variation among individuals in SLF gene position and copy number may provide information on recombination frequency and gene duplication events (see Kubo *et al.* (2015)); while the distribution of mutations within SLF genes may indicate the steps required to produce a novel SLF during allelic diversification. Indeed, previous studies have provided some estimates of the number of mutations required to alter S-allele specificities (Matton *et al.* 1999; Chookajorn *et al.* 2004), but given differences in the molecular mechanism and variation in size of the female- and male-determining components, this may vary with SI system. Combining theoretical models with data on the genomic structure of the SLF region will therefore improve our understanding of haplotype diversification for NSR SI: providing a fascinating example of the evolutionary dynamics involved in genetically based recognition systems.

## Acknowledgments

We thank Deborah Charlesworth and three anonymous reviewers for helpful comments on the manuscript. We also thank Vincenzo Natali for his influential movie The Cube that was a great source of inspiration. Here, the characters move through cube-shaped rooms with various death traps, as do the S-alleles in our work, which seem to be searching for a successful escape route through a sequence of mutational cubes, facing the self-compatibility and incompleteness traps. The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 329960, European Research Council (ERC) research agreement number 250152 and Research Executive Agency (REA) grant agreement number 291734.

## Appendix A

### The Coexistence of Complete and Incomplete SI Haplotypes and the Necessary Condition for Diversification

In this section, we are interested in whether SI haplotypes, some complete and some incomplete (missing one SLF), can be stably maintained in the population. Then, we discuss how the derived conditions can be used to study the feasibility of the diversification pathways 1–5 discussed in the main text.

Because we aim to examine the conditions associated with each diversification pathway, we study the coexistence of the “final” number of (complete and incomplete) SI classes after a diversification event. That is, we construct a set of equations, using (3), to describe the dynamics of *n* SI haplotypes that have *m* SI haplotypes that are missing as well as the “novel” haplotype where Therefore, and in line with the assumptions on diversification pathways in the main text, we study the possible coexistence of complete haplotypes (*n* complete resident types plus a complete type ) and incomplete SI haplotypes that lack the SLF for Notice that the haplotype (in the main text used as the ancestral haplotype) belongs to one of the *n* haplotypes if it has (in the main text ), and if it lacks it belongs to one of the *m* haplotypes (in the main text ).

For convenience, we will use notation (more generally, we use indices ) to describe all the complete resident haplotypes and call it a group of *n* complete SI haplotype classes. Similarly, (more generally, we use indices ) will denote the group of all incomplete SI types. To simplify our notation we write for the sum of all haplotype groups (pollen) that can fertilize a (diploid females) genotype group *g*. This change of notation (and its slight abuse) will help considerably in writing the dynamical model. Similar notation was also used in models of Uyenoyama *et al.* (2001) and Gervais *et al.* (2011).

#### The model

We first write equations for the most general model where implying after which we discuss the remaining four cases and See the equations and the explanation below for why such a distinction must be made. The expected fitness for all genotype groups (see the definition above), in the most general case (), can be written as(A.1)(A.2)(A.3)(A.4)(A.5)The frequency of all haplotypes that can fertilize each diploid group are(A.6)(A.7)(A.8)(A.9)(A.10)and the haplotype group frequencies in terms of genotype group frequencies are(A.11)(A.12)(A.13)We now examine how these equations were built. For example, denotes the expected fitness of an arbitrary (diploid) genotype where both haplotypes belong to the group Therefore, looking at the first term in (A.1), females in group with frequency will produce a offspring with probability This is because there are haplotypes that can fertilize an arbitrary diploid female in this group [the frequency is ], and is the frequency of all the pollen that can fertilize an arbitrary diploid genotype in this genotype group. Similarly, the second term in (A.1) says that females in group with frequency will be fertilized by haplotypes from group to produce a offspring with probability This is because there are haplotypes that can fertilize an arbitrary diploid female [the frequency is ], is the frequency of all the pollen that can fertilize an arbitrary genotype in this diploid genotype group, and then with probability one half the offspring is of type

#### Some general comments

There are a number of reasons why we must write separate models for the special cases Firstly, notice that when all expressions will have zero or negative terms because there are not enough haplotype classes in or that can fertilize the corresponding genotype group. This is easily corrected by neglecting all the negative terms (*e.g.*, by multiplying the term by an indicator function which returns value 1 only if ). However, when the expressions and are zero or negative, this has the consequence that *no haplotype group* can fertilize these genotype groups. This means that the terms in the expected fitnesses that contain terms and must be removed altogether. For female can never be fertilized, and for no female with can be fertilized. The same argument must be taken where incomplete haplotypes exist in the population. Furthermore, in general, the expected fitness in the population is where we sum over groups *g* that can actually be fertilized, and is equal to only if all groups can be fertilized.

#### Results

We classify the results according to how many of the *k* resident haplotypes are complete (*n*). We start by recalling that whenever *i.e.* then all haplotypes are complete, *i.e.*, all haplotypes are maintained in the population and protected from extinction (protected coexistence).

##### Case *n* = 0: diversification is possible for *k* = 3, but never for :

*Proof:* For the above system simplifies considerably. As all fitnesses except and are zero, and as the frequencies sum up to one, the dynamics are fully determined by a single difference equation, *e.g.*, where(A.14)and where [see (A.8)], [see (A.13)] and the average fitness is (because a female will be fertilized but will not). The difference equation thus takes the form(A.15)from which we can calculate the equilibria by setting There exists only a single nonboundary (from now on nonboundary means that all frequencies are nonzero) interior equilibrium with associated eigenvalue Clearly, the nonboundary equilibrium is an interior equilibrium (all frequencies are positive) only for and for this value the eigenvalue is *i.e.*, the equilibrium is locally asymptotically stable. We conclude that coexistence of three incomplete haplotypes and is possible. For all no interior equilibria exist and so coexistence is not possible.As shown above, interestingly, three incomplete SI haplotypes can coexist at a stable equilibrium together with one complete SI haplotype. This is surprising because female can never be fertilized and so males would have to compensate for this fitness loss (see above). For this actually is possible, because -males can fertilize three out of six nonself genotype classes, whereas resident -males can fertilize only one out of six nonself genotype classes. However, when *m* increases these proportions become more and more similar, so that the advantage of -males decreases enough for the haplotype to go extinct.

##### Case *n* = 1: diversification is possible for 3 ≤ k ≤ 6, but never for 7 ≤ k:

*Proof*. For genotype group does not exist, and so we set and in (A.1). Importantly, also in (A.9), and so females remain unmated. The only complete haplotype that could fertilize is part of the female genotype and is self-incompatible. We were not able to find an expression for an interior equilibrium for all and so we calculate the stability of boundary equilibria: whenever all boundary equilibria are unstable there is negative frequency-dependent selection and all haplotypes will be able to coexist. The only (potential) boundary equilibria (see below) in this system are where or because only two SI haplotypes () can never persist in the system and so an equilibrium where does not exist.

The stability of we calculate the dominant eigenvalue associated with an equilibrium where which in term of genotype group frequencies is We take the Jacobian of the system, evaluate it at the resident equilibrium and obtain a dominant eigenvalue(A.16)which is >1 whenever

The boundary equilibrium at this equilibrium haplotypes are the residents. From the previous case we see that they can coexist for Interestingly, however, the model with does not reduce to the model with when we take the limit This is because in (A.4) the term where does not vanish for and so it will always contribute in production of *i.e.*, the production of

Since the only boundary equilibrium where is unstable whenever this is when coexistence of the haplotype classes is possible.

For the haplotype can now be fertilized, but only when paired with because there is only a single haplotype in group In contrast to the previous case, females also contribute to fitness, but they are still at a selective disadvantage. For small *k* males can compensate for this, but similar to above, when *k* increases, the incomplete males are able to fertilize an ever greater proportion of females. For sufficiently large *k* the haplotype loses its advantage and the haplotype goes extinct.

##### Case 2 long-term persistence of the diversification is not possible for any

In this case, we can derive analytical results for any fixed *m* (we used ), but not for arbitrary *m*.

For every the boundary where has an associated dominant eigenvalue which is always >1, and the boundary where has associated dominant eigenvalue 1 (as in the previous case an equilibrium where only is not a boundary equilibrium). However, we were unable to solve for all the equilibria as a function of *m* to exclude the possibility of stable interior equilibria. Nevertheless, we were able to solve the equilibria for specific values of *m* (we performed the calculations for ) and found that no interior equilibria exist (also, we have no reason why this should be any different for greater values of *m*). In addition, we performed numerical investigations to check that all trajectories approach the boundary equilibrium The convergence is not asymptotical (dominant eigenvalue 1) and takes, approximately, generations for the frequency of to be below for any *x*.

In summary, we find that the equilibrium has eigenvalue 1, while all other boundary equilibria are unstable and no interior equilibria exist. The convergence to the extinction of thus takes a very long time, leaving the possibility for a diversification event if the incomplete haplotypes persist in the population long enough for (all of) them to gain the missing SLF

*Case**n* = *k*: All SI haplotypes are complete and so negative frequency-dependent selection maintains the coexistence of all haplotypes.

#### Necessary conditions for diversification

If the above conditions are violated then no coexistence of SI haplotypes is possible. Consequently, these conditions must necessarily be valid for a diversification to take place, but only when the (possible) SC intermediate haplotypes are excluded from the final composition of haplotypes (the condition for coexistence are derived only for SI haplotypes). Since intermediate SC haplotypes go extinct in pathways 2, 3, and 4, we can apply our results there. In addition, it gives us a possible explanation for why pathway 1 is, in principle, possible for almost any combination of This is because the intermediate SC haplotypes persist in the system throughout the diversification process and thus (apparently) decrease average fitness enough for the incomplete haplotypes not to be too disadvantaged to go extinct. Also, these results can be used for pathway 5 as this only has SI intermediates (see more discussion in Appendix C).

The above conditions, however, are not sufficient because (as discussed above) it is possible that even if complete and incomplete haplotypes coexist, the necessary intermediate mutants, or the final mutant are not able to invade the population. Interestingly though, the above conditions correctly predict whether diversification occurs in all cases except for a case where (relevant in pathway 4, see also below).

#### Sexual selection: when are females selected against?

The above equations also reveal the similarities and differences between SR and NSR models in terms of sexual selection. In this paper, similarly to several other models addressing SI systems, we have assumed that the population is well mixed (*i.e.*, pollen disperses globally) and that each plant produces a large amount of pollen. Consequently, every female in the population will be fertilized if there exists at least one pollen grain that is able to fertilize it (*i.e.*, compatible with that female). This is usually true in SR models (with no pollen limitation) where nonrecognition results in compatibility, and so in SR models there is no sexual selection on females. Males, however, undergo frequency-dependent competition for fertilization and are under selection. In addition to sexual selection, individual fitness may be affected by inbreeding depression.

The situation is different in SLF-based NSR models that involve incompleteness. A key feature in these models is that a male must be able to recognize a female, *i.e.* have the corresponding SLFs, in order to be able to fertilize this female. In such models, a novel female type might thus go unrecognized as no haplotype has yet the corresponding SLF. Thus, in contrast to SR models, in NSR models females can be under selection. This is the case, for example, in our model where haplotypes are assumed complete with respect to all haplotypes except (or in cases where the mutants of have not yet gained the SLF ); if there are not enough complete SI haplotypes females having may not get fertilized. For no females (where *x* is any haplotype) can be fertilized, and, for females, can never be fertilized because the only complete SI haplotype cannot fertilize a female with However, for all females can be fertilized (if we for now ignore the intermediate mutants who may lack ). Therefore, only when there is no sexual selection acting on females, in which case all fitness effects come via males, or via inbreeding depression.

## Appendix B

### Diversification Pathways 1–5: Analytical and Numerical Results

To study whether diversification occurs for any of the pathways, we use (3) to construct the equations that correspond to each pathway separately. Then, for each pathway and possible initial state of the population we ask: can the first mutant invade the population? If so, will it coexist with all the haplotypes? Then, will the second (final) mutant invade and coexist with all the haplotypes? If these steps occur then this is diversification. There are in fact two possible final states; one where the intermediate (first) mutant is excluded, and one where it coexists with all initial and final SI haplotypes. It turns out that this distinction is important in SI models that allow for incompleteness because SC intermediate haplotypes influence the degree and nature of selection experienced by other, in particular incomplete, haplotypes (see Appendix C).

#### Pathways 1–4

Beyond the conditions derived in Appendix A (which we discuss in Appendix C) and the conditions derived for an analogous SR model of Uyenoyama *et al.* (2001) and Gervais *et al.* (2011), we did not find further analytical conditions concerning the feasibility of pathways 1–4; The conditions (analytical and numerical) for diversification pathways 1 and 2 are identical to pathways considered in Uyenoyama *et al.* (2001) and Gervais *et al.* (2011) when initially all haplotypes are assumed “complete” with respect to the not yet existing novel RNase (). Moreover, for pathways 2, 3, and 4, the first mutation is also identical to the pathways in Uyenoyama *et al.* (2001) and Gervais *et al.* (2011) for any *n* in our model, simply because the first mutation happens in the pollen and its fitness is therefore not affected by the presence or absence of the not yet utilized SLF. The second mutation however, which happens in the RNAse, does influence fitness differently for complete and incomplete haplotypes, consequently delineating our NSR model from the SR model in Uyenoyama *et al.* (2001) and Gervais *et al.* (2011). Similar to Uyenoyama *et al.* (2001) and Gervais *et al.* (2011), however, the outcome of the second invasion has to be computed numerically since we were not able to find an explicit solution for the intermediate equilibrium state. The complete numerical solution for their diversification events can be found in Figures S3–S5 for three values of and all possible initial conditions

#### Pathway 5

The upcoming results, together with Appendix A, enable us to solve the pathway 5 (almost) fully analytically. The analysis is considerably simpler than for the other pathways because all haplotypes, including the intermediate haplotype, are all SI and so their fitness is not affected by inbreeding depression. To study diversification for pathway 5 we use (3) to analyze firstly whether the first mutant (*i.e.* mutant of the ancestral haplotype that has the novel but lacks and ) can invade the population, and, if so, whether the second mutant can invade and coexist with all the other haplotypes.

##### Initial condition:

To consider pathway 5, at least one haplotype (the ancestral ) must be incomplete

##### 1st step:

In this paragraph we discuss the invasion of the incomplete mutant into a resident population for all

*Case n* = 0*:* (analytical result) Incomplete haplotype is not able to invade the resident population for any *k*. This is simply because females of the mutant are never fertilized (), and the dominant eigenvalue is half, since males, when rare, have equal fitness compared to an average resident (a rare incomplete male with deficit one can fertilize as larger a proportion of females as can a common “complete” resident, *i.e.*, all but one haplotype class).

*Case n* = 1*:* (analytical result) Incomplete haplotype is not able to invade the resident population for any *k*. The reason is similar to above, except that now females can be fertilized, but only if paired with incomplete haplotypes not when paired with complete haplotypes (). Females are thus at a disadvantage and since the contribution of males is as described in the previous case, the dominant eigenvalue is <1 [it is for all ].

*Case * incomplete haplotype can “neutrally” invade the resident population for any (dominant eigenvalue is one; analytical result) and then coexist with all the other haplotypes at a line of equilibria (*i.e.* at one of the infinite number of equilibria positioned on a line). We obtained the analytical expression for the line of equilibria for the cases and and, for the remaining cases, we found it numerically.

The dominant eigenvalue is 1 because now mutant females can be fertilized when paired with both and (), and males are as described in the previous two cases. The line of equilibria is a consequence of the fact that the mutant and its ancestor have exactly the same fertilization properties: all females can be fertilized, and males can fertilize exactly the same haplotype classes, all but their own and each other. They are thus selectively neutral with respect to each other, and the only force that can change their relative frequencies is drift in finite populations.

##### 2nd step:

Here we assume that the first mutant and its ancestor can coexist long enough for the second (final) mutant to appear in the population (case ). In principle, we should evaluate the invasion ability at the line of equilibria, which can, in some cases, be calculated (see above), but we take the simpler route and assume that the first mutant is still rare (negligible frequency) by the time arrives in the population. In such a scenario we can apply the results from section Appendix A. Following this, for all the novel mutant can invade the population (we also expect this result to hold when evaluating the population at the line of equilibria). However, as discovered in Appendix A, for all incomplete haplotypes are at disadvantage and eventually go extinct, but, since the extinction is very slow, the corresponding haplotype classes can be rescued by completing the haplotypes (see above).

## Appendix C

### Discussion on the Necessary Conditions for Diversification and the Analytical and Numerical Results on Pathways 1 to 5

Firstly, even though it seems like pathway 1 provides the most compelling parameter region () for diversification, for the very same parameter region, the other pathways 2, 3, and 4 lead to full extinction of all SI haplotypes, and replacement by a single SC haplotype. This suggests, that, if the population experiences favorable conditions for diversification for pathway 1, any mutation that leads to a complete SC haplotype (pathways 2, 3,and 4) will result in the loss of incompatibility from the population. Secondly, we may argue that the diagonal mutations where a specific SLF mutates to another specific SLF, is approximately *L* times less likely than any other mutation on the cube, thus taking a longer time to occur. We thus predict that pathway 1 (but also pathway 2) is unlikely to be responsible for diversification.

We have shown that diversification via pathways 2, 3, and 4 is possible for and for and in which case (long-term) stable coexistence occurs for SI haplotypes, of them incomplete (missing ). Interestingly, the coexistence of complete and incomplete haplotypes is long-term, until one of the incomplete haplotypes becomes complete, resulting in altogether three complete SI haplotypes, which then drive all the remaining incomplete haplotypes to extinction. However, this happens very slowly and so the incomplete haplotypes (“classes”) can be rescued by gaining the missing SLF before the class goes extinct. Nevertheless, unless all incomplete haplotypes become complete, the diversification process will result in the destruction of incomplete haplotypes and the number of surviving haplotype classes drops to the number of complete haplotypes in the current population (which is likely to be lower than the number in the initial resident population *k*). This diversification path may therefore eventually lead to a reduction in haplotype classes.

If the initial resident population (pathways 2–4) were to have at least two complete haplotypes then immediately after the invasion of all incomplete haplotypes proceed toward extinction (and we are back to the above situation). The diversification is thus only short-term, and will not persist unless, as above, all incomplete SI haplotype classes are rescued by gaining the missing SLF We thus predict that, for small *k*, diversification happens usually for since long-term stable coexistence is possible; while for higher *k* and/or greater *n* diversification may occur, but only for higher mutation rates. In addition, we should observe sudden drops of the number of haplotype classes associated with the creation of new complete SI classes.

Pathway 5, interestingly, is free from inbreeding depression during the diversification process because all individuals outcross. However, our analytical conditions predict that if diversification happens, it is very likely only short term, unless the mutation rate is sufficiently high for the incomplete haplotypes to become complete. Moreover, at least two haplotype classes need to have the not yet utilized SLF to initiate the diversification process.

## Footnotes

Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6148304.

*Communicating editor: M. Beaumont*

- Received January 23, 2018.
- Accepted April 28, 2018.

- Copyright © 2018 by the Genetics Society of America