## Abstract

Speciation is fundamental to the process of generating the huge diversity of life on Earth. However, we are yet to have a clear understanding of its molecular-genetic basis. Here, we examine a computational model of reproductive isolation that explicitly incorporates a map from genotype to phenotype based on the biophysics of protein–DNA binding. In particular, we model the binding of a protein transcription factor to a DNA binding site and how their independent coevolution, in a stabilizing fitness landscape, of two allopatric lineages leads to incompatibilities. Complementing our previous coarse-grained theoretical results, our simulations give a new prediction for the monomorphic regime of evolution that smaller populations should develop incompatibilities more quickly. This arises as (1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space, and (2) slower divergence when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak shifts or positive selection. Finally, we show that the growth of DMIs with time is quadratic for small populations, agreeing with Orr’s model, but nonpower law for large populations, with a form consistent with our previous theoretical results.

- speciation
- Dobzhansky–Muller incompatibilities
- sequence entropy
- population size
- coevolution
- genotype–phenotype map

SPECIATION is of great importance in generating the observed diversity of life, yet it is still poorly understood, especially at the genetic level. Two populations are said to have speciated when they have developed *reproductive isolation* (RI), that is, when they can no longer interbreed. A standard model of how *postzygotic* reproductive isolation arises is due to Dobzhansky, Muller, and Bateson (Bateson 1909; Dobzhansky 1936; Muller 1942), where so-called Dobzhansky–Muller incompatibilities (DMIs) arise due to epistatic interactions; for example, two geographically isolated lineages evolving allopatrically from a common ancestor *ab* can fix the allelic combinations *aB* and *Ab*, respectively, yet the hybrid genotype can be inviable due to the epistatic interactions between these two loci. In polygenic systems, where many loci code for an additive quantitative trait, a similar hybrid incompatibility arises; quadratic, or any nonlinear, selection induces epistasis such that divergent populations, under the action of drift, maintain different underlying allelic combinations at the many loci (Wright 1935a,b) for the same optimal trait value, which when combined in hybrids can lead to incompatibilities (Barton 1989). Although there are many examples of genes directly involved in reproductive isolation (Wu and Ting 2004), we still lack a theoretical understanding of the functional relationship between genes and their role in the development of hybrid incompatibilities and speciation dynamics. In this article, we examine an important example of such a functional relationship, the genotype–phenotype map of transcription factor–DNA binding. Using a simple biophysical model of transcription factor–DNA binding we analyze how incompatibilities can arise between allopatric lineages.

Despite many studies of the evolution of RI, very little attention has been paid to the role of population size; however, there is indirect and direct evidence that smaller populations develop incompatibilities more quickly. The observation of the large diversity of species on small young islands, such as Hawaii (Mayr 1970), or on the island of Cuba (Glor *et al.* 2004) and in the East African Great Lakes (Owen *et al.* 1990; Santos and Salzburger 2012), where in the latter two cases each one has been subject to historically fluctuating water levels and thus opportunities for allopatric speciation, suggests that smaller populations speciate more quickly. This is in contrast to lower levels of reproductive isolation observed in marine species with large ranges and population sizes, for example, the relatively small fraction of Pacific–Caribbean species pairs separated by the Isthmus of Panama a few million years ago compared to those that are not reproductively isolated (Mayr 1954, 1970; Rubinoff and Rubinoff 1971). There is also evidence that reproductive isolation arises more slowly in birds compared to mammals (Fitzpatrick 2004). Strikingly, even after ∼55 MY divergence (Cooper and Penny 1997), domestic chickens (*Gallus gallus*) can still hybridize with helmeted guineafowl (*Numida meleagris*), where estimates of the effective population size of domestic chickens range from to (Sawai *et al.* 2010), whereas in contrast, cichlids develop reproductive isolation as quickly as MY after divergence (Stelkens *et al.* 2010) and have relatively small population sizes [ (Oppen *et al.* 1997; Fiumera *et al.* 2000)]. This population size trend is further supported by net rates of diversification (Coyne and Orr 2004) inferred from phylogenetic trees (Barraclough and Nee 2001; Nee 2001). On the other hand, there are examples that buck this trend, such as *Drosophila*, which shows rapid speciation, for example, in adaptive radiations in Hawaii at large population size (Ayala *et al.* 1996).

Where does current theory stand in light of these observations? There are a number of theoretical models of allopatric speciation based on the Dobzhansky–Muller mechanism, which consider independent lineages evolving neutrally or under varying selection pressures on each lineage (Nei *et al.* 1983; Orr 1995; Orr and Orr 1996; Orr and Turelli 2001; Gavrilets 1999, 2003, 2004). Models that involve positive selection driving divergence are unlikely to be able to explain this dependence on population size, since larger populations respond more quickly to a given selection pressure (Gavrilets 2003). This leaves models of speciation where populations diverge neutrally or under stabilizing selection pressure; the models of Nei *et al.* (1983) and Gavrilets (1999) tackle precisely this question in the strong mutation regime ( where *n* is the number of nucleotides or base pairs for the loci of interest, the base-pair mutation rate, and *N* the population size) where the population is highly polymorphic. They find slower divergence in larger populations due to the lower reproductive success of members of the population who have diverged farther from the fitness maximum, resulting in a slower speciation rate. However, in neither of these models is there a dependence on population size in the weak mutation, nearly monomorphic regime, where Models of hybrid incompatibility that rely on fitness epistasis on quantitative traits (Wright 1935a,b) also predict that smaller populations should develop reproductive isolation more quickly, as drift helps populations shift between stable equilibria more rapidly (Barton 1989); but again by the polygenic nature of the population described in the model, we expect such a system to have evolutionary dynamics in the strong mutation regime.

A model that could give rise to more rapid RI for small populations is based on founder events or peak shifts, where small founder populations split and become isolated (Lande 1979, 1985; Barton and Charlesworth 1984; Barton and Rouhani 1987); the strength of drift is larger in small populations, allowing them to more easily pass through fitness valleys. A major problem with such models is that for isolation to occur on reasonable timescales the product of the fitness barrier and population size needs to be sufficiently small. However, this condition also means that gene flow is relatively unimpeded between peaks (Coyne and Orr 2004), destroying the reproductive isolation the model seeks to establish. Finally, the work of Orr and co-workers provided a framework to understand how incompatibilities might arise in allopatry through sequentially fixing mutations in the weak mutation regime () (Orr 1995; Orr and Turelli 2001); they showed that the number of potential or untested incompatibilities “snowballs” like for interactions between pairs of loci, where *K* is the number of substitutions separating the two lineages. However, the starting point of this model is the assumption of neutral, population size independent, divergence between lineages with a fixed probability that each untested combination is incompatible and so cannot address the question of the population size dependence.

A common theme of the above theories is that they are phenomenological with respect to the underlying genetic basis of incompatibilities. Johnson and Porter (2000, 2007) examined the evolution of decreased hybrid fitness for simple models of gene regulation, under positive and stabilizing selection, in the clonal interference regime (), but did not investigate the dependence on population size. More recently, they extended their work with sequence-based models of transcription factor (TF) binding similar to the model described here (Tulchinsky *et al.* 2014b), showing decreased hybrid fitness with decreasing population size; however, these results are again in the regime where the effect of mutations is not weak () and the dynamics of the growth of DMIs were not investigated in detail.

In summary, although the models of Gavrilets, Nei ,and Barton each predict a decreasing rate of developing RI with increasing population size when these models predict no dependence on population size, or are not applicable, in the weak mutation, nearly monomorphic regime where This is despite genetic studies that have shown that traits involved in species differences range from monogenic to mildly polygenic (Orr 2001). However, more recently, a theoretical framework was developed by the authors of this article for phenotypic evolution in the monomorphic regime () that accounts for a general mapping between genotype and phenotype (Khatri and Goldstein 2015); when applied to a toy model of transcription factor–DNA binding, it suggested that more rapid RI might arise for smaller populations, due to their having a larger drift load. In this work, we explore simulations of a more realistic sequence-based model of transcription factor–DNA binding, which overcomes limitations of the theoretical model.

Although any pair of interacting genes can result in a DMI, the interaction of genes that control expression has been shown to be a major factor driving differences between species (King and Wilson 1975; Wray 2007; Wittkopp *et al.* 2008; Wolf *et al.* 2010), suggesting a major role in speciation. In particular, compensatory changes at both and locations have been shown to be responsible for the misexpression of many genes in hybrids between *Drosophila melanogaster* and *D. simulans* (Landry *et al.* 2005), while there is more direct evidence in *Drosophila* of evolution of genes related to transcription factors driving speciation (Ting *et al.* 1998; Brideau *et al.* 2006). With the increasing use of genome-level studies (Seehausen *et al.* 2014) to characterize speciation, there is a need for theory and modeling to bridge the gap between sequence-level changes at coevolving loci and phenotypic determinants of incompatibilities; the binding of transcription factors to DNA to control gene expression is arguably one of the most important coevolving systems for organisms and so makes an ideal case study to examine the consequences to speciation of a simple biophysical model and a mechanistic insight on the way DMIs develop.

In this article, we examine how incompatibilities arise in allopatry for an abstract, yet biophysically motivated model of binding between two macromolecules, a protein TF binding to a specific DNA or TF binding site (TFBS). Our model is based on the “two-state” approximation (Von Hippel and Berg 1986; Gerland *et al.* 2002), which assumes the binding affinity is a sum of contributions of opposing amino acid nucleotide pairs, with each contribution being of only two types, “matched” or “mismatched.” This approximation, although not capturing the molecular interactions in atomistic detail, can represent many salient aspects that have been ignored in previous work on speciation theory. In particular, such a model allows us to include the effects of drift–selection balance in the weak mutation regime (), due to some phenotypes being coded by more sequences than others and the corresponding effect of population size on speciation dynamics. Recent work has shown that such mappings from genotype to phenotype give rise to a number of nontrivial effects (Force *et al.* 1999; Fontana 2002; Berg *et al.* 2004; Mustonen and Lässig 2005; Khatri *et al.* 2009; Goldstein 2011). Here, we find this simple genotype–phenotype map predicts an increasing rate of accumulating DMIs for decreasing population sizes in the weak mutation regime, the appropriate limit for monomorphically evolving traits, with a robust mechanism that does not require valley crossing by either of the divergent populations. This dependence on population size arises due to two separate mechanisms. First, at large population sizes, the overall substitution rate slows down as the number of nearly neutral mutations decreases, which is line with expectations from the nearly neutral theory (Ohta 1973, 1992). More significantly, the particular form of drift–selection balance that arises from the genotype–phenotype map results in sequence pairs that have a distribution of binding affinities peaked away from the optimal in smaller populations. As a result, less allopatric evolution is required before the hybrid organisms become inviable.

## Materials and Methods

### Quaternary model of transcription factor–DNA binding

Proteins bind DNA through a number of interactions, including electrostatic, van der Waals, and hydrogen bonding at the protein–DNA interface. We can split these interactions into a nonspecific part due mainly to the electrostatic interaction between positive protein side chains and the negative phosphate backbone and a specific part largely due to hydrogen bonding. It is these specific interactions that give rise to discrimination of TFs to different DNA sequences; a TF at its correct sequence binds through both nonspecific and specific interactions, while at a noncorrect site it binds only nonspecifically with an altered conformation that maximizes electrostatic interactions (Von Hippel and Berg 1986).

The two-state approximation (Von Hippel and Berg 1986; Gerland *et al.* 2002) for transcription factors binding at their correct binding sites assumes that amino acid nucleotide interactions are either optimal or nonoptimal and the contribution of each amino acid–nucleotide pair to the total binding energy is approximately additive. The rationale for this model is the underlying biophysics of protein–DNA interactions, in particular, the fact that an amino acid at a protein–DNA interface will tend to have a preferred nucleotide with which to hydrogen bond, taking account of the approximately fixed orientation of the amino acid as positioned by the rest of the protein. The other nucleotides tend to be nonoptimal and not able to hydrogen bond (Takeda *et al.* 1989). Although each optimal interaction is marginally stabilizing [ kcal/mol (Von Hippel and Berg 1986)], it is the nonoptimal nucleotides that dominate the binding free energy, since the hydrogen bond acceptors and donors in the DNA can neither hydrogen bond to an amino acid nor hydrogen bond to water molecules. This suggests a large cost for each nonoptimal interaction, although in reality the exact value is highly dependent on the particular protein and DNA sequence; empirically measured costs of free energy per amino acid nucleotide mismatch can range from 1–2 kcal/mol (2–3 ) (Takeda *et al.* 1989; Stormo and Fields 1998) to 4–5 kcal/mol (6–8 ) (Von Hippel and Berg 1986; Lesser *et al.* 1990; Baldwin 2003), where is Boltzmann’s constant and *T* is room temperature. This variation is likely explained by specific cooperative effects that include electrostatic, steric, and solvent interactions (Lesser *et al.* 1990; Baldwin 2003) that change the energy scale of binding dependent on a particular protein–DNA binding context. In this article, for simplicity, we assume a binding energy difference of each nonoptimal interaction compared to an optimal interaction of 1.8 kcal/mol

As mentioned, for each amino acid there tends to be a single nucleotide it prefers to hydrogen bond (Takeda *et al.* 1989). If we designate the category of amino acids by its preferred partnering base (*e.g.*, an amino acid in group would interact preferably with a thymine) and recognize that only changes of amino acid group affect the binding properties, we can use and to represent letters from the quaternary alphabet for both proteins and DNA sequences; for simplicity, this assumes that the amino acids are equally distributed among the four categories. In this way, the genome corresponding to this TF–TFBS pair consists of two “genes” of length in the standard four-letter alphabet of DNA. For simplicity, we consider the mutation rates between amino acid clusters in the protein and nucleotides in the DNA as approximately equal; since our model assumes amino acids and nucleotides are drawn from the same alphabet and as we see below, we treat protein and DNA sequences equally in determining binding affinity, we find in our results that the substitution rates of protein and DNA loci are equal. However, in nature, the rate of substitution between amino acid categories is different from that between DNA bases, increased by the triplet code and decreased by the clustering and other forms of selection acting on the protein, as well as by pleiotropic constraints. However, our model is reasonable, since we would expect the overall dynamics of divergence to be dominated by the loci with the slowest substitution rate and hence slowest effective mutation rate.

Assuming additivity of each amino acid–DNA interaction, the binding free energy will then be equal to a sum of free energies due to matches and mismatches. The number of mismatches is given by the Hamming distance where the function counts the number of positions where the protein sequence and DNA sequence are not the same. The number of matches is then giving a binding free energy, (1)where is the free energy of each match, which includes both specific and nonspecific interactions. If we choose our zero of energy to be the energy of the best binding sequence, then we can redefine the binding free energy to be (2)This binding free energy corresponds to the specifically bound mode of attachment (which has both specific and nonspecific contributions). In addition to this specific bound mode, an alternative configuration of protein and DNA exists where the interactions are purely electrostatic. The specific binding mode and this alternative nonspecifically bound mode are in thermodynamic competition. The free energy of binding in the electrostatic nonspecific mode is (3)where is the free energy per nucleotide in the nonspecific mode relative to the optimal binder. Thermodynamic studies of Lac repressor binding to DNA suggest that the difference in free energy between the best specific binding and the nonspecific mode of binding is ∼ so as for the Lac repressor, we find (Revzin and Von Hippel 1977; Von Hippel and Berg 1986).

### Modeling the evolution of reproductive isolation

The relationship between the binding energy of a TF to its binding site and the fitness of an organism is poorly understood and is likely very complicated and different for each TF–TFBS pair. There is competition between specific binding and nonspecific binding of the TF (purely electrostatic mode, discussed above). We would expect the fraction of time spent in the specific mode to reach a maximum when there are no mismatches, decreasing with increasing *r* until the TFBS cannot compete with the electrostatic nonspecific mode of binding. Genome-wide studies of TFs in *Escherichia coli* (Mustonen and Lässig 2005) and yeast (Mustonen *et al.* 2008; Haldane *et al.* 2014) found a distribution of binding energies for different TFs that deviated from the random/neutral expectation (Equation 6) for the highest-affinity binders. This deviation from the neutral distribution, which reflects selection for functional binding sites, has a form suggesting a Malthusian fitness landscape that is peaked at nearly optimal binding, decreasing with negative curvature as the binding strength is reduced. These factors suggest a simple model for the fitness landscape where the Malthusian fitness decreases quadratically with the specific binding energy (corresponding to a Gaussian Wrightian fitness function) until a critical number of mismatches is reached, corresponding to where nonspecific binding begins to dominate. Beyond this point we consider the organism inviable with a Malthusian fitness of negative infinity (Wrightian fitness of zero). In particular, this cutoff allows us to define DMIs as occurring when for hybrids between allopatric populations.

More formally, (4)where is the curvature of the fitness landscape and biologically, roughly corresponds to the strength of selection of this trait; as decreases the fitness landscape becomes more shallow, and so for a fixed effective population size the landscape becomes more neutral.

Combining with Equations 2 and 3 yields Note that as binding sites increase in length, the stability of the best binder () relative to nonspecific binding will increase in proportion to and hence a larger number of mismatches will be required before a binding site becomes nonfunctional. Specifically, for and (Von Hippel and Berg 1986), we find In the case of short DNA recognition sites for *Eco*RI endonuclease cleaving DNA, where it was found that (Lesser *et al.* 1990), which agrees well with our approximate relation between and We expect our qualitative results to be robust to the choice of such a threshold. Similarly, a more detailed consideration would include binding of the TF to other spurious sites in the genome with large sequence similarity; again we expect such consideration will change the value of but not change the scaling relation as longer binding sites will always have a larger maximum affinity.

To simulate the evolution of TF–TFBS sequence evolution we assume a diploid Wright–Fisher population genetic process with copies of each gene in the population with a fixed effective population size of where we have assumed equality with the actual population size *N*. As we are interested in the weak mutation regime (), the population is represented by a single fixed sequence for the TF–TFBS pair of loci at each time point, where all individuals are homozygous and mutations are either instantly fixed or eliminated. We use the Gillespie algorithm (Gillespie 1976) to simulate evolution as a continuous-time Markov process; at each step of the simulation the rates of fixation of all one-step mutations from the currently fixed alleles (wild type) on both TF and TFBS loci are calculated, and one of these mutations is selected randomly in proportion to the relative rate. Time is then progressed by where *K* is the sum of the rates of all one-step mutants and *u* is a random number drawn independently between 0 and 1, which ensures the times at which substitutions occur are Poisson distributed, as would be expected for a random substitution process. The rates are based upon the Kimura probability of fixation (Kimura 1962), (5)where is the change of fitness of a mutation at a particular location and is the rate at which mutations arise for each amino acid or nucleotide position in a diploid population; the latter approximation in Equation 5 assumes Note that although in the simulations we use the full form for the fixation probability, fitness effects are typically small () in the simulations, so the substitution rates depend only on the population-scaled fitness changes which, for a given mutation, are proportional to In the rest of this article we refer to the scaled population size to make it clear that reducing either or (or both) can change the evolutionary outcomes from those dominated by selection to those dominated by drift.

Using the above evolutionary process based on the biophysics of a TF binding DNA, we study allopatric speciation by independently evolving two lineages in the fitness landscape defined by Equation 4. We create an ancestral genome containing a protein and a DNA binding-site gene, each of length with drawn from the equilibrium distribution of binding energies (Equation 7). This ancestral genome is then duplicated, with each copy representing the start of a different isolated population that subsequently evolves independently. As the allopatric populations evolve, we consider the viability of hybrid offspring of the two lineages. If the evolving protein and DNA sequences in one lineage are and and the other ones are and we can at each time point calculate the Hamming distance for each hybrid as and with corresponding hybrid binding energies, and Using the same fitness function as in Equation 4, we can then evaluate the fitness of the hybrids as a function of time. An incompatibility arises whenever the fitness of the hybrid is [ or ], *i.e.*, when a hybrid TF–TFBS specific binding is weak compared to the nonspecific mode of binding and effectively can no longer recognize its target site. At this point, we assume that the two diverging populations can no longer form viable offspring, and they are reproductively isolated. This is a simplification as hybrid offspring will always be heterozygous at diverged loci, such that the TF–TFBS pair inherited from each parent will have functional binding, while cross-binding between parental pairs will be nonfunctional. Hence, not all postzygotic DMIs would be sufficiently deleterious to affect the viability of these heterozygotic offspring. We assume, however, that there are some TF–TFBS pairs that are sufficiently critical such that and loss of cross-binding is sufficient to decrease the gene expression level to the extent that the hybrid is inviable; these are the pairs that will be relevant for the speciation process and therefore are the ones addressed by our model. For each scaled population size and sequence length, 1000 replicates were run up to a time of allowing us to calculate the probability of the presence of a DMI as a function of divergence time. In addition, simulations were run up to a shorter time (dependent on the exact value of ) with replicates to get reliable estimates of the very small probability of a DMI (Figure 1) at early times.

### Data availability

## Results

### Rate of accumulation of hybrid incompatibilities

The probability of a DMI as a function of divergence time is plotted in Figure 1, for various values of for We see that the model predicts a very strong population size effect for the dynamics of hybrid incompatibilities; as the scaled population size decreases the timescale for DMIs to arise sharply decreases. This effect saturates for very small scaled population sizes, but diverges for very large scaled population sizes, to the point that reproductive isolation will take extremely long times for very large population sizes (). For small scaled population sizes the increase in DMIs is quadratic at small times (). For large scaled population sizes there is a delayed, but very rapid, increase in DMIs, which does not seem to fit a power law but rather has a negative curvature on a log–log scale. This is consistent with theoretical predictions of a coarse-grained model of TF–DNA binding evolution (Khatri and Goldstein 2015), where the growth of DMIs is rapid with the asymptotic form, as of This form arises when there are nearly neutral diffusive dynamics, as shown by the inset in Supporting Information, Figure S2, and when the common ancestor distribution is very narrow, as shown in Figure 2, in both cases for simulations at large scaled population sizes. We also performed simulations where the common ancestor sequence was drawn to always have the mean binding energy of the equilibrium distribution (Figure S1) and found the results to be nearly identical; this suggests that the power law behavior seen for small populations is not due to averaging over the common ancestor distribution, but as argued in the *Discussion* due to Poissonian distribution of times for substitutions.

The dependence of on population size arises from two effects, the first resulting from the dependence of equilibrium binding strengths on population size. Figure 2 shows the distribution of binding energies on each lineage for different scaled population sizes () for and The distributions are confined to the region where is the inviability boundary. For large scaled population sizes, we see that distributions are peaked near the optimal binding strength reflecting the efficacy of selection in large populations. However, as the scaled population size is decreased, we see the distribution of binding energies shifts to weaker affinity values (higher ), due to the stronger influence of genetic drift. At the smallest scaled population sizes, genetic drift dominates and the distribution of binding affinities becomes identical to the distribution obtained under neutral evolution (maintaining, however, the inviability boundary). At the level of sequences or genotypes, the neutral distribution is evenly distributed among all possible genotypes; each sequence has equal probability. However, the probability of a given value of is obtained by multiplying the probability of each sequence times the degeneracy, that is, the number of sequences corresponding to this As each sequence has the same probability, the neutral distribution is then simply proportional to the number of sequences that give or Hamming distance which is given by the binomial distribution (6)For example, the number of sequences that give is (for ), as there is exactly one DNA sequence that matches to each one of the protein sequences. This number is very small compared to the number of sequences at the inviability border that have five mismatches,

At intermediate population sizes we can quantify the interplay between selection and degeneracy through the concept of sequence entropy (Barton and Coe 2009; Khatri and Goldstein 2015), representing the (log) number of sequences encoding a given phenotypic state (*e.g.*, binding energy), which is closely related to the Boltzmann entropy from statistical mechanics (Reif 1965). This entropy measure should be distinguished from entropies of sequences due to polymorphisms in the population (in this article we have assumed populations are always monomorphic). The combination of fitness and sequence entropy that is maximized during evolution is the function termed the free fitness (Iwasa 1988; Sella and Hirsh 2005; Haldane *et al.* 2014; Khatri and Goldstein 2015), from which the probability density is given by (7)where *Z* is a normalization factor, known as the partition function, given by This probability density is plotted as solid lines in Figure 2 for different population sizes, using Equations 4, 6, and 7; we see that the agreement between the two is excellent.

The binding energy distributions show that for a general genotype–phenotype map fitness is not maximized, but instead there is a balance between selection for higher fitness and the tendency to undergo drift toward those phenotypes that correspond to the largest number of sequences. As the scaled population size decreases, the initial binding affinity of the common ancestor is on average smaller and so fewer substitutions are required between a pair of divergent lineages for an incompatibility to arise in a hybrid.

The second major factor affecting the rate of accumulation of DMIs is the slowing of the substitution rate with population size, as shown in Figure 3. The dependence we see can be explained by the average size of fitness effects as the scaled population size changes, where is a sum over terms formed by the product of the equilibrium probability and the total substitution rate for (the exact formula given in the legend of Figure 3 and plotted as the solid black line); at very large scaled population sizes the is peaked at and so the average substitution rate will be dominated by transitions between and Although is maximum for transitions from to happen rarely since it requires fixing a mutant with a population-scaled difference in fitness, which is negative and of magnitude >>1, when this means substitutions will occur significantly slower than neutral. Conversely, the reverse transition from to is also rare, despite the fixation probability being large, since the probability is small due to the same large population-scaled difference in fitness [This must be the case as in equilibrium for the probabilities not to change. This requirement is known in physics as “detailed balance.”]. This explains the slowdown of the accumulation of DMIs for large scaled population sizes observed in Figure 1. However, in very small populations, the inverse of the scaled population size is much larger than differences in fitness so we might expect substitutions to occur at the neutral rate (). In fact, we find that it is roughly half the neutral rate (); this is because for populations spend a large fraction of the time at the inviability boundary so the substitution rate is diminished compared to the expected neutral rate since a fraction of mutations at this boundary are inviable and are never accepted in the population.

Finally, we note that our results are robust with respect to changes in sequence length, showing qualitatively similar behavior for the dynamics of DMIs at different scaled population sizes, as shown in Figure 1. The effect of sequence length is explored in Supporting Information and in Figure 4, which examines the typical time required for RI to arise.

### Estimating the time to reproductive isolation

In a full genome, where there are many possible interacting genes, it will typically be the short-time behavior of each interacting pair that will dominate the development of RI for the whole organism. If we assume ∼ interaction partners per gene and protein-coding genes, we have ∼ interaction partners. As only a single one of these interactions giving rise to a DMI is required for RI, we for simplicity estimate the probability that RI has arisen is which at short times is given by In Figure 4 is plotted the time at which for We see the rate at which RI develops is strongly dependent on the scaled population size, with a weaker, but still significant dependence on the sequence length. In particular, we see for small scaled populations RI can arise quite quickly, on the timescale of generations, assuming There are different aspects of our model, which each cause an underestimate or an overestimate of the time for RI to arise. As discussed, only some fraction of traits will lead to a sufficient change in gene expression to cause an inviable organism, when cross-binding in heterozygotes is eliminated, and so this would cause an underestimate of the time. But on the other hand, particularly for small populations, where the common ancestor binding energy distribution is broad (Figure 2), there will be common ancestor gene pairs, whose binding affinity is closer to the inviability boundary, which would tend to dominate giving a that is shorter than our estimate. In addition, not all TF–TFBS pairs will necessarily have optimum fitness at optimum binding, which is likely to cause a reduction of the time to reproductive isolation, as the common ancestor distribution will be peaked closer to the inviability boundary, even in the limit of large populations; this again would mean an overestimation of As discussed above a major determinant at large scaled population sizes of the time for RI to develop is the rate of substitutions on each lineage, the inverse of which is plotted as a dashed line in Figure 4; we see that although the inverse substitution rate is a good predictor for large scaled population sizes, for small scaled populations it fails. This is due to the weaker equilibrium binding affinities at smaller scaled population sizes, which reduces further.

The time for RI to arise has a complicated dependence on sequence length, which is explored in detail in Supporting Information. Briefly, for small scaled population sizes (), RI develops more rapidly for longer sequences as the overall substitution or divergence rate is roughly proportional to yet the distance between the common ancestors and the inviability boundary does not vary appreciably with sequence length. Conversely, for large population sizes (), this trend is reversed and longer sequences develop RI more slowly. This is because, although there is the same dependence of divergence rate on the average distance of the common ancestor to the inviability boundary increases linearly with () due to longer binding sites giving more stable protein–DNA complexes. For large scaled population sizes, as demonstrated in Figure S2 of Supporting Information, the hybrid binding energies have neutral dynamics and so the typical time required to fix substitutions will vary quadratically with and thus quadratically on This quadratic dependence dominates the linear dependence of the divergence rate on resulting in an overall linear dependence of on The speciation times as a function of for are shown in the inset in Figure 4; the near-linear dependence lends support to the diffusive model for hybrid dynamics at large population sizes.

## Discussion

Dobzhansky, Muller, and Bateson (Bateson 1909; Dobzhansky 1936; Muller 1942) provided the first solution to Darwin’s conundrum of how speciation might arise by suggesting that in allopatry incompatibilities form between coevolving loci on an epistatic fitness landscape. Here, using a similar approach to that of Tulchinsky *et al.* (2014b), we have examined a biophysically motivated model of how incompatibilities arise in allopatric populations, and their population size dependence, using a simple model of the coevolution of transcription factors binding to DNA in the weak mutation, monomorphic regime. The model of TF–TFBS binding described here is inherently epistatic, despite the assumption that the contribution of each interacting amino acid-nucleotide pair is independent and additive to the total binding energy. Epistasis arises both from the nature of the binding interaction and from the resulting fitnesses. Considering the binding interaction, whether a given amino acid or nucleotide gives rise to a match or mismatch depends on the particular binding partner, so that the binding energy is a nonlinear function of the sequences at the TF and TFBS loci. It is this epistasis that is the source of the Dobzhansky–Muller incompatibilities that we find in our simulations described in *Results*. For example, as has been previously discussed (Johnson and Porter 2000; Tulchinsky *et al.* 2014b) the common ancestor might be fixed for a pair of sequences which has a binding energy of as there is only a single mismatch; after a period of divergence, two allopatric populations might be fixed for and each arising from just two substitutions, of compensatory effect, from the common ancestor sequence, so that as there is still only a single mismatch. However, the hybrid sequences are and which correspond to binding energies as they each have two mismatches. As the number of substitutions increases on each lineage, we can see that each lineage will maintain good fitness in a stabilizing landscape through compensatory changes, which each try to minimize the number of mismatches; however, each lineage fixes different sets of compensatory mutations, so when combined in a hybrid, the epistasis between pairs of sequences then gives rise to DMIs. The second cause of epistasis is the quadratic dependence of fitness on binding strength, as well as the discontinuity of the fitness function at Although there is a similarity between our model and typical polygenic models of quantitative traits, they are very different as for quantitative traits the phenotype is usually modeled as additive in each locus (Wright 1935a,b; Barton 1989), but with quadratic selection inducing epistasis between loci; in our model there is epistasis at the level of phenotype and the fitness of phenotypes.

A key aspect that this biophysical model of evolution introduces to the picture of fitness landscapes is the idea that many sequences can result in the same phenotype. In particular, the number of sequences corresponding to each phenotype can be very different, and this uneven distribution can have important consequences for the evolutionary process. As described, our results arise due to a drift–selection balance, which can be cast in the language of a balance between fitness and sequence entropy. The maximum of the free fitness landscape corresponds to the phenotype when these two evolutionary forces are balanced; importantly, this balance is dependent on the population size. Here, for TF–DNA binding there are many more sequences that have a large number of mismatches compared to those few high-fitness sequences that have a small number of mismatches; at smaller population sizes genetic drift dominates, pushing the equilibrium toward less fit sequences. This has an important consequence for the dynamics of reproductive isolation, that smaller scaled populations on average have common ancestors with a lower equilibrium affinity and so a smaller number of substitutions are needed for a hybrid incompatibility to arise. This leads to the main prediction of this article that smaller scaled populations () develop incompatibilities more quickly. Note that an evolutionary model that ignored this genotype–phenotype map could not reproduce Figure 2, but would have a common ancestor binding distribution peaked at the best binder for all population sizes, even though the strength of selection is reduced at small scaled population sizes. It should be stressed that the key parameter of interest is the scaled population size and so our results do not apply to just small populations, but in principle to TF–TFBS pairs in organisms of large absolute population size, but weak absolute selection, such that again across the genome there are likely many pairs of TF–TFBS, for which affording the possibility for rapid reproductive isolation to arise under stabilizing selection. For example, human studies suggest that ∼20% of mutations in amino acids are under weak selection, such that (Eyre-Walker *et al.* 2006), and so some fraction of these would be related to TF–TFBS interactions to which our model would apply. In general, the rate of reproductive isolation due to stabilizing selection will depend on the underlying distribution of fitness effects produced by new mutations in a given organism; if this distribution is assumed roughly fixed independent of the organism, then we would expect the proportion of TF–TFBS pairs that fall into the weak selection category to increase for smaller populations and the average rate of developing RI (per locus pair) will be higher compared to that in larger populations.

At larger scaled population sizes ( but still in the weak mutation regime, ), where fitness dominates drift we find this trend continues, but for a different reason; when populations no longer diverge neutrally and instead need to fix deleterious mutants whose difference in fitness is large compared to the inverse of the effective population size. This means that the time for reproductive isolation becomes very long for very large scaled populations. Overall, this picture is consistent with predictions of the nearly neutral theory, where large populations have a diminishing substitution rate (Ohta 1973, 1992; Lanfear *et al.* 2014). However, while there is evidence consistent with the nearly neutral theory from experimental studies (Wu and Li 1985; Ohta 1995; Johnson and Seger 2001; Weinreich 2001), they are not yet conclusive. In addition, there are theoretical models that predict no dependence on population size of the population-scaled fitness effects (Cherry 1998; Goldstein 2013), depending on the exact nature of the genotype–phenotype map. Again it should be noted that it is the effective scaled population size of the loci of interest that is key and so our model specifically predicts that TF–TFBS pairs in a genome under stabilizing selection and for which are unlikely to give rise to RI; however, this can in principle occur in large or small absolute populations, depending on the strength of selection on TF–TFBS pairs. Again, assuming a roughly fixed distribution of fitness effects, our results would suggest that for larger populations, the mechanism we describe under stabilizing selection would be relatively unimportant in contributing to RI.

As discussed in the Introduction there is some empirical evidence that smaller populations develop postzygotic isolation more quickly, although there have yet to be any systematic or definite studies. Our model then provides a rationale for these observations in the field with a robust mechanism that does not require that either lineage pass through a fitness valley. It also provides an insight, through a biophysical model, of the mechanistic causes of how DMIs develop for coevolved pairwise molecular interactions. While we would not expect quantitative agreement with biological systems, we can make a rough comparison to empirical data: our results suggest that reproductive isolation can occur on a timescale of the order of a few hundred thousand generations for small scaled population sizes. Direct studies of interspecific hybrids of African cichlids (Stelkens *et al.* 2010) show that postzygotic isolation typically arises over a timescale of MY, which corresponds to million generations, assuming a generation time of 3 years (Nagl *et al.* 1998), which suggests the mechanism we present is roughly consistent with empirical data. Importantly, we see that this mechanism, which poises small populations at the inviability boundary, can provide relatively rapid reproductive isolation between lineages with only nearly neutral evolution, without having to invoke valley crossing or peak shifts, or positive selection, which requires large populations.

Overall, our results suggest that stabilizing selection via the mechanism studied (and its analogs for more complicated gene regulatory systems) would have more importance at smaller population sizes and less at larger population sizes; this latter assertion is consistent with a number of speciation genes found to show evidence of positive selection, in many cases as a result of genomic conflict (Johnson 2010; Presgraves 2010), which are predominantly in *Drosophila*, which has a large effective population size and for which it is known that positive selection is quite pervasive (Andolfatto and Przeworski 2000; Macpherson *et al.* 2007). However, it is difficult as yet to draw strong general conclusions about the relative role of positive *vs.* stabilizing selection as a cause of DMIs, although this work highlights the relative role that different mechanisms might play at different population sizes and gives a quantitative theory that experimentalists can use to look for direct signatures of RI arising due to stabilizing selection.

The model studied, however, is simplified compared to the complexity of gene regulation in eukaryotes with multiple TFs binding to enhancers to control gene transcription and each TF having multiple binding sites controlling many different genes. Here, we treat TFs and their binding sites on an equal footing and so, for example, the substitution rate in each is the same. It is commonly thought that since TFs are under stronger pleiotropic constraints, they evolve more slowly and so much of the phenotypic divergence between species is driven by *cis*-regulatory change (King and Wilson 1975; Wittkopp *et al.* 2008) (and reviewed recently by Lynch and Wagner 2008). We expect that as pleiotropy will act to reduce the substitution rate on a TF, the divergence rate of allopatric lineages will decrease. This suggests that if pleiotropy is important, our simulations may underestimate the average time to reproductive isolation. However, a similar biophysical model (Tulchinsky *et al.* 2014a), albeit in the strong mutation regime, shows that for the case of a single TF binding two functional binding sites, despite the additional constraint, incompatibilities can arise at similar rates to those of a single TF–TFBS pair under stabilizing selection.

Previous theoretical work by Orr (1995; Orr and Turelli 2001) predicts that in the weak mutation regime, the number of incompatibilities should increase as from a fixed common ancestor, due to the combinatorial possibilities over a large number of pairwise interacting loci. Here, we predict the same growth of DMIs with time, but only for small scaled population sizes () and for a single two-locus system. However, the underlying mechanism appears to be very different here and not likely to be universal. Simulations with a fixed common ancestor rather than one drawn from the equilibrium distribution (Equation 7) are nearly identical (Figure S1 in Supporting Information); this suggests that the power law arises (here quadratic) mainly due to the close proximity of the common ancestor to the inviability boundary, requiring just a few substitutions in each lineage, and so the number of DMIs at short times is dominated by how likely a few substitutions are to arrive very quickly. This is given by a Poisson distribution, so if substitutions are needed on average for an incompatibility, then which for short times to leading order in This suggests that for a quaternary alphabet which is the minimum number of substitutions required for an incompatibility to arise, since a single substitution in one lineage will always give rise to the common ancestor and mutated genotype in the hybrids. On the other hand, for large populations, which have a peaked distribution of common ancestors relative to the Hamming distance to the inviability threshold we observe that the growth of DMIs does not appear to be described by a simple power law, but instead the results suggest there is a negative curvature to their growth on a log–log plot. In addition, we find that the variance of binding energies increases linearly with time in the limit of large populations (inset in Figure S2 in Supporting Information), so together with our results that indicate (inset in Figure 4), this suggests that the hybrid binding energies follow neutral diffusive dynamics for large scaled population sizes. This is as predicted by a simple calculation of the growth of DMIs due to a continuous diffusion model for the evolution of TF–DNA binding (Khatri and Goldstein 2015) and arises at large scaled population sizes due to the fact that from a fixed common ancestor there is a large mutational distance that needs to be diffused by hybrids before incompatibilities can arise. We suggest that more detailed studies of species divergence, similar to current works (Matute *et al.* 2010; Moyle and Nakazato 2010), which show a rapid increase in DMIs, should be able to discern between these two qualitatively different behaviors at different population sizes. In particular, recent cross-species ChiP-seq analysis of transcription factor binding (Schmidt *et al.* 2010) suggests a way to explicitly test our predictions at the level of actual binding affinities of hybrid TF–TFBS combinations for recently diverged species, such as in the *Drosophila* family.

The process of speciation underlies the vast diversity of life on Earth. Gene expression divergence is thought to underlie many differences between species (King and Wilson 1975; Wray 2007; Wolf *et al.* 2010), for example, in the Galapagos finches (Abzhanov *et al.* 2006), in the various species of *Drosophila* (Wittkopp *et al.* 2008), and with more direct evidence of a role in speciation through the evolution of genes related to transcription factors (Ting *et al.* 1998; Brideau *et al.* 2006). More recently studies of crosses between *D. melanogaster* and *D. santomea*, which diverged >10 million years ago, have revealed how the cryptic divergence of genetic architecture of conserved developmental body plans leads to postzygotic isolation (Gavin-Smyth and Matute 2013). Proteins binding to DNA to control gene expression are a prototypical coevolving system and critical for the proper development of organisms; thus these results have strong implications for speciation rates and diversity of populations at small population sizes. In addition, although our model is motivated by DNA–protein binding, the approach could be adapted to any type of interacting macromolecules, for example, coevolution of protein–protein interactions or the interaction of genes expressed by the nucleus and mitochondria, where in particular such interactions have been shown in yeast to give rise to cytonuclear incompatibilities (Chou and Leu 2010; Chou *et al.* 2010).

## Acknowledgments

We acknowledge useful discussions with David Pollock and comments from Adam Porter and the anonymous referees. R.A.G. was supported by the Medical Research Council under grant U117573805 and B.S.K. by The Francis Crick Institute, which receives its core funding from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust.

## Footnotes

*Communicating editor: B. A. Payseur*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.181685/-/DC1.

- Received August 5, 2015.
- Accepted September 23, 2015.

- Copyright © 2015 Khatri and Goldstein

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.