## Abstract

We survey the distribution of haplotypes at the self-incompatibility (SI) locus of *Arabidopsis lyrata* (Brassicaceae) at 12 locations spread over the species' natural distribution in Iceland. Previous investigations of the system have identified 34 functionally different S haplotypes maintained by frequency-dependent selection and arranged them into four classes of dominance in their phenotypic expression. On the basis of this model of dominance and the island model of population subdivision, we compare the distribution of S haplotypes with that expected from population genetic theory. We observe 18 different S haplotypes, recessive haplotypes being more common than dominant ones, and dominant ones being shared by fewer populations. As expected, differentiation, although significant, is very low at the S locus even over distances of up to 300 km. The frequency of the most recessive haplotype is slightly larger than expected for a panmictic population, but consistent with a subdivided population with the observed differentiation. Frequencies in nature reflect effects of segregation distortion previously observed in controlled crosses. The dynamics of the S-locus variation are, however, well represented by a 12-island model and our simplified model of dominance interactions.

MOST homomorphic self-incompatibility (SI) systems are determined by a single multi-allelic locus (the S locus), composed of at least two genes. In the sporophytic self-incompatibility (SSI) system of Brassicaceae, the S-receptor kinase (SRK) is responsible for the determination of female specificity, and the S-cysteine-rich (SCR) protein is responsible for the determination of male specificity (Schopfer *et al*. 1999; Takasaki *et al*. 2000). In a panmictic population with gametophytic SI the dynamics were modeled quite accurately by Sewall Wright (1939). Gametophytic self-incompatibility systems are simple, in that incompatibility is determined by matching of a haploid pollen with a diploid stigma expressing both its haplotypes codominantly. In a panmictic population all haplotypes are expected to have a frequency drawn from the same unimodal distribution. Sporophytic self-incompatibility systems are found within several families, in particular Brassicaceae (Bateman 1954). They are more complex to model because pollen phenotype derives from the genotype of the pollen parent, and dominance may occur in both stigma and pollen (Bateman 1952). In simple sex-symmetric dominance models, where selection on the male and female sides is equally strong, phenotypes are expected to have the same mean frequency (although with different distributions) leading to a higher expected frequency of relatively recessive haplotypes because they display their phenotype less often (Bateman 1952; Cope 1962; Sampson 1967; Imrie *et al*. 1972; Charlesworth 1988; Schierup *et al*. 1997; Charlesworth *et al*. 2000). In cases of more complicated dominance relations among the haplotypes, the homogeneity of phenotype frequencies fails (Schierup *et al*. 2006). Recessive haplotypes, however, still show relatively higher frequencies, as found in surveys of sporophytic systems (Mehlenbacher 1997; Schierup *et al*. 2006).

Population structure is common in plant species, stimulating theoretical and empirical studies on the effect of population structure on the distribution of systems under balancing selection. These theoretical studies show that the strong frequency-dependent selection causes very different dynamics for SI haplotypes as compared to a neutral gene (Schierup 1998b; Schierup *et al*. 2000a,b; Muirhead 2001). Self-incompatibility causes haplotypes at very low frequencies to show high fitness compared to those close to their equilibrium frequency. Invasion of a haplotype absent in a subpopulation therefore has a high probability of success. In particular, it is higher than that of neutral alleles, leading to the prediction that population differentiation is lower at the S locus than at neutral loci. The number of S haplotypes in a subdivided population is not expected to be an increasing function of the degree of isolation as for neutral genes. Actually, for most reasonable migration rates a subdivided population maintains fewer haplotypes than a panmictic population with the same number of individuals (Schierup 1998b; Muirhead 2001). These conclusions apply to both gametophytic and sporophytic SI. The study by Glemin *et al*. (2005), using an antibody assay in *Brassica insularis*, is consistent with these predictions. Furthermore, population structure is expected to affect haplotypes at different dominance levels differently. Assuming a sex-symmetric model, Schierup *et al*. (2000b) found that more recessive haplotypes have higher average frequencies in a subdivided population than in a panmictic one, mainly because they are present in a larger proportion of subpopulations. More dominant haplotypes were found to be lost locally more often and therefore have a higher effective migration rate and corresponding shorter coalescence time for copies of the same haplotype.

In studies of the SI system of *Arabidopsis lyrata* (Brassicaceae) we have previously characterized 34 different SRK haplotypes of the SRK ortholog (Charlesworth *et al*. 2000; Bechsgaard *et al*. 2006) and developed a PCR-based typing system for identification of S haplotypes at the SRK gene (Mable *et al*. 2003, 2005; Schierup *et al*. 2006). The dominance relations among the haplotypes are relatively well known (Prigoda *et al*. 2005). This typing system was used to study the mating patterns within a single Icelandic population (Schierup *et al*. 2006). Our next step is to study the geographical variation in the SI system, and here we report the frequencies of the S haplotypes in 12 populations across Iceland. In a subdivided population we are able to develop predictions on equilibrium frequencies and the dynamics of haplotypes as a function of their dominance level. These are compared to observations in nature on the basis of the number of haplotypes observed either locally or in the global population. We conclude that the dynamics of the system are well captured by our dominance model and the ensuing intrinsic balancing selection, in particular if we allow for limited differentiation among the populations.

## MATERIALS AND METHODS

*A. lyrata* is one of the most common plant species in Iceland. It grows mainly on lava fields, disturbed habitat, along roads, and throughout the highlands. Leaf material and seeds from 12 different locations were collected during a field trip in August 2001. The geographical positions of the sampled populations are shown in Figure 1 with more details in Table 1. At each sampling site, 30–100 individuals were sampled.

S haplotypes were determined for 30–50 individuals per population using the same protocol as in Schierup *et al*. (2006). The overall strategy is first to use two sets of general primers (13SEQ1F and 13SEQ2F forward with SLGR reverse) and then to restriction digest the resulting PCR fragments using two four-cutter enzymes, *Alu*I and *Msp*I. The resulting electrophoretic banding pattern suggests the haplotypes. These are subsequently confirmed with haplotype-specific primer sets. To identify the missing haplotypes, we subjected the individuals with only one initially identified haplotype to further analyses using two general forward primers (13F1 and 13SEQ3F). If this did not result in the discovery of a second haplotype, we tested primers specific for haplotypes known to not give very strong signals with general primers (haplotype ALSRK36, ALSRK37, and ALSRK38, with the primer sets ALSRK36F1–ALSRK36R1, ALSRK37F1–ALSRK37R1, and Aly13-38–38-R1, respectively). If this failed to identify a second haplotype, we consider the individual to be homozygote for its haplotype. All haplotypes except for ALSRK38 have previously been reported and linkage to the S locus established for all but ALSRK38 and ALSRK27 (Schierup *et al*. 2001; Mable *et al*. 2003, 2005; Bechsgaard *et al*. 2006; Schierup *et al*. 2006). The sequence of ALSRK38 has been submitted to GenBank (accession no. EU165341). The sequences of all specific primer sets can be found in supplemental Table S1.

#### Statistical analysis:

A phylogenetic tree of the S haplotypes was reconstructed on the basis of 650 bp of the S domain sequenced in all the haplotypes identified in this study using Aly3 and Aly7 as outgroups (Charlesworth *et al*. 2003). The minimum-evolution criterion with the HKY substitution model was used and analysis was performed in Mega 3 (Kumar *et al*. 2004). The phylogeny was used to infer dominance class membership of haplotypes where this has not been established by crosses (Prigoda *et al*. 2005), in particular for haplotypes ALSRK27 and ALSRK38.

*F*-statistics and the Mantel test for isolation by distance (correlation between matrices of *F*_{ST} values and geographical distances) were calculated using Fstat version 2.9.3.2 (Goudet 1995).

Mate availability within each population was estimated by calculating the mate availability for individual plants using the current model of dominance relationship between the haplotypes in stigma and pollen (Schierup *et al*. 2006). The mate availability of an individual is the proportion of the population that it can fertilize, thus we assume that selection occurs through male function only, *i.e*., there is no variation in female fecundity (Vekemans *et al*. 1998). This assumption reflects the fact that full seed set is typically observed in the field (Schierup *et al*. 2006).

The mean and variance in frequency of haplotypes across populations were quantified, and the homogeneity of the variance for different haplotypes was analyzed.

#### Numerical evaluations of haplotype frequencies:

The dynamics of a sporophytic SI system with variable number of haplotypes at four dominance levels were investigated through iterations of a deterministic model and through stochastic computer simulations. Dominance was assumed complete between dominance levels in both sexes, with codominance within dominance levels. The model is thus a combination of the SSIcod and SSIdom models of Schierup *et al*. (1997), and it therefore assumes random mating, male sexual selection due to SI, and no selection on the female side. Investigations including fecundity selection, (Vekemans *et al*. 1998) gave very similar results, but they are not reported.

The expected equilibrium frequency of haplotypes in a population was determined as in Schierup *et al*. (2006). The deterministic recurrence equations were iterated assuming symmetry in genotypic frequencies with respect to the haplotypes within any of the dominance levels and can be considered a special case of the framework recently published by Billiard *et al*. (2007). When the genotypic frequencies of the haplotypes within a particular dominance level are equal at the start of the iteration, they remain so throughout the iterations. Two types of iterations were run: one where each subpopulation was run separately using the observed number of haplotypes for each dominance level in that subpopulation; the other investigated the total population using the total number of haplotypes observed in the entire population. Stability of equilibria was checked by perturbations followed by iterations omitting the symmetry assumption. The convergence to equilibrium is rather fast: starting with equal frequencies of the 18 haplotypes corresponding to the observed haplotypes, the frequency of the recessive haplotype is 0.199 after 100 generations, 0.300 after 200, 0.314 after 300, 0.315 after 400, and correct to six decimal places (0.314565) after 600 generations. The Mathematica notebook iterating the recurrence equations was written by F. B. Christiansen and can be downloaded at http://www.daimi.au.dk/∼mheide/.

The stochastic simulations are based on individual plants fertilized from a local pollen pool formed from equal contributions from all local plants and modified by immigration. A 12-island model of population structure was implemented with migration at the same rate between all islands as in Wright's island model. Only pollen were allowed to migrate, modifying the local pollen pool by replacing a given fraction by an equal mixture of pollen from the population. Different combinations of population sizes and number of subpopulations were tested, keeping the total population size between 1200 and 2400. All subpopulations were assumed to be of the same size—sizes of 100 or 200 were used in accord with the population structure observed in previous studies (Schierup 1998a; Schierup *et al*. 2006). A Poisson-distributed random number of already existing haplotypes was introduced into the population each generation to avoid loss of haplotypes. The probability of introduction of a given haplotype was proportional to its expected frequency under the deterministic equilibrium, and the process can be thought of as immigrants from a larger population, *e.g*., the remaining part of Iceland. The rate of introduction was set to 0.5%, which turned out to be sufficient to avoid loss of haplotypes in the simulations. Simulations were initiated with a set of genotypes drawn at random from the equilibrium frequency distribution determined by the deterministic iterations, and they were continued until a selection–drift–mutation steady state was attained (at least 5000 generations). At this time a sample of 30 individuals was picked from each population, and frequencies of haplotypes were calculated as well as *F*_{ST} for the set of sampled populations. The sample size was chosen to mimic the sampling strategy used in the empirical survey. The migration rate was adjusted by trial and error for each set of parameters until *F*_{ST} values were obtained similar to the observed *F*_{ST} = 2%. The local immigration fraction was thus set at 0.12 when the population size is 100 and at 0.08 with 200 individuals in the local population (other simulations with different numbers of individuals in the populations but the level of differentiation at 2% gave comparable results). The structure of the natural population is presumably not reflected by this simple model, but its genetic structure is well approximated. Of course, the correspondence between the simulated structure and that of the collection of sampled natural populations is not of a quality that allows the simulation results to be the foundation upon which formulation of quantitative hypotheses suited for statistical analyses may be built. The simulation results are thus qualitative guidelines for the interpretation of patterns in natural observations.

To probe the effect of yet finer population structure we simulated a two-dimensional finite stepping stone model with immigration at equal rates from the four neighboring subpopulations. The 225 subpopulations of size 10 are arranged on a torus to avoid edge effects. Total immigration was set at 0.3 in each subpopulation, and that rate yielded *F*_{ST} values in the range of 2–4%. Otherwise, details of the simulation were as in the 12-island model. When evaluating population differentiation, the torus was partitioned into 25 areas of 90 individuals by merging 9 adjacent subpopulations (a 3 × 3 section). *F*_{ST} was then calculated between these 25 areas. The simulation programs were written by M. H. Schierup in Python. The source code with limited documentation is also available through www.daimi.au.dk/∼mheide/.

## RESULTS

Eighteen of 34 different S haplotypes known in the species were identified (Table 2), and these fall into four different dominance classes, with 1 recessive haplotype, 3 of the least dominant haplotypes, 3 of the dominant haplotypes, and 11 of the most dominant haplotypes (Figure 2). Throughout this study we assume an absolute dominance relation among these four classes with codominance within, and these dominance relations are equal in the sexes.

Two haplotypes were identified in 271/405 = 67% of individuals, and only one haplotype was observed in the remaining 33% of individuals. All individuals showed at least one haplotype. Single-haplotype individuals are assumed to be homozygotes for this haplotype, even though we cannot rule out that they are heterozygotes for an unidentified haplotype (see an extended discussion of this below).

Observed frequencies of haplotypes in each population are shown in Figure 3. More dominant haplotypes are generally less common, and the most recessive haplotype is the most common in all populations. Since haplotypes from the same dominance level are expected to have the same frequencies, we may consider the total frequency of haplotypes at each level. These are in Figure 4 compared to deterministic calculations that predict the total frequency of haplotypes at each dominance level in each population, on the assumption that we have sampled all the haplotypes present in each population.

#### Population structure and haplotype frequency dynamics:

Average differentiation of haplotype frequencies among populations as measured by *F*_{ST} is only 2%, but highly significant (*P* < 0.001). We investigated the nature of this differentiation in more detail. Table 3 shows an analysis of the homogeneity of the dominance-class counts among populations, concluding clear evidence of population structure most likely due to variation in all class frequencies. Figure 5 shows *F*_{ST}'s between pairs of populations as a function of distance between the two populations, indicating no correlation between *F*_{ST} and distance (Mantel test, *P* > 0.6). This lack of isolation by distance suggests the use of Wright's 12-island model in simulations to elucidate the differentiation among sampled populations. Absence of some of the haplotypes from local populations was also seen in the simulations (Table 4), so the heterogeneous pattern seen within dominance classes in Figure 3 need not be ascribed to limited sample sizes. This is true for the most dominant class, in particular. Comparing observations to expectations in a panmictic population (Table 5), the most striking deviation is that the observed frequency of the most recessive haplotype is higher than predicted. Computer simulations of a population subdivided into 12 subpopulations with a similar level of geographical differentiation produce results comparable to those obtained, when treating the 12 populations as isolated populations with different numbers of haplotypes. Comparing the observed haplotype frequencies to those obtained from simulations of local populations, where only the locally observed haplotypes are present, the most notable change is that the observed frequencies of the most recessive class are closer to those expected—the differences are small, though. Inclusion of the observed level of genetic population structure in the theoretical predictions therefore improves the correspondence between simulations and observations.

Simulations were also used to evaluate whether the large fraction (33%) of putative homozygotes can be explained by very fine-scaled population structure. If no substructure exists within populations, and all populations are weighted equally, one would expect 13.8% homozygotes from deterministic calculations. Simulations using the island model of migration yielding *F*_{ST} = 0.02 produce homozygote frequencies between 18 and 25% (range over 50 replicates). The effect of finer local population structure was investigated by simulating a two-dimensional stepping stone model (see materials and methods). For comparable levels of geographical differentiation the simulations yielded the frequency of homozygotes in the range of 25–30% (observed in 50 replicates). Thus, substructure within local populations has the potential of significantly increasing homozygote frequencies through biparental inbreeding. The values found in this simulation are indeed closer to the observed value, but further investigations are needed to resolve to what extent biparental inbreeding is the main cause of the observed excess of total homozygote frequency.

The 134 individuals observed with a single haplotype (and assumed to be homozygotes) are unequally distributed over dominance classes, with 81 homozygous for the recessive, 27 for the least dominant, 10 for the dominant, and 16 for the most dominant class (supplemental data, Table S2).

#### Mate availability:

Figure 6 shows the populationwide frequencies of haplotypes as well as the observed average mate availability that each haplotype experiences in the total population. The very common recessive haplotype has the lowest mate availability. In a local population the mate availability decreases with the frequency of the haplotype. For the least dominant and dominant haplotypes this is reflected as a decrease in the populationwide average mate availability with the average frequency of the haplotype. For the most dominant haplotypes the observed relation between their frequencies and these average mate availabilities is far from a perfect reflection of the expectation in the local populations (Figure 6). This deviation exposes the discrepancy between the genotypic frequencies expected from observed average haplotype frequencies and observed average genotype frequencies akin to those described as the Wahlund effect.

Mate availability in the 12 populations ranges between 82 and 92% with an average of 88% chance that two individuals from the same population can mate. Compared to the expected mate availabilities calculated from the observed haplotype frequencies, all observed mate availabilities are low (Figure 7). Two individuals picked at random from the total population are compatible with a probability of 91%, again consistent with slight population differentiation. Different haplotypes have different mate availabilities (Figure 6), and the more dominant haplotypes generally have higher mate availabilities.

## DISCUSSION

We report the most comprehensive typing of S haplotypes in a subdivided natural population to date with the aim of testing whether the spatial distribution is concordant with our simplified models of dominance of S haplotypes and population structure. We found that our model of the self-incompatibility system in *A. lyrata* describes the dynamics of the system well. The model assumption of a strict dominance hierarchy is a simplification, however. Instances of reciprocal differences in crosses and dominance within dominance classes have been reported in *A. lyrata*, depending on the genomic background (Mable *et al*. 2004; Prigoda *et al*. 2005). Our deterministic calculations of expected frequencies of haplotypes as a function of their dominance level fit the observed patterns in the population individually. Thus, at the local level, the dynamic equilibrium of the dominance class frequencies appears to be reached quickly in accordance with the strong frequency-dependent selection. In this sense the dynamics of S haplotypes in the local population are important for the dynamics of the SSI polymorphism in Iceland. However, frequencies of individual haplotypes within a class vary considerably due to random genetic drift within the local population and sampling variance due to local heterogeneity (Schierup *et al*. 2006). Low but significant population structure makes the deterministic theory of a panmictic population insufficient for describing haplotype frequencies in the entire population. We thus resorted to computer simulations to illuminate the effects of the observed level of differentiation. Local absence of some of the most dominant haplotypes was seen in both natural and simulated populations. The higher expected effective migration rate of haplotypes with increasing dominance is therefore offset by their increased risk of local extinction by drift—in concordance with earlier results based on the simpler sex-symmetric SSIdom model with only one haplotype at each dominance level (Schierup *et al*. 2000b). The observed frequencies of haplotypes at the various dominance levels are accurately predicted by simulations. More dominant haplotypes are expected to have lower frequencies and higher mate availabilities in a subdivided population, as observed in our study.

We observe an excess of homozygotes over the theoretical expectation, assuming that all samples were taken from the same population. Part of this excess is due to local fluctuation in haplotype frequencies due to population structure. An island model of population structure, however, only predicts ∼20% homozygotes, but 33% of the sampled individuals show only a single haplotype. The remaining excess of homozygotes can be ascribed to an unknown combination of more fine-scaled population structure within sampling sites, inability to detect certain (unknown) haplotypes, incompletely known dominance relations, and a limited amount of selfing. Simulation of the first situation produced up to 30% homozygotes for reasonable parameter values. The individuals sampled in the 12 populations were typically taken from an area of 100 × 100 m. This is a considerably larger scale than the 10 × 15-m area, where we previously detected fine-scale genetic clustering of S haplotypes in a different Icelandic population (Schierup *et al*. 2006).

Observation of homozygotes for haplotypes in the most dominant class cannot be explained by population structure but suggests one of three possibilities: (1) haplotypes exist that our typing does not recognize, (2) dominance exists within the most dominant class, or (3) selfing is possible. We are not able to distinguish between these alternatives with the present data. We believe that our methods detect all haplotypes that have previously been characterized in this species (34 in all), but we also find it plausible that haplotypes exist that cannot be detected at present. Bechsgaard *et al*. (2004) found evidence that at least one haplotype is not detected, and Mable *et al*. (2003, 2004) found evidence for dominance within dominance classes. In addition, selfing has evolved repeatedly in *A. lyrata* in nature (Mable *et al*. 2005; Prigoda *et al*. 2005), and we found selfing in the greenhouse under stressful conditions (M. H. Schierup and J. S. Bechsgaard, unpublished observation). Our results suggest that putative undetected haplotypes are relatively rare and therefore most likely belong to the most dominant class. Undetected haplotypes are therefore unlikely to produce large quantitative effects on our results. Incomplete knowledge of dominance relations and their occasional breakdown can of course produce an excess of homozygotes, in particular if selfing is possible.

Bechsgaard *et al*. (2004) found that the haplotypes ALSRK01 and ALSRK09 have segregation advantages in certain full-sib crosses. These two haplotypes have the smallest average mate availabilities in our study. It is thus tempting to speculate that they indeed possess a segregation advantage in natural populations and that this increases their frequencies to a level, where their mate availabilities decrease their male sexual fitness until their segregation advantage is balanced. However, our results show that segregation distortion in general is not sufficiently common to disturb the overall dynamics of the system away from that predicted by our simple model of dominance.

The lack of isolation by distance was not anticipated since *A. lyrata* continuously cover large parts of Iceland. Previous studies in Iceland have shown that pollen and seed dispersal in established populations is very limited (Schierup *et al*. 2006). Isozyme studies have observed *F*_{ST} > 0.1 among populations separated by <40 km, evidencing isolation by distance at this scale (Schierup 1998a; Bechsgaard *et al*. 2004; M. H. Schierup and J. S. Bechsgaard, unpublished results). *F*_{ST} values are, however, not directly comparable for isozymes and highly polymorphic genes under balancing selection. We can see two possible reasons for the low level of isolation by distance at the S locus. First, the overall level of differentiation of S haplotypes is low due to their efficient migration, causing isolation by distance undetectable on the investigated geographical scale. Second, the colonization of Iceland is very recent, although *A. lyrata*, like many other species of plants, might have survived the Weichselian glaciation at nunatak refugia in northern Iceland (Rundgren and Ingolfsson 1999). The retreat of glaciers began ∼10,000 years ago, and a significant fraction of the land is still glaciated (Figure 1). The present distribution is likely still to reflect a colonization process where the plants follow the receding rim of the ice. Typing of neutral molecular markers and S haplotypes in the same individuals is needed to distinguish effects of these scenarios.

## Acknowledgments

We thank Deborah Charlesworth, Barbara Mable, Xavier Vekemans, and Vincent Castric for sharing unpublished material and for many valuable discussions. Camilla Håkansson is thanked for help in the lab. The study was supported by grants from the Carlsberg Foundation, the Danish Natural Sciences Research Council, and the Danish Veterinary and Agricultural Research Council to M.H.S.

## Footnotes

Communicating editor: M. K. Uyenoyama

- Received March 4, 2008.
- Accepted August 4, 2008.

- Copyright © 2008 by the Genetics Society of America