## Abstract

Estimating seed and pollen gene flow in plants from samples of naturally regenerated seedlings can provide much needed information about "realized gene flow," but it appears to be one of the greatest challenges in plant population biology. Traditional parentage methods cannot discriminate between the male and female parentage of seedlings unless they are supported by uniparentally inherited markers, so they cannot precisely describe the seed and pollen components of gene flow realized in seedlings. Here, we describe a maximum-likelihood method for modeling female and male parentage in a local plant population on the basis of genotypic data from naturally established seedlings, when the locations and genotypes of all potential parents within the population are known. The method models the female and male reproductive success of individuals as a function of factors likely to influence reproductive success (*e.g.*, distance of seed dispersal, distance between mates, and relative fecundity; *i.e.*, female and male selection gradients). The method is designed to account for levels of seed and pollen gene flow into the local population from unsampled adults; it is therefore well suited to isolated as well as widespread natural populations, where extensive seed and pollen dispersal complicates traditional parentage analyses. Computer simulations were performed to evaluate the utility and robustness of the model and estimation procedure and to assess how the exclusion power of genetic markers (isozymes or microsatellites) affects the accuracy of parameter estimation. In addition, the method was applied to genotypic data collected in Scots pine (isozymes) and oak (microsatellites) populations to obtain preliminary estimates of long-distance seed and pollen gene flow and of the patterns of local seed and pollen dispersal in these species.

One of the major challenges in plant population biology is to describe the relationship between the population genetic structure and the reproductive patterns of a species. This objective has often been addressed through parentage analysis using neutral genetic markers (Devlin and Ellstrand 1990; Meagher 1991), and the recent availability of highly variable genetic markers, such as microsatellites, has greatly increased the efficiency of parentage assignment (Dow and Ashley 1996; Streiff *et al.* 1999; Gonzalez-Martinez *et al.* 2002). Parentage inference has now advanced to the point where demonstrating variation in reproductive success among adult plants is no longer sufficient by itself; the more interesting goal is to identify the causes of differential male and female reproductive success. In theory, reproductive patterns are best inferred when parents can be unambiguously assigned through parentage analysis. If, however, genetic discrimination is not complete and/or errors occur in parentage assignment, estimates of parameters describing reproductive patterns in a population may be seriously biased (Roeder *et al.* 1989). Under these conditions, the population-wide parameters describing a reproductive system (*e.g.*, selection gradients, see Morgan and Conner 2001; Burczyk *et al.* 2002) are best estimated by applying probability models that account for the genetic composition of offspring in actual progeny arrays and inferring the appropriate model parameters. This method has been used effectively to study male mating success (Adams and Birkes 1991; Burczyk *et al.* 1996, 2002; Smouse *et al.* 1999; Morgan and Conner 2001) and female reproductive success (Schnabel *et al.* 1998) in plant populations.

Estimation of reproductive patterns in plants is further complicated because gene flow (*i.e*., immigration of seed and/or pollen gametes) from uncensored parents residing outside the study population often contributes to offspring (Ennos 1994; Hamrick and Nason 2000; Morgan and Conner 2001). Methods of simultaneously estimating paternity due to pollen gene flow and selection gradients describing differential fertility of males within populations, by applying mating models to the genetic composition of progeny arrays (seeds) sampled from individual mother plants, have been described and illustrated (Adams and Birkes 1991; Burczyk *et al.* 1996, 2002; see also Smouse *et al.* 1999; Morgan and Conner 2001; Wright and Meagher 2004). These procedures can be used only to estimate male components of reproduction at the seed stage and when seeds are sampled from individual, known mother plants. Similar methods, however, can be applied to describe gene flow by seeds and female selection gradients (or effective seed dispersal) within populations (Adams 1992; Dow and Ashley 1996; Schnabel *et al.* 1998; Godoy and Jordano 2001; Gonzalez-Martinez *et al.* 2002).

It might be expected that realized gene flow by pollen or seeds (Meagher and Thompson 1987) and selection gradients (female or male fertility) observed in naturally regenerated seedlings may differ substantially from these parameters measured at the seed stage (Dyer and Sork 2001). Several factors, such as seed dispersal mechanisms, seed predation, and selection during germination and establishment, may influence the genetic composition of naturally regenerated seedlings. For example, if immigrating pollen comes from populations adapted to substantially different environmental conditions, the resulting offspring may not be as fit as offspring derived from local matings. Therefore, revealing the patterns and determinants of gene dispersal and reproductive success is fundamental to understanding the genetic aspects of natural regeneration in plant populations (Meagher and Thompson 1987). However, because of the high genetic exclusion power required (Marshall *et al.* 1998), there have been only a few attempts to study the parentage of naturally established seedlings (Meagher and Thompson 1987; Dow and Ashley 1996; Schnabel *et al.* 1998; Konuma *et al.* 2000; Kameyama *et al.* 2001; Gonzalez-Martinez *et al.* 2002; Isagi and Kanazashi 2002; Jones 2003; Shimatani 2004), and these studies were unable to fully describe the reproductive patterns that led to the establishment of the seedling populations.

In this article we present the *seedling neighborhood model* (Burczyk *et al.* 2004), a novel probability model that makes it possible to describe various reproductive factors (*i.e.*, gene flow and selection gradients) influencing the genetic composition and genealogy of naturally regenerating seedling cohorts. We investigate the statistical properties of the model through simulations, focusing on the statistical resolution provided by sets of genetic loci with different exclusion probabilities. Finally, we apply the model to preliminary data sets from Scots pine (*Pinus sylvestris* L.) and oak (*Quercus* spp.) populations. We also discuss potential applications of the model for investigating reproductive patterns in plant populations.

## METHODS

#### Seedling neighborhood model:

This probability mating model is fashioned after our earlier neighborhood models developed to describe pollen gene flow and mating patterns within populations (Adams and Birkes 1989, 1991; Burczyk *et al.* 2002). It requires the mapped locations of all sampled seedlings and all potential reproductive adult males and females within a local population, the multilocus genotypes of all seedlings and adults, and allele frequencies of the same species in surrounding (background) populations. The model allows for simultaneous estimation of seed and pollen immigration levels, along with female and male reproductive success parameters (*i.e.*, relative fertilities and seed and pollen dispersal). For bisexual plants it also accounts for the proportion of selfing among the fraction of seedlings originating from local mothers. We assume in this article that parentage of seedlings is of primary interest, but the procedure applies equally well to seeds (embryos) sampled on the ground or in seed traps.

The general idea of the model is outlined in Figure 1. The model assumes that a seedling is mothered either (i) by an unknown female located outside an arbitrarily defined circular area around the seedling (*i.e.*, the seedling's neighborhood, hence the name of the model) due to seed immigration (with probability *m*_{s}) or (ii) by a specific local female growing within the seedling's neighborhood (with probability 1 − *m*_{s}). For each seedling with a local mother, it is assumed that the paternal gamete came from one of three sources: (i) self-fertilization (with probability *s*), (ii) migrant pollen from outside the mother's neighborhood (with probability *m*_{p}), or (iii) outcrossing with males located within the mother's neighborhood (with probability 1 − *s* − *m*_{p}), much like earlier neighborhood models for pollen dispersal (Adams and Birkes 1991; Burczyk *et al.* 2002). Note that the mother's neighborhood is a circular area surrounding the mother that has the same radius as the seedling neighborhood. The probability of observing the *i*th seedling with multilocus diploid genotype *G*_{i} is therefore

$$P(G_i) = m_s\,P(G_i \mid B_s) + (1 - m_s)\sum_{j=1}^{s_i}\psi_{ij}\Big[s\,P(G_i \mid M_{ij}, M_{ij}) + m_p\,P(G_i \mid M_{ij}, B_p) + (1 - s - m_p)\sum_{k=1}^{z_{ij}}\phi_{ijk}\,P(G_i \mid M_{ij}, F_{ijk})\Big], \tag{1}$$

where *M*_{ij} and *F*_{ijk} are the genotypes of the *j*th mother in the *i*th seedling neighborhood and of the *k*th father in the *ij*th mother neighborhood, respectively, and *s*_{i} and *z*_{ij} are the numbers of potential mothers and fathers in these neighborhoods. *P*(*G*_{i} | *M*_{ij}, *M*_{ij}), *P*(*G*_{i} | *M*_{ij}, *B*_{p}), and *P*(*G*_{i} | *M*_{ij}, *F*_{ijk}) are the genetic segregation (or *transition*) probabilities (Devlin *et al.* 1988), *i.e.*, the probabilities that the *i*th seedling has diploid genotype *G*_{i} when a mother plant of genotype *M*_{ij} is, respectively, self-pollinated, pollinated by a distant unknown background male, or pollinated by a neighboring plant having genotype *F*_{ijk}. *P*(*G*_{i} | *B*_{s}) is the transition probability that a seedling immigrating from mothers located outside the seedling's neighborhood has genotype *G*_{i}. Parameter ψ_{ij} is the relative reproductive success of the *j*th female in the neighborhood of the *i*th seedling, and ϕ_{ijk} is the relative reproductive success of the *k*th male within the neighborhood of the *ij*th female. In species not capable of self-fertilization, *s* = 0 and the terms in the model are simplified.
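To make Equation 1 concrete, the following Python sketch evaluates *P*(*G*_{i}) for a single seedling from Mendelian transition probabilities. It is a minimal illustration, not the SNM program, and it rests on several assumptions: genotypes are lists of unordered allele pairs (one pair per locus), loci are unlinked so that multilocus transition probabilities are products over loci, and the weights ψ_{ij} and ϕ_{ijk} are passed in already normalized. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]              # unordered allele pair at one locus
Genotype = Sequence[Pair]           # one pair per locus (loci assumed unlinked)
Freqs = Sequence[Dict[int, float]]  # background allele frequencies per locus


def gamete_prob(pair: Pair, allele: int) -> float:
    """Mendelian probability that a diploid parent transmits `allele`."""
    return 0.5 * (pair[0] == allele) + 0.5 * (pair[1] == allele)


def locus_prob(child: Pair, egg: Callable[[int], float],
               pollen: Callable[[int], float]) -> float:
    """Probability of an unordered child genotype given egg and pollen allele probabilities."""
    a, b = child
    if a == b:
        return egg(a) * pollen(a)
    return egg(a) * pollen(b) + egg(b) * pollen(a)


def p_given_parents(g: Genotype, mother: Genotype, father: Genotype) -> float:
    """P(G | M, F); pass father=mother for self-fertilization."""
    prob = 1.0
    for c, m, f in zip(g, mother, father):
        prob *= locus_prob(c, lambda x: gamete_prob(m, x), lambda x: gamete_prob(f, x))
    return prob


def p_given_mother_background(g: Genotype, mother: Genotype, freqs: Freqs) -> float:
    """P(G | M, B_p): egg from the mother, pollen from the background pool."""
    prob = 1.0
    for c, m, fr in zip(g, mother, freqs):
        prob *= locus_prob(c, lambda x: gamete_prob(m, x), lambda x: fr.get(x, 0.0))
    return prob


def p_given_background(g: Genotype, freqs: Freqs) -> float:
    """P(G | B_s): immigrant seed, both gametes drawn from the background pool."""
    prob = 1.0
    for c, fr in zip(g, freqs):
        prob *= locus_prob(c, lambda x: fr.get(x, 0.0), lambda x: fr.get(x, 0.0))
    return prob


def seedling_prob(g: Genotype, mothers: List[Genotype], psi: List[float],
                  fathers: List[List[Genotype]], phi: List[List[float]],
                  freqs: Freqs, m_s: float, m_p: float, s: float) -> float:
    """Equation 1: mothers[j] is M_ij, fathers[j][k] is F_ijk; psi and phi sum to 1."""
    local = 0.0
    for j, mother in enumerate(mothers):
        outcross = sum(w * p_given_parents(g, mother, f)
                       for w, f in zip(phi[j], fathers[j]))
        local += psi[j] * (s * p_given_parents(g, mother, mother)
                           + m_p * p_given_mother_background(g, mother, freqs)
                           + (1.0 - s - m_p) * outcross)
    return m_s * p_given_background(g, freqs) + (1.0 - m_s) * local
```

As a hand check at one locus, with mother (1, 2), a single neighbor father (2, 2), background frequencies 0.5/0.5, *m*_{s} = 0.2, *m*_{p} = 0.5, and *s* = 0.1, a (2, 2) seedling has probability 0.2 · 0.25 + 0.8 · (0.1 · 0.25 + 0.5 · 0.25 + 0.4 · 0.5) = 0.33, and the probabilities of the three possible genotypes sum to 1.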

Relative female reproductive success is expressed as

$$\psi_{ij} = \frac{\tau_{ij}}{\sum_{j=1}^{s_i}\tau_{ij}}, \tag{2}$$

where τ_{ij} is a function of one or more factors influencing the reproductive success of the *j*th female in the *i*th seedling's neighborhood, and the denominator is the sum over all *s*_{i} potential females in that neighborhood (so that Σ_{j} ψ_{ij} = 1). Similarly, male reproductive success is

$$\phi_{ijk} = \frac{\pi_{ijk}}{\sum_{k=1}^{z_{ij}}\pi_{ijk}}, \tag{3}$$

where π_{ijk} is a function of factors influencing the reproductive success of the *k*th male in the neighborhood of the *j*th female located within the *i*th seedling's neighborhood, and *z*_{ij} is the total number of potential fathers in the *ij*th female's neighborhood.
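Equations 2 and 3 are normalized weights. The Python sketch below computes them for one neighborhood using the exponential form τ = exp(ω) with a linear ω that the model adopts (described in the next paragraph). Subtracting the largest linear predictor before exponentiating is an implementation choice assumed here to avoid overflow; it does not change the normalized weights. The function name and covariate layout are hypothetical.

```python
import math
from typing import List, Sequence


def relative_success(coefs: Sequence[float],
                     covariates: Sequence[Sequence[float]]) -> List[float]:
    """Normalized weights exp(eta_j) / sum_j exp(eta_j), as in Equations 2 and 3.

    covariates[j] holds the factor values for candidate parent j
    (e.g. distance d_ij and relative size f_ij); coefs holds the matching
    gradients (e.g. gamma_1, gamma_2).
    """
    eta = [sum(c * x for c, x in zip(coefs, row)) for row in covariates]
    top = max(eta)                                # shift to avoid overflow
    tau = [math.exp(e - top) for e in eta]
    total = sum(tau)
    return [t / total for t in tau]


# Three hypothetical candidate mothers at 5, 10, and 20 units, equal size,
# with a negative distance gradient (gamma_1 = -0.1) and no size effect.
weights = relative_success([-0.1, 0.0], [[5.0, 1.0], [10.0, 1.0], [20.0, 1.0]])
```

With a negative distance gradient the weights decline with distance, and the ratio between any two candidates, exp(γ_{1}Δ*d*), does not depend on who else is in the neighborhood.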

Although various types of functions may be used for τ_{ij} and π_{ijk} to relate female and male reproductive success to the factors influencing it, here we use an exponential function [τ_{ij} = exp(ω_{ij}); π_{ijk} = exp(*v*_{ijk})] because this assures positivity of the reproductive success parameters ψ_{ij} and ϕ_{ijk}, which can be regarded as relative proportions. Positivity is maintained for any expression for ω_{ij} and *v*_{ijk} accommodating various factors affecting reproductive success (Burczyk *et al.* 1996, 2002; Burczyk and Prat 1997; Bacles *et al.* 2005). In particular, seed and pollen dispersal can readily be described through exponential distributions. Variation in parental fitness based on relative fecundity surrogates (*e.g.*, number of flowers, plant size) can also be approximated by an exponential distribution (Smouse *et al.* 1999; Morgan and Conner 2001). For example, if ω_{ij} includes two factors, such as the distance of a seedling from potential mothers within its neighborhood and the relative size (*e.g.*, height) of the mothers, then we can let ω_{ij} = γ_{1}*d*_{ij} + γ_{2}*f*_{ij}, where *d*_{ij} is the distance between a seedling and a potential mother and *f*_{ij} is the mother's relative size. If similar factors are considered for male reproductive success, then let *v*_{ijk} = β_{1}*d*_{ijk} + β_{2}*f*_{ijk}, where *d*_{ijk} and *f*_{ijk} are the distance between male and female and the male's size, respectively. The parameters γ_{1}, γ_{2} and β_{1}, β_{2}, in these cases, describe the strength and direction of the effects of their respective factors. These parameters are often referred to as selection and ecological gradients, and they represent the slope of the regression of individual fertilities on trait values (see Morgan and Conner 2001; Burczyk *et al.* 2002). The linear functions ω_{ij} and *v*_{ijk} can be further extended to include quadratic terms, making it possible to assess the effects of stabilizing or disruptive selection acting on particular traits influencing reproductive success (Morgan and Conner 2001; Wright and Meagher 2004). A unique feature of the seedling neighborhood model is that it can simultaneously estimate the selection gradients of a given phenotypic trait in both male and female parents, making it possible to evaluate the importance of the trait for both male and female reproductive success.

To estimate the model's parameters, the model is "fitted" to the observed multilocus genotypic data of seedlings and adults using numerical procedures based on maximum-likelihood methods (see Adams 1992; Burczyk *et al.* 1996, 2002 for explanations). The likelihood function for a sample of *n* independently selected offspring is

$$L(m_s, m_p, s, \boldsymbol{\gamma}, \boldsymbol{\beta}) = \prod_{i=1}^{n} P(G_i), \tag{4}$$

where *P*(*G*_{i}) is given by Equation 1 and **γ** and **β** are the vectors of parameters related to female and male reproductive success, respectively. Stated in nontechnical terms, the following can be estimated using these procedures, given sufficient data: (1) the proportion of seedlings established from seed by distant (*m*_{s}) *vs.* nearby (1 − *m*_{s}) females; (2) the degree to which various factors, such as distance and relative fecundity, influence the relative reproductive success of females within a local population; (3) the proportion of a mother's offspring within a local population (at the seedling stage) due to self-fertilization (*s*), pollination by distant males (*m*_{p}), or pollination by nearby males (1 − *s* − *m*_{p}); and (4) the degree to which various factors, such as distance to females and pollen fecundity, influence the reproductive success of males within local populations.
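In the simplest case examined in the simulations below (*s* = 0, γ = β = 0), every ψ_{ij} equals 1/*s*_{i} and every ϕ_{ijk} equals 1/*z*_{ij}, so Equation 1 collapses to *P*(*G*_{i}) = *m*_{s}*a*_{i} + (1 − *m*_{s})[*m*_{p}*b*_{i} + (1 − *m*_{p})*c*_{i}], where *a*_{i} = *P*(*G*_{i} | *B*_{s}), *b*_{i} is the average of *P*(*G*_{i} | *M*_{ij}, *B*_{p}) over mothers, and *c*_{i} is the average over mothers of the average of *P*(*G*_{i} | *M*_{ij}, *F*_{ijk}) over their neighbors. The sketch below evaluates the logarithm of Equation 4 in that case and maximizes it over a coarse grid. The grid search is a deliberately simple stand-in for illustration only; the estimation procedure described here uses Newton–Raphson iterations. The function names and toy inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(m_s: float, m_p: float, a: Sequence[float],
                   b: Sequence[float], c: Sequence[float]) -> float:
    """log of Equation 4 when s = 0 and gamma = beta = 0 (uniform weights)."""
    total = 0.0
    for ai, bi, ci in zip(a, b, c):
        p = m_s * ai + (1.0 - m_s) * (m_p * bi + (1.0 - m_p) * ci)
        if p <= 0.0:
            return -math.inf     # genotype impossible under these parameter values
        total += math.log(p)
    return total


def grid_search(a, b, c, steps: int = 99) -> Tuple[float, float]:
    """Coarse maximization over an interior grid 0.01, 0.02, ..., 0.99 for both rates."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((ms, mp) for ms in grid for mp in grid),
               key=lambda t: log_likelihood(t[0], t[1], a, b, c))


# Toy data: seedling 1 fits only an immigrant seed, seedling 2 only a local
# mother with immigrant pollen, seedling 3 only a fully local origin.
m_s_hat, m_p_hat = grid_search([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

For the toy data the log-likelihood is log *m*_{s} + 2 log(1 − *m*_{s}) + log *m*_{p} + log(1 − *m*_{p}), whose maximum lies at *m*_{s} = 1/3 and *m*_{p} = 1/2, so the grid returns the nearest grid values.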

#### Simulations:

We explored the efficiency of the model and estimation procedure through computer simulation, which is frequently used to investigate the statistical properties of mating models (Marshall *et al.* 1998; Morgan 1998; Gerber *et al.* 2000; Morgan and Conner 2001). Each simulation was initiated by generating a bisexual parental population of 200 adults randomly distributed across a square area of 150 × 150 units. A cohort of 500 seedlings was also simulated and randomly distributed across a 30 × 30-unit square in the center of the plot. In this way, drawing a neighborhood of radius 30 units around each seedling and then a neighborhood of the same size around each potential female resulted in complete neighborhoods located entirely within the generated plot. The average number of adults within a neighborhood of radius 30 units was ∼25 individuals. Distances between seedlings and potential mothers (*d*_{ij}), as well as distances among adults (*d*_{ijk}, male–female pairs), were calculated and used as factors influencing female and male reproductive success (γ and β parameters, respectively).

Each parent was assigned a multilocus genotype by randomly sampling alleles from a specified frequency distribution (see below). Alleles were sampled independently within and among loci, so the parental population was in Hardy–Weinberg and linkage equilibrium. The parental generation and the specified frequency distribution were then used to generate seedling genotypes. First, a proportion *m*_{s} of seedlings had their diploid genotypes generated from the background allele frequency distribution (the same as for adults), representing immigrant seeds from a background source. For each of the remaining seedlings (1 − *m*_{s}), a female parent was chosen within the seedling neighborhood using a random number generator, with the probability of choosing the *j*th female given by ψ_{ij} in Equation 2, and a multilocus egg gamete was determined by randomly sampling alleles from the female parent's genotype. For each seedling whose female parent was within the neighborhood, the source of the pollen gamete was determined randomly according to the following probabilities: *s* (self pollen), *m*_{p} (pollen from the background source), and 1 − *s* − *m*_{p} (pollen from a male within the female parent's neighborhood, with the probability of choosing the *k*th male given by ϕ_{ijk} in Equation 3). The pollen haplotype was then assigned by randomly sampling alleles from the chosen source, and the egg and pollen gametes were combined to form the seedling genotype. The complete set of adult and seedling genotypes was then subjected to the estimation procedure described above.
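The simulation design and seedling-generation steps above can be sketched in Python as follows. This is a simplified reconstruction under stated assumptions, not the authors' simulation code: coordinates are in arbitrary units, the seedling square is taken to span 60 to 90 on each axis (one reading of "the center of the plot"), ψ_{ij} and ϕ_{ijk} are proportional to exp(γ*d*) and exp(β*d*) with distance as the only factor, the default frequencies mimic marker set I, and the function and origin labels are hypothetical. Because every seedling lies at least 60 units from each edge, its potential mothers lie within [30, 120] and their potential mates within [0, 150], which is why 30-unit neighborhoods stay inside the plot; the expected number of adults per neighborhood is 200 · π · 30²/150² ≈ 25.1.

```python
import math
import random

SIDE, CORE_LO, CORE_HI = 150.0, 60.0, 90.0   # plot side; central seedling square (assumed)


def simulate(m_s, m_p, s, gamma, beta, n_adults=200, n_seedlings=500,
             radius=30.0, freqs=((0.7, 0.2, 0.1),) * 6, seed=1):
    """Return (adult positions, adult genotypes, seedlings as (position, genotype, origin))."""
    rng = random.Random(seed)

    def allele(fr):
        return rng.choices(range(len(fr)), weights=fr)[0]

    def background_genotype():
        # Independent sampling within and among loci: Hardy-Weinberg and linkage equilibrium.
        return [(allele(fr), allele(fr)) for fr in freqs]

    pos = [(rng.uniform(0, SIDE), rng.uniform(0, SIDE)) for _ in range(n_adults)]
    gen = [background_genotype() for _ in range(n_adults)]

    def pick(center, exclude, coef):
        """Choose an adult within `radius` of center with weight exp(coef * distance)."""
        cand = [a for a in range(n_adults)
                if a != exclude and math.dist(center, pos[a]) <= radius]
        w = [math.exp(coef * math.dist(center, pos[a])) for a in cand]
        return rng.choices(cand, weights=w)[0]

    seedlings = []
    for _ in range(n_seedlings):
        x = (rng.uniform(CORE_LO, CORE_HI), rng.uniform(CORE_LO, CORE_HI))
        if rng.random() < m_s:                       # immigrant seed from the background
            seedlings.append((x, background_genotype(), "seed immigrant"))
            continue
        mom = pick(x, None, gamma)                   # local mother, weight ~ psi_ij
        egg = [rng.choice(pair) for pair in gen[mom]]
        u = rng.random()
        if u < s:                                    # self pollen
            pollen, origin = [rng.choice(p) for p in gen[mom]], "self"
        elif u < s + m_p:                            # background (immigrant) pollen
            pollen, origin = [allele(fr) for fr in freqs], "pollen immigrant"
        else:                                        # local father, weight ~ phi_ijk
            dad = pick(pos[mom], mom, beta)
            pollen, origin = [rng.choice(p) for p in gen[dad]], "local outcross"
        seedlings.append((x, list(zip(egg, pollen)), origin))
    return pos, gen, seedlings
```

Calling, for example, `simulate(m_s=0.5, m_p=0.5, s=0.1, gamma=-0.1, beta=-0.05)` produces one replicate; the origin label is kept only for checking and would be hidden from the estimation step.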

The potentially large number of parameters and their combinations precludes a comprehensive investigation of statistical properties. For the simulations, we used two sets of marker loci differing in exclusion power. The first set (marker set I) included six loci, each with three alleles at frequencies 0.7, 0.2, and 0.1. This set, with exclusion probability (EP) = 0.8034 (Chakraborty *et al.* 1988), might be considered typical for isozymes. The second set (marker set II) included six loci, each with 10 equally frequent alleles (frequency 0.1), giving EP = 0.9999 and resembling a battery of microsatellites.
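The reported EP values can be checked with a widely used closed-form expression for the single-locus probability of excluding a random nonparent when one parent is known, combined across independent loci as 1 − Π(1 − EP_{l}). It is an assumption here that this is the expression underlying the cited values, but under it the sketch below reproduces both 0.8034 (marker set I) and 0.9999 (marker set II). Function names are hypothetical.

```python
import math
from typing import Sequence


def locus_exclusion(freqs: Sequence[float]) -> float:
    """Single-locus exclusion probability with one parent known (closed form in allele frequencies)."""
    def moment(k: int) -> float:
        return sum(p ** k for p in freqs)
    return (1 - 2 * moment(2) + moment(3) + 2 * moment(4) - 3 * moment(5)
            - 2 * moment(2) ** 2 + 3 * moment(2) * moment(3))


def combined_exclusion(loci: Sequence[Sequence[float]]) -> float:
    """Multilocus exclusion probability, assuming independent loci."""
    return 1 - math.prod(1 - locus_exclusion(f) for f in loci)


marker_set_I = [(0.7, 0.2, 0.1)] * 6     # isozyme-like
marker_set_II = [(0.1,) * 10] * 6        # microsatellite-like
```

Per locus, marker set I gives about 0.237 and marker set II about 0.795, which shows why six loci of the second kind push the combined EP so close to 1.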

First, we explored the properties of the model and estimation procedure in the simple case where only the levels of seed and pollen immigration (*m*_{s} and *m*_{p}) were of interest, with no selfing (*s* = 0) and no differences in female or male reproductive success among parents in the population (*i.e.*, the parameters related to female and male reproductive success were γ = β = 0). We simulated seedling samples on the basis of true values of seed (*m*_{s}) and pollen (*m*_{p}) immigration ranging from 0.2 to 0.8, in various combinations (see Table 1). Second, seedling cohorts were simulated assuming that the reproductive success of females and males is a function of seed and pollen dispersal within neighborhoods following an exponential distribution (Table 2). Here, negative values of γ and β were used to represent reproductive success decreasing with increasing dispersal distance. The combination γ = −0.20, β = −0.10, *s* = 0.2 was used to represent severely localized seed and pollen dispersal. In this case, the mean effective numbers of seed and outcross pollen parents (*i.e.*, selfing excluded) within seedling and mother-tree neighborhoods were *N*_{es} = 7.08 and *N*_{ep} = 15.47, respectively (see Crow and Kimura 1970; Burczyk *et al.* 2002), compared with a mean census number of ∼25 individuals. The combination γ = −0.10, β = −0.05, *s* = 0.1 simulated less restricted seed and pollen dispersal (*N*_{es} = 15.40, *N*_{ep} = 22.03). Seed and pollen immigration were set at three levels: (i) *m*_{s} = *m*_{p} = 0, a fully isolated population with no seed or pollen immigration (*i.e.*, all parents are within neighborhoods); (ii) *m*_{s} = 0.2 and *m*_{p} = 0.5, moderate seed and pollen immigration; and (iii) *m*_{s} = 0.5 and *m*_{p} = 0.8, extensive seed and pollen immigration. The range of parameter values used for simulations was chosen on the basis of preliminary analyses of the example data sets (see below).
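The effective numbers of parents quoted above summarize how unevenly reproductive success is spread within a neighborhood. A minimal sketch, assuming the usual inverse-sum-of-squares definition *N*_{e} = 1/Σψ², is given below. The evenly spread distances are a hypothetical neighborhood, so the results will not match the averages reported above, which were taken over random simulated neighborhoods; the sketch only shows the direction of the effect.

```python
import math
from typing import Sequence


def effective_number(weights: Sequence[float]) -> float:
    """Effective number of parents, 1 / sum of squared normalized weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


# Hypothetical neighborhood: 25 candidates at the distance quantiles of a
# uniform disk of radius 30 (d = 30 * sqrt(u)).
distances = [30.0 * math.sqrt((k + 0.5) / 25) for k in range(25)]
ne_strong = effective_number([math.exp(-0.20 * d) for d in distances])
ne_weak = effective_number([math.exp(-0.10 * d) for d in distances])
```

Equal weights give *N*_{e} equal to the census number, and a steeper negative gradient always lowers *N*_{e}, because it shifts weight toward the nearest candidates.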

For each parameter combination 500 simulations were performed. The parameter estimates were obtained numerically on the basis of maximum-likelihood procedures using the Newton–Raphson method (Rao 1973; Kennedy and Gentle 1980). Means and variances of the estimates over the replicated data sets were then compared to the true parameter values.

#### Example data:

In addition to the simulated data sets, we applied the seedling neighborhood model to actual data obtained from two forest stands: a Scots pine (*P. sylvestris* L.) stand and a mixed-oak [*Quercus robur* L./*Q. petraea* (Matt.) Liebl.] stand. Both are certified seed collection stands located in Poland (Scots pine, Forest District Woziwoda; mixed oaks, Forest District Jamy). In the Scots pine stand, 525 seedlings (5–15 years old) and 313 adults (∼160 years old) were genotyped at eight allozyme loci (estimated EP = 0.72). In the oak stand, 320 seedlings (1–3 years old) and 450 adults (120 years old) were genotyped at three microsatellite loci (EP = 0.95). The locations of all seedlings and adults were mapped, allowing a detailed analysis of the effect of seed and pollen dispersal on female and male reproductive success within each stand. The neighborhood radius around seedlings and putative mothers was set to 40 m in both stands, which included on average 83.8 trees (166/ha) in the Scots pine stand and 73.9 trees (146.8/ha) in the mixed-oak stand. These numbers of adults within neighborhoods seemed sufficient for precise estimation of seed and pollen dispersal patterns within neighborhoods (Burczyk *et al.* 1996). Also, with a 40-m radius, the seedling and subsequent mother neighborhoods were entirely included within the sampled plots of both stands. Reproductive parameters (*m*_{s}, *m*_{p}, γ, β, and *s*) were estimated using the computer program SNM v.1.0, written for this purpose (available from I. J. Chybicki upon request), which employs optimized procedures (Powell's and Newton–Raphson methods) to estimate parameters. The standard deviations of the parameter estimates were derived from the inverse of the Hessian matrix (the asymptotic variance–covariance matrix), which is an inherent part of the estimation procedure employed in the SNM program. Additionally, the significance of the parameters γ, β, and *s* was assessed using likelihood-ratio tests (Manly 1992; Morgan and Conner 2001).
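A likelihood-ratio test compares the maximized log-likelihood with a parameter free (for example, β) against the fit with it fixed at zero; twice the difference is referred to a χ² distribution with one degree of freedom per constrained parameter. The sketch below handles only the one-degree-of-freedom case, using the identity P(χ²₁ > *x*) = erfc(√(*x*/2)). The function name is hypothetical. For a parameter whose null value lies on the boundary of its range, such as *s* = 0, the plain χ²₁ reference is conservative; a 50:50 mixture of χ²₀ and χ²₁, which halves the *P*-value, is often used instead.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_null: float, loglik_full: float) -> Tuple[float, float]:
    """LR statistic and its chi-square (1 df) P-value for one constrained parameter."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)   # guard against tiny negative values
    p_value = math.erfc(math.sqrt(stat / 2.0))           # P(chi2_1 > stat)
    return stat, p_value
```

For instance, a gain of about 1.92 log-likelihood units from freeing one parameter gives a statistic of about 3.84 and *P* ≈ 0.05.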

## RESULTS AND DISCUSSION

#### Simulations:

Simulations indicated that the seedling model gives reasonably robust estimates of gene flow and reproductive parameters when genetic markers with at least moderately high exclusion probabilities (EP ≥ 0.8) are available (Tables 1 and 2). When individuals within local populations did not differ in female or male reproductive success (*s* = γ = β = 0), estimates of *m*_{s} and *m*_{p} were nearly unbiased; however, marker set I (isozyme-like) yielded high variances of the estimates (Table 1). Notably, while the variances of *m̂*_{s} were stable across a range of true *m*_{s} values, the variances of *m̂*_{p} tended to increase with increasing *m*_{s}. This is because a higher *m*_{s} reduces the number of nonimmigrant seedlings that contribute to estimating pollen immigration. The standard deviations of *m̂*_{p} for marker set I appeared unacceptably high in most cases, given the sample size used in the simulations (*n* = 500). However, a larger sample size is expected to compensate, at least partly, for a lower exclusion probability (Morgan 1998; Morgan and Conner 2001). For marker set II (microsatellite type), the parameter estimates were unbiased and the variances of both *m̂*_{s} and *m̂*_{p} were much lower, although still larger for *m̂*_{p}.

With some exceptions (marker set I), the seedling model and estimation procedure provided efficient estimates (low bias and variance) in the cases where the simulations assumed that the reproductive success of individuals within local populations decreased exponentially with distance from the seedling (females) or from the mate (males) (Table 2). When we assumed that all seed and pollen parents were local (*m*_{s} = *m*_{p} = 0), the variances of *m̂*_{s} and *m̂*_{p} were low and of equivalent magnitude within a given marker set. However, the variances of both estimates tended to increase with increasing *m*_{s} and *m*_{p}. In all cases, the mean estimates of both seed and pollen immigration were very close to their true values (*i.e.*, low bias). The mean Hessian-based standard deviations of the parameters, computed for each simulated data set, were nearly identical to the standard deviations of the estimates across replicate simulations.

The estimation procedure often failed to converge to reasonable parameter estimates for the simplified data sets with the lower EP, probably because of insufficient genetic resolution. The Newton–Raphson algorithm used in our iterative estimation procedure requires a careful choice of initial parameter values for convergence, especially for data sets with less genetic information (Thisted 1988; Morgan and Conner 2001). In our simulations, the true parameter values were used as the starting points of the iterations. However, the realized parameter values of individual simulated data sets could differ from the expected *m*_{s} and *m*_{p}, and when the difference was considerable, this led to a lack of convergence, more often for marker set I. We investigated the likelihood surface (expressed as the log-likelihood) around the true parameter values in the special case where *m*_{s} = *m*_{p} = 0.5 and *s* = γ = β = 0, using a large seedling sample (*n* = 10,000) to obtain smooth graphs (Figure 2). For marker set II (microsatellites), the log-likelihood has a strong peak at the true values of *m*_{s} and *m*_{p} (0.50, 0.50). Here, starting points shifted away from the true parameter values should still converge close to the expected values. For marker set I (isozymes), however, although the peak of the log-likelihood surface corresponds to the expected parameter values, there is a ridge through the maximum along which the gradient is close to zero, and the algorithm could stop at any point on this ridge, depending on the convergence criterion. Starting points shifted away from the true values could easily converge to a point on the log-likelihood surface that is not the peak (*i.e.*, where *m̂*_{s} and/or *m̂*_{p} are not close to their true values), especially for small data sets with less genetic information. Note that the log-likelihood surface is steeper along the *m*_{s} axis than along the *m*_{p} axis (Figure 2). The shape of the log-likelihood around the true parameter values along the *m*_{s} and *m*_{p} axes supports our earlier observations that the variances of *m̂*_{s} are smaller than those of *m̂*_{p} and that these variances are smaller for microsatellite-type than for isozyme-type genetic markers.

#### Empirical seed and pollen gene flow estimates:

The level of seed immigration in the Scots pine population appears considerable: nearly 45% of seedlings are estimated to have resulted from seed flow over distances >40 m (Table 3). In addition, an estimated 92% of the seedlings with local mothers resulted from pollination by distant (>40 m) males. These results suggest that only a small percentage (0.55 · (1 − 0.92) · 100 = 4.4%) of the seedlings in this population have both parents local. Although *m̂*_{s} and *m̂*_{p} are well above zero, their estimated variances are considerable, as are the variances of γ̂ and ŝ. The pollen dispersal parameter, β, could not be estimated, probably because of the low frequency of local pollination events (4.4%). Despite the large number of seedlings sampled (525), the precision of these estimates was greatly reduced by the low exclusion power (EP = 0.72) of the isozyme data set. Nevertheless, the high levels of pollen and seed gene flow observed are not surprising for Scots pine. This species is well known for extensive seed and pollen dispersal: its pollen sedimentation velocity is among the lowest in conifers, and it has the greatest capacity for seed dispersal among temperate forest trees (Geburek 2005). Independent estimates of seed and pollen gene flow obtained in the same population from seed samples collected in seed traps and directly from mother trees are quite comparable to the results reported above (A. Dzialuk and J. Burczyk, unpublished results).
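The "both parents local" percentages in this section follow directly from the mixture weights of Equation 1: a seedling has a local mother with probability 1 − *m*_{s}, and that mother's ovule was fertilized by local pollen (selfed or outcrossed) with probability 1 − *m*_{p} = *s* + (1 − *s* − *m*_{p}). The small sketch below (hypothetical function names) makes that bookkeeping explicit and reproduces the 4.4% for Scots pine and the 35.7% reported for oak.

```python
from typing import Dict


def origin_shares(m_s: float, m_p: float, s: float = 0.0) -> Dict[str, float]:
    """Expected shares of seedling origins implied by the weights in Equation 1."""
    return {
        "immigrant seed": m_s,
        "local mother, immigrant pollen": (1 - m_s) * m_p,
        "local mother, selfed": (1 - m_s) * s,
        "local mother, local outcross": (1 - m_s) * (1 - s - m_p),
    }


def both_parents_local(m_s: float, m_p: float) -> float:
    """Share of seedlings whose mother and father both lie within the neighborhoods."""
    return (1 - m_s) * (1 - m_p)


scots_pine = both_parents_local(0.45, 0.92)   # 0.55 * 0.08
oak = both_parents_local(0.06, 0.62)          # 0.94 * 0.38
```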

In the oak stand, we were able to estimate all intended reproductive parameters, but the patterns of gene flow differed greatly from those estimated for Scots pine. Only ∼6% of seedlings originated from mother trees located >40 m away (Table 3). Seed dispersal within the local population was also very limited (γ̂ = −0.2027), meaning that the majority of seedlings originated from nearby mother trees. By contrast, pollen gene flow was extensive: ∼62% of pollen gametes fertilizing local mother trees came from unsampled males situated outside the mothers' neighborhoods. This suggests that a considerable percentage (0.94 · (1 − 0.62) · 100 = 35.7%) of the seedlings in this population have both parents local. In addition, although pollen dispersal within neighborhoods followed a negative exponential distribution, it was less restricted (β̂ = −0.0908) than seed dispersal (Figure 3). The variances of the parameter estimates were low, indicating high precision of the estimation procedure. This was possible because the microsatellite set used had relatively high exclusion power (EP = 0.92). The limited seed dispersal but extensive pollen dispersal in oaks is consistent with observations in several earlier studies (Dow and Ashley 1996; Streiff *et al.* 1999).
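Under the exponential form, the relative success of a candidate parent at distance *d* compared with one at distance 0 (other factors equal) is exp(γ*d*) for mothers and exp(β*d*) for fathers, independent of the normalization in Equations 2 and 3. A convenient way to compare the oak kernels is the distance at which that ratio halves, ln 2/|gradient|. The sketch below assumes that distances were measured in meters, as the 40-m neighborhood radius suggests; the function names are hypothetical.

```python
import math


def half_distance(gradient: float) -> float:
    """Distance at which exp(gradient * d) falls to one-half (requires gradient < 0)."""
    if gradient >= 0:
        raise ValueError("a decaying kernel needs a negative gradient")
    return math.log(2.0) / -gradient


def relative_success_at(gradient: float, d: float) -> float:
    """Success at distance d relative to a candidate at distance 0, other factors equal."""
    return math.exp(gradient * d)


seed_half = half_distance(-0.2027)     # oak seed dispersal gradient (gamma hat)
pollen_half = half_distance(-0.0908)   # oak pollen dispersal gradient (beta hat)
```

Under these assumptions, a mother's relative success halves roughly every 3.4 m of distance from the seedling, whereas a father's halves roughly every 7.6 m from the mother, which is one way to read the statement that pollen dispersal within neighborhoods was less restricted than seed dispersal.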

#### Conclusions:

A particular problem in parentage assignment for naturally regenerated seedlings is that only a fraction of the seedlings can be assigned to unique potential parental pairs (Meagher and Thompson 1987; Chakraborty *et al.* 1988). This difficulty arises from insufficient exclusion power of genetic markers and/or from parentage by unknown (not genotyped) female and/or male parents (seed and pollen immigration). The two-parent problem, *i.e.*, determining which of the two identified parents is the mother and which is the father, also complicates parentage analyses, especially in monoecious species (Meagher and Thompson 1986). The seedling neighborhood model described in this article attempts to overcome these difficulties. Rather than assigning parentage to individual offspring, it models the genealogy of seedlings by reconstructing the two-stage movement of genes through pollen and seeds.

In our description of the model, we defined 1 − *m*_{s} as the proportion of seedlings whose mothers are within the seedling's neighborhood. In practice, however, one cannot distinguish between the genotypes of the male and female parents of a seedling (two-parent problem; Meagher and Thompson 1986). In real situations, the actual mother of a given seedling may be located outside its neighborhood while the father is located within it. This may lead to underestimation of seed movement and, at the same time, overestimation of pollen movement. Nevertheless, if seed dispersal in a given species is more restricted than pollen dispersal, which is typical in plants (Ennos 1994; Hamrick and Nason 2000), the population-wide *m*_{s} estimates should approximate the actual proportion of seed immigration.

In this article, we assumed that background allele frequencies are the same for seed and pollen immigration. However, even though several natural populations show little spatial genetic structure, different modes of seed and pollen dispersal may cause allele frequencies to differ between the immigrating seed and pollen pools. Nevertheless, the sensitivity of the model to biased background allele frequencies used in the estimation procedure (Burczyk and Chybicki 2004) decreases with increasing exclusion probability of the markers. We also assumed that there are no mutations or genotyping errors. However, genotyping errors in particular may considerably affect the estimates of seed or pollen immigration (Marshall *et al.* 1998; Burczyk *et al.* 2004; Slavov *et al.* 2005). We will address this issue in future work.

In our simulations and data examples we did not estimate selection gradients (*sensu* Morgan and Conner 2001). However, the seedling model can easily accommodate additional parameters that would relate individual reproductive success to individual features, such as plant size, flower fecundity, phenology, or floral characteristics (Burczyk and Prat 1997; Smouse *et al.* 1999; Wright and Meagher 2004). In addition, application of the seedling model makes it possible to assess how these traits affect female *vs.* male reproductive success. Another unique feature of the model is that it is possible to estimate rates of selfing at the seedling stage, which may provide new insights into mechanisms determining inbreeding levels in naturally regenerated populations. We believe that applications of this approach will lead to enhanced recognition of the interactions among genetics, ecology, and demography at the population level. Such information is of primary interest in genetic conservation programs, where natural regeneration is the major mode of reproduction.

## Acknowledgments

We thank Artur Dzialuk and Magdalena Trojankiewicz for assistance in collecting genotypic data of Scots pine and oak populations. This work was partly supported by research grants from Polish Committee for Scientific Research: 5P06H 042 15 and 3P06L 034 23.

## Footnotes

Communicating editor: M. K. Uyenoyama

- Received June 13, 2005.
- Accepted February 15, 2006.

- Copyright © 2006 by the Genetics Society of America