## Abstract

We introduce a novel indirect method of estimating the pollen dispersal curve from mother–offspring genotypic data. Unlike an earlier indirect approach (TwoGener), this method is based on a normalized measure of correlated paternity between female pairs whose expectation does not explicitly depend on the unknown effective male population density (*d*_{e}). We investigate the statistical properties of the new method, by comparison with those of TwoGener, considering the sensitivity to reductions of *d*_{e}, relative to census density, resulting from unequal male fecundity and asynchronous flowering. Our main results are: (i) it is possible to obtain reliable estimates of the average distance of pollen dispersal, δ, from indirect methods, even under nonuniform male fecundity and variable flowering phenology; (ii) the new method yields more accurate and more precise δ-estimates than TwoGener under a wide range of sampling and flowering scenarios; and (iii) TwoGener can be used to obtain approximate *d*_{e} estimates, if needed for other purposes. Our results also show that accurately estimating the shape of the tail of the pollen dispersal function by means of indirect methods remains a very difficult challenge.

UNLIKE direct methods based on parentage analysis, indirect methods of gene flow estimation characterize the spatial genetic structure of genotypes by means of a given parameter and, under particular population genetic models, estimate the level of gene flow that would produce a distribution with that chosen parameter value (Slatkin 1985, 1987). Classical indirect methods assume an infinite island population model and use the standardized variance in neutral allele frequencies among populations (*F*_{ST}) to estimate the product *Nm*, where *N* is the effective size of each population and *m* is the average rate of migration (Wright 1931). Under isolation by distance for discrete populations, estimates of *F*_{ST} for pairs of populations yield corresponding estimates of *N*σ^{2}, where σ^{2} is the axial variance of dispersal (Rousset 1997). Within continuous populations, the expected decay rate for the probability of gene identity with distance (Wright 1943; Malécot 1975) is the basis of different indirect estimators of *D*σ^{2}, *D* being the effective density of individuals (Epperson and Li 1997; Hardy and Vekemans 1999; Rousset 2000). The summary parameters *Nm* and *D*σ^{2} provide a useful synthesis of demographic and dispersal processes, reflecting the relative importance of historical gene flow and genetic drift in shaping local differentiation.

Despite their evolutionary interest, however, gene flow estimates provided by classical indirect methods assume evolutionary equilibrium and are therefore of limited value when dealing with contemporary ecological processes. The long-term average yielded by these methods will rarely reflect current patterns of gene flow (Bossart and Prowell 1998), especially for populations distributed across recently disturbed habitats, a situation of particular interest in conservation biology. Direct methods of gene flow estimation, based on measurements of dispersal distances obtained from parentage analyses or marked individuals and propagules, represent a useful alternative for describing current population dynamics (Bennetts *et al*. 2001; Jones and Ardren 2003). In the case of plants, the estimation of pollen-mediated contemporary gene flow has attracted a great deal of attention in recent years, since pollen's potential for long-distance movement determines the maintenance of genetic connectivity among population fragments in disrupted ecosystems (White *et al*. 2002), the level of undesired gene immigration into breeding and conservation populations (Adams *et al*. 1997; Plomion *et al*. 2001; Potts *et al*. 2003), and the risk of diffusion of transgenes into wild populations (Stewart *et al*. 2003).

Paternity analysis is a widely used method of direct estimation for the distribution of pollen dispersal distances (Devlin *et al*. 1988; Adams *et al*. 1992; Smouse *et al*. 1999; Burczyk *et al*. 2002). However, paternity-based methods are laborious and require knowing the genotypes and spatial coordinates of all potential pollen donors in the study area, which greatly limits the spatial extent of the analysis and subsequent inferences about pollen dispersal patterns (Sork *et al*. 1999). The need to extend the range of contemporary pollen flow estimates with relatively low sampling effort was the motivation for the TwoGener model (Smouse *et al*. 2001), based on the genetic structure (Φ_{ft}) of the pollen pools of a sample of females spaced out across the landscape. Unlike the classical indirect approaches, this method provides not a historical, but rather a real-time estimate of dispersal, relating to a single bout of pollination. Like other indirect approaches, however, it draws inference on gene movement from genetic structure data, since it translates Φ_{ft} into estimates of *N*_{ep} and *d*_{e}δ^{2} (Austerlitz and Smouse 2001a: Equations 15 and 16), where *N*_{ep} is the effective number of pollen donors (the inverse of the probability of paternal identity within a maternal sibship), δ is the average distance of pollen dispersal, and *d*_{e} is the effective density of pollen donors.

Now, obtaining a separate estimate of the dispersal parameter δ from the genetic-structure measure Φ_{ft} requires independent knowledge on the demographic parameter *d*_{e}, a characterization of which involves difficult and costly field measurements. This is a methodological problem that is shared with classical indirect methods, in which obtaining a separate estimate of the dispersal parameter *m* (or σ^{2}) requires an independent estimate of the demographic parameter *N* (or *D*), which will generally be different from the observed census value and not easy to extract (Slatkin 1987). Trying to circumvent this problem, Austerlitz and Smouse (2002) developed a refined procedure, based on the distribution of pairwise-Φ_{ft} values among female pairs, which allows for a separation of δ and *d*_{e}. The statistical properties of this method, however, have been tested only with simulated populations where effective male density does not deviate from census density (*i.e.*, *d*_{e} = *d*) (Austerlitz and Smouse 2002; Austerlitz *et al*. 2004). Since common factors such as unequal male fecundity (flowering intensity) and asynchronous flowering may substantially decrease the effective density of pollen donors (Erickson and Adams 1989; Devlin *et al*. 1992; Kang *et al*. 2003), so that *d*_{e} < *d*, our ability to extract reliable dispersal distance estimates from pollen structure data under realistic conditions remains to be elucidated.

In this study, we develop a novel statistical procedure that allows us to estimate the pollen dispersal function from the genetic structure of the pollen pool, independently of effective density. This approach is based on the expected decay with spatial distance of a normalized measure of correlated paternity between female pairs, characterized in terms of kinship coefficients. We perform numerical simulations to compare its statistical properties with those of the TwoGener method, considering different sets of assumptions that translate into a degree of inequality between effective and census density. Specifically, we evaluate how the bias and mean squared error of the δ-estimates obtained with each method are affected by: (1) sample size and sample allocation, (2) resolution of the genetic markers, (3) effective number of pollen donors, (4) unequal male fecundity, and (5) asynchronous flowering. We also explore what inference can be drawn about the tail of the pollen dispersal curve from each method and, finally, discuss how much can be achieved by using genetic-structure-based approaches to estimate contemporary pollen dispersal.

## METHODS

#### General population and dispersal models:

We assume an infinite population of monoecious, self-fertile individuals, randomly distributed in space, with census density *d*. Individuals are noninbred and show no spatial genetic structure. Self-fertilization occurs at random, and male gametes disperse independently and according to a bivariate, radially symmetrical probability distribution *p*(*x*, *y*), which gives the probability that a female at (0, 0) draws a single pollen grain from coordinates (*x*, *y*).

This theoretical framework is very similar to that considered in the TwoGener model by Austerlitz and Smouse (2001a), but unlike that earlier model, here we assume nonuniform male fecundity. Under the model of equal male fecundity, the probability of identity of two effective male gametes originated from individuals lying within a unit area is *P*_{emf} = 1/*d* (Austerlitz and Smouse 2001a). Now, if the population exhibits unequal male fecundity, this probability will be *P*_{umf} > *P*_{emf}, and we define the effective male density of the population as *d*_{e} = 1/*P*_{umf}. This yields *d*_{e} ≤ *d* in general, with equality only in the case where all males in the population have identical fecundity.

#### The probability of paternal identity within and among maternal sibships:

Denote *Q*_{o} the probability of paternal identity within a maternal sibship, *i.e.*, the probability that a female mates twice with the same male, and *Q*(*z*) the probability of paternal identity between the sibships of two females at a distance *z* apart, *i.e*., the probability that these two females mate with the same male. Austerlitz and Smouse (2001a) have obtained expressions for *Q*_{o} and *Q*(*z*) as functions of the census density *d* and the pollen dispersal distribution *p*(θ; *x*, *y*), where θ is the parameter set of the assumed distribution. Their derivations are based on *P*_{emf} (as defined above), which was estimated as 1/*d*. In our population model, under the assumption of unequal male fecundity, this probability is *P*_{umf} = 1/*d*_{e}, and we can thus directly substitute *d*_{e} for *d* in Equations 10 and 17 in Austerlitz and Smouse (2001a), which become(1)for the probability of paternal identity within maternal sibships and(2)for the probability of paternal identity between any given female pair a distance *z* apart. Note that Equations 1 and 2 assume that effective density is spatially and temporally homogeneous, *i.e.*, that there is no spatial autocorrelation in male fecundity and that the effective pollen pool does not change during the pollination period.

#### Pairwise TwoGener analysis:

This method for estimating the parameters of the distribution of pollen dispersal distances requires a sample of *n*_{m} mapped seed plants (mothers) spread over the landscape, *n*_{o} seeds harvested from each of them, and the inferred male gametic contribution to each seed at *n*_{L} unlinked neutral loci (see Smouse *et al*. 2001). The estimation procedure is based on the formal relationship between the expected differentiation between the pollen clouds () of a pair of mothers *i* and *j*, a distance *z _{ij}* apart, and the expected probabilities of paternal identity within and among their progenies, given by Equations 1 and 2:(3)(Austerlitz and Smouse 2001a, 2002). This relationship yields the expected ϕ

_{ij}parameters as a function of the effective male density (

*d*

_{e}) and the assumed pollen dispersal distribution, with parameter set θ. Once the observed ϕ

_{ij}-values (denoted ) have been computed for each pair of sampled mothers (by using an analysis of molecular variance, as described in Smouse

*et al*. 2001), it is possible to estimate the dispersal parameters by minimizing the following squared-error loss criterion for the choice of those parameters:(4)

#### Pairwise kinship analysis:

Equation 3 provides one possible formal relationship between the genetic structure of the pollen pool and the pollen dispersal distribution. Its form, which forces the subsequent joint estimation of *d*_{e}, is determined by our choice of a fixation index (Φ_{ft}) to characterize the spatial distribution of male gametes, and it is possible that this choice is suboptimal for the estimation of the dispersal function. In particular, if the focus of interest is the dispersal function itself, it might be preferable to use some measure of genetic structure that is determined by the spatial extent of dispersal, but as independent as possible from the effective density of males.

Assume that we have the same sampling scheme and genetic assay as described above: a sample of *n*_{m} mapped mothers, *n*_{o} seeds from each, and the paternal gametic genotype of each embryo at *n*_{L} unlinked neutral loci. Now, instead of using Φ_{ft}, we describe the genetic structure of the pollen pool by the ratio , measuring the correlation of paternity between maternal sibship pairs a distance *z* apart, relative to the average correlation of paternity within maternal sibships in the population. Using Equations 1 and 2, the expected value of Ψ(*z*) for two mothers *i* and *j* a distance *z _{ij}* apart, for a given pollen dispersal distribution

*p*(θ;

*x*,

*y*), takes the form(5)The value of is one for

*z*= 0 and tends to zero for large

*z*. Note that the effective density factor (

*d*

_{e}) cancels out of the argument, whatever the pollen dispersal curve assumed, so that the expected decay of Ψ(

*z*) with spatial distance depends only on this dispersal function. We expect that Ψ(

*z*) should provide an estimator of the dispersal parameters that is essentially independent of the value of

*d*

_{e}.

Similarly to the pairwise TwoGener approach, we need to calculate a set of observed Ψ(*z*)-values, denoted for the *i*th and *j*th mothers, which will allow us to estimate the dispersal parameters by minimizing the squared-error loss criterion for the choice of those parameters:(6)The estimation of the pairwise in the sample involves estimating the correlation of paternity within and among maternal sibships. For this purpose, we use a procedure based on the computation of pairwise kinship coefficients (*F*) between the paternal gametic genotypes of offspring pairs (Hardy *et al*. 2004). The kinship coefficient between two paternal gametes estimates the probability of allelic identity for two homologous genes, sampled one from each gamete. Given the male gametic genotypes of the *g*th and *h*th offspring, the expectation of *F _{gh}* is 0.5 if they have the same (noninbred) father and 0 if they have different (unrelated) fathers. Consequently, twice the average of

*F*over many maternal–sib pairs yields an estimate of , and twice the average of

_{gh}*F*between offspring from two different mothers a distance

_{gh}*z*apart provides an estimate of

*Q*(

*z*). We estimated the multilocus (

*n*

_{L}loci) kinship coefficients in our sample as(7)(Loiselle

*et al*. 1995), where

*p*and

_{lag}*p*are the frequencies of allele

_{lah}*a*at locus

*l*in the paternal gametes of

*g*and

*h*, respectively, is the average frequency of allele

*a*at locus

*l*over all

*n*paternal gametes in the complete collection of maternal progenies, and

*n*

_{a}_{,l}is the total number of alleles at locus

*l*. For the case where the paternal allele at a given locus can be determined categorically,

*p*= 1 if the paternal allele is

_{lag}*a*, and

*p*= 0 otherwise. If there is ambiguity (

_{lag}*i.e.*, both the mother and the embryo share the same heterozygous genotype), we set

*p*for each of the two possible alleles at their posterior likelihood values, given the observed global pollen pool allele frequencies (Smouse

_{lag}*et al*. 2001; Irwin

*et al*. 2003), and

*p*= 0 for the rest of the alleles.

_{lag}It is important to note that genetic marker-based estimators such as estimate kinship relative to the average relatedness for random individuals in the sample, because the sample itself is used as the reference population (Hardy 2003). By construction, the average over all offspring pairs will be zero, so that values of between individuals that are less related than the average will be negative. Since we expect positive observed values of , and also of *Q*(*z*) for small *z*, estimates of *Q*(*z*) obtained directly from will be (on average) negative for large *z*, even though the parametric values are always positive, biasing subsequent estimates of the dispersal function parameters.

A possible solution to this problem is to estimate kinship coefficients among paternal gametes relative to the adult population, by estimating the frequencies from a sample of potential pollen donors and not from the sample of paternal gametes itself. However, this approach results in inflated estimation errors, requiring a very large number of extra samples, which would compromise the sampling economy of the method (data not shown). We used an alternative procedure, which requires identifying *a priori* those pairs of offspring in the sample that must have been sired by different fathers. We can then estimate the average kinship coefficient for those offspring, relative to the sample (denoted , expected to be ≤0), and use this average to recalibrate all the -values, yielding the desired kinship coefficients (denoted ) relative to a reference population of unrelated male gametes (Hardy 2003):(8)

The question is how to identify *a priori* the unrelated male gametes in the sample, so that we can estimate . Austerlitz and Smouse (2001a) have shown that, irrespective of the precise shape of the dispersal function, the probability that two mothers at a distance *z* ≥ 5δ apart are pollinated by the same father is negligible, δ being the average distance of pollen dispersal. Thus, if we concentrate on a subset of maternal pairs that are located far enough apart (*z* > 5δ), we can use the average pairwise between their offspring to obtain . Although δ is one of the unknowns of primary interest here, rough estimates or values available from similar species could be used, since, as we show later, these estimates are reasonably insensitive to the precise minimum distance (δ_{u}) used to define “unrelated” pollen pools in the sample.

Once the coefficients have been computed, we can calculate the observed as twice the average over all the within-mother sib pairs in the sample and the observed *Q*(*z*) for two mothers at distance *z* as twice the average cross-mother between pairs of offspring (one from each). The ratio of the observed *Q*(*z*) and yields the pairwise , from which we can infer the dispersal parameters, using Equation 6.

#### Simulations:

We used numerical simulation to assess the bias and accuracy (root mean square error, ) of the estimates of the average distance of pollen dispersal (δ), for both the pairwise TwoGener analysis and the new method based on kinship coefficients (Kindist). We simulated populations of 10,000 adult, monoecious, self-fertile individuals, randomly distributed over a 100 × 100 unit square area, subject to toroidal boundary conditions. By construction, census density was *d* = 1. We characterized each individual by a genotype of *n*_{L} unlinked loci. Each locus had *n*_{A} equifrequent alleles in the population, and the genotype of each individual was created by randomly drawing its two alleles from this population frequency distribution.

Individual male fecundity (flowering intensity) values were set at λ_{k} = 1 under the equal male fecundity assumption or were randomly sampled from a skewed (Weibull) probability distribution, *f*(α, β; λ) = β/α(λ/α)^{β−1}exp{−(λ/α)^{β}}, for the unequal male fecundity scenarios. The mean of this distribution is given by = αΓ(1 + 1/β) and its variance by Var(λ) = α^{2} · [Γ(1 + 2/β) − Γ^{2}(1 + 1/β)], where Γ is the classical gamma function (*e.g.*, Abramowitz and Stegun 1964) . The average λ was set to “1” in all cases, but we tested the effect of increasing values of the coefficient of variation in male fecundity (CV_{mf} = 0.25, 0.50, 0.75, 1.00, and 1.50) on our estimates.

We also investigated the bias and accuracy of the estimators under conditions of asynchronous flowering phenology. Values for male and female mean date of flowering were independently assigned to each individual by randomly sampling from a normal distribution with mean zero and a standard deviation that ranged from 0 to 4 time units, depending on the scenario. We set the length of both female receptivity and pollen shedding periods at a constant value of 10 time units. We defined a measure of phenological overlap (φ_{jk}) between the *j*th and *k*th individuals as the proportion of the receptivity period of female *j* coincident with the pollen-shedding period of pollen donor *k*, assuming uniform pollen anthesis and female receptivity. The population φ_{jk} mean value ranged from 55 to 100% for the sets of chosen parameters (Figure 1).

For each simulated population, we randomly drew the sample of *n*_{m} mothers from a central circular area of diameter *D*. Next, we generated *n*_{o} offspring for each mother *j*, the father of each offspring being assigned as follows: (1) draw random coordinates, centered on *j*, from the bivariate pollen dispersal distribution *p*(*x*, *y*); (2) find the individual (male *k*) nearest those coordinates; (3) assign individual *k* as the father with probability proportional to λ_{k}φ_{jk}; (4) start again from step 1 if individual *k* is discarded. The genotypes of the offspring were constructed from the parent's genotypes assuming Mendelian segregation. We used pollen dispersal distributions from the exponential-power family, both to simulate the data and to perform the estimations,(9)where Γ is the classical gamma function, *a* is the scale parameter for distance, and *b* is the shape parameter. The average distance of dispersal is given by δ = *a*Γ(3/*b*)/Γ(2/*b*). For *b* = 2 and *b* = 1, Equation 9 degenerates to the normal and exponential distributions, respectively. When *b* < 1, the function becomes fat tailed (Clark 1998), implying a greater component of long-distance dispersal. We considered as the reference case *a* = and *b* = 2, which corresponds to a bivariate normal distribution with axial variance and , testing the impact of different sampling and flowering factors by means of 1000 independent simulation runs for each parameter set.

Using the simulated maternal and offspring genotypes, we estimated δ for each replicate, applying both the TwoGener and Kindist methods. In the case of TwoGener, we considered that effective density (*d*_{e}) is unknown, the usual situation in most experimental studies, and we tested two alternative estimators of δ: , obtained by setting *d*_{e} = *d*, and , obtained by estimating *d*_{e} jointly. On the basis of the 1000 replicates, we computed the bias and for each estimator and then divided by the parametric value of the corresponding target variable, yielding the relative bias and the relative .

Extracting the estimates with functions other than the normal requires a numerical integration of Equation 2 for each pair of mothers, entailing several days to perform the estimations for a single simulation replicate. For this reason, we performed a limited set of analyses with lower replication (10 repetitions for each parameter set) to investigate the ability of the methods to estimate the shape parameter of the dispersal function under different leptokurtosis levels (*b* = 2, 1, and 0.5). In these simulations, we set population density at *d* = 0.0016 and the average distance of pollen dispersal at δ = 100 (for consistency with Austerlitz *et al*. 2004). All TwoGener methods used here are available in Famoz (http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/index.html). Programs used for the Kindist analysis and to simulate the populations are available from J. J. Robledo-Arnuncio.

## RESULTS

#### New kinship-distance analysis:

The observed values of Ψ(*z*) fit the theoretically expected decay function with increasing distance between the sampled mothers, provided that kinship coefficients were appropriately estimated, relative to a subset of unrelated pollen gametes in the sample (Figure 2). The dispersal parameter estimate obtained with Kindist () has a very small negative bias, and its accuracy () is fairly insensitive to the threshold distance between mothers (δ_{u}) chosen to define unrelated pollen pools (Table 1). Values of δ_{u} higher than twice the mean pollen dispersal distance yield virtually unbiased estimates of δ, although a sufficiently large (>100) number of the mother pairs in the sample must be separated beyond δ_{u} to avoid an increment in the bias and of . Extending the total area (*D*) over which the mothers are sampled produces a slight decrease in the bias and an increase in the of the estimator.

#### Effect of sample size:

Both the kinship-based and the TwoGener-based (assuming *d*_{e} = *d*) and (joint estimation of effective density) exhibit better statistical properties for increasing numbers of sampled mothers than for increasing numbers of offspring per mother (Table 2). The new estimator () has consistently lower , and generally lower bias, than for the different sampling efforts considered. The best estimator is for all sample sizes, as a logical consequence of the fact that we are still operating under the scenario with *d*_{e} ≡ *d*, which this estimator includes as an *a priori* specification. That is, if phenology were synchronized and males were equally fecund (which is unlikely in natural populations), then setting *d*_{e} ≡ *d* would be the estimation strategy of choice.

#### Effect of genetic resolution:

All three estimators benefit substantially from an increase in genetic marker resolution, especially in terms of their (Table 3). Paternal exclusion probabilities (EP) >0.99 are necessary to achieve ≤ 0.25 for and , while requires lower genetic resolution (EP > 0.50) to reduce below 20%. Noteworthy, both and suffer important negative biases with biallelic loci. The values of are all lower than those of for most values of EP, the difference becoming somewhat smaller with decreasing genetic resolution. Again, under the ideal scenario of *d*_{e} ≡ *d*, is the best estimator for any given level of genetic resolution, but especially when using low-resolution genetic markers. More generally, when *d*_{e} is unknown, high-exclusion genetic markers will be required to obtain reliable dispersal estimates, especially for TwoGener.

#### Effect of the effective number of pollen donors (*N*_{ep}):

An increase in *N*_{ep} caused by increasing population density results in larger biases of all three estimators (Table 4). But while the bias of seems minimally sensitive to the variation in *N*_{ep}, and exhibit substantially larger bias with increasing *N*_{ep}. For *N*_{ep} ≤ 25, however, and show fairly similar statistical properties. Overall, is a better estimator than under the different simulated levels of pollen-pool genetic structure, but while may be strongly biased, it shows the lowest for most values of *N*_{ep}. It must be stressed that lower accuracy for all three estimators can be expected as *N*_{ep} increases, as a result of decreasing pollen-pool genetic structure.

#### Estimation under unequal male fecundity:

The gap between the parametric effective (*d*_{e}) and census (*d*) population density values increases as the coefficient of variation in male fecundity (CV_{mf}) increases, a factor that has contrasting effects on the estimators (Table 5). The new estimator shows the best behavior: an increase in CV_{mf} reduces its bias, which becomes virtually zero for CV_{mf} > 50%, and increases its only very slightly. By contrast, using TwoGener with the assumption that *d*_{e} = *d* (which it is not true) becomes the worst estimation strategy: suffers important negative biases, as well as the largest of the three estimators when CV_{mf} > 50%. In this case, the joint estimation of δ and *d*_{e} substantially improves the statistical properties of the TwoGener approach, although both the and bias of remain larger than those of . The estimator of *d*_{e} provided by the pairwise TwoGener analysis () shows fairly large bias and , increasing with CV_{mf}. Overall, seems the best estimation strategy to counteract the effect of unequal male fecundity.

#### Estimation under asynchronous flowering:

The existence of variance in male mean date of flowering (*M*_{SD}) but not in female mean date of flowering (*F*_{SD}), or vice versa, has limited impact on the estimators (Table 6). If *M*_{SD} = 0 and *F*_{SD} > 0, then all the males have the same chance of mating with any given female at any given time, and variation in the timing of female receptivity does not affect the effective density of males. If *M*_{SD} > 0 and *F*_{SD} = 0, then any particular male has the same phenological overlap with all females, and variation in overlap coefficients among males has the same effect on the estimators as variation in male fecundity: there is a reduction in *d*_{e}, but affecting all females equally. By contrast, if both *M*_{SD} and *F*_{SD} > 0, then females with asynchronous receptive periods are exposed to different arrays of potential pollen donors, and as a consequence *d*_{e} (in addition to being reduced) will generally be different across females. Therefore, the assumption of temporal homogeneity of *d*_{e} in Equations 1 and 2 is violated, which is reflected in the estimators of the dispersal parameter (Table 6).

Indeed, all three estimators yield a negatively biased dispersal parameter under flowering asynchrony (when both *M*_{SD} and *F*_{SD} > 0), although the bias is <8% for and , and <4% for , when the mean phenological overlap in the population () is >65%. Notably, has the lowest for all the values of considered, as a result of its smaller variance. The least-biased estimator, , exhibits smaller than for strong flowering asynchrony ( = 0.55), but larger for a moderate level of asynchrony ( = 0.77). The corresponding effective density estimates, , show a bias <5% for ≥ 65%, but >20% for = 55%, with substantial in all cases. Overall, asynchronous phenology produces low to moderate biases in all three dispersal estimators, with showing the smallest .

#### Combined impact of unequal male fecundity and asynchronous flowering:

Equating effective and census density *a priori* represents the worst estimation strategy when floral asynchrony and nonuniform male fecundity occur simultaneously (Table 7). Under this scenario, shows the strongest bias and of the three estimators, for all the combinations of CV_{mf} and considered.

The new estimator, , maintains the negative bias produced by asynchronous flowering, somewhat attenuated by the presence of unequal male fecundity. For instance, the bias of is −0.074 for CV_{mf} = 0 and = 65%, while it drops to −0.040 for CV_{mf} = 150% and the same value of (Tables 6 and 7). Interestingly, in the case of , the negative and positive biases caused by floral asynchrony and unequal male fecundity, respectively, compensate. When both factors are combined, the bias of ranges from <1% to a maximum of only 5.3%, depending on the particular combination of CV_{mf} and , being smaller than that of under greater floral asynchrony ( = 65 and 55%). The of is smaller than that of in all cases, but the difference becomes minimal under strongly asynchronous flowering ( = 55%). Finally, the statistical behavior of improves when unequal male fecundity occurs along with asynchronous flowering (Tables 6 and 7), providing effective density estimates with moderate biases (0–15%) and moderate (20–30%). On the whole, would be the estimation strategy of choice when both unequal male fecundity and asynchronous phenology are present.

#### Estimation of the shape of the dispersal function:

Both Kindist and TwoGener yield estimates of the shape parameter (*b*) of the exponential-power pollen dispersal curve with rather high bias and for all the assumed levels of leptokurtosis (*b* = 2, 1, and 0.5; Table 8), although the errors do not seem very sensitive to the presence of unequal male fecundity and flowering asynchrony. Except for the normal function (*b* = 2) under uniform flowering conditions, the of is smaller for Kindist than for TwoGener, but the difference becomes small under asynchronous flowering, either alone or combined with unequal male fecundity.

Due to a compensation effect between the joint estimates of the shape () and scale (*â*) parameters, the corresponding estimate of the average distance of pollen dispersal () is less biased and somewhat more accurate (Table 8). The bias and of ranged from 1 to 14% and from 14 to 32%, respectively, increasing with leptokurtosis (decreasing *b*) but being rather robust to unequal male fecundity and asynchronous flowering. For TwoGener, the of also increased with leptokurtosis, being slightly smaller than that of for *b* = 2 and *b* = 1 and slightly larger for *b* = 0.5. Although TwoGener yielded a better estimator of δ under unequal male fecundity, the estimation under asynchronous phenology (alone or combined with male fecundity variation) suffered inflated bias and . The corresponding effective density estimate () provided by TwoGener has rather high , increasing with leptokurtosis, although it improved under the combined effect of unequal male fecundity and flowering asynchrony, showing virtually no bias. Summarizing, Kindist yields slightly better (but still crude) estimates of the shape parameter *b* than TwoGener, as well as better estimates of δ when unequal male fecundity and asynchronous phenology are combined.

## DISCUSSION

The motivation for this study was the need to devise a new indirect method to estimate the spatial extent of pollen dispersal that was insensitive to unknown effective male population density (*d*_{e}). First, we have introduced a new statistic, Ψ(*z*), to characterize the genetic structure of the pollen pool, whose theoretical expectation, unlike that of the fixation index Φ_{ft}, is not explicitly dependent on *d*_{e}, but that is based on the same set of demographic and dispersal assumptions and is estimated from similar genotypic data. By removing the nuisance parameter *d*_{e}, we have reduced the dimensionality of the parameter space, which results in greater precision (lower variance) for the δ-estimates derived from Ψ(*z*) for all the sampling strategies and genetic assays considered, relative to those obtained from Φ_{ft} with joint estimation of *d*_{e}. Second, we have investigated, for the first time, how unequal male fecundity and asynchronous flowering, two key factors determining *d*_{e} < *d*, affect indirect estimates of the pollen dispersal curve. Our main findings are: (i) it is possible to obtain reliable estimates of δ from indirect methods, even under nonuniform male fecundity and variable flowering phenology; (ii) the new method (Kindist) yields better δ-estimates than TwoGener under a wide range of flowering scenarios; and (iii) TwoGener can be used to obtain approximate *d*_{e} estimates when *d*_{e} < *d*.

When either unequal male fecundity or asynchronous flowering are present, assuming *d*_{e} = *d* to extract δ from the ϕ_{ij}-values will yield severely biased estimates, which should be regarded as lower bounds for δ. If no independent information is available on *d*_{e} (*e.g.*, from demographic and flowering analyses), δ will be better estimated using either Kindist or TwoGener with joint estimation of *d*_{e}. By normalizing the observed correlation of paternity between female pairs, the simulations suggest that Kindist yields the only δ-estimator that is not biased by variation in male flowering intensity. This property is most useful, because nonuniform male fertility is ubiquitous in plant populations. For instance, Kang *et al*. (2003) report an average coefficient of variation in male fecundity of 81% (range 27–140%) across 12 natural stands of eight different tree species. The results of the current study suggest that indirect methods may provide reliable estimates of δ under this range of male fertility variation, especially the Kindist approach, which remains unbiased even under simulated coefficients of variation for male fertility of as much as 100–150%.

The impact of asynchronous flowering is more difficult to overcome, because it violates the assumption of temporal uniformity in *d*_{e} used to derive the expectations of both Φ_{ft} and Ψ(*z*), causing small to moderate negative biases in all the estimators. The bias is actually higher for the new estimator, based on Ψ(*z*). In practice, nonuniform male fertility and floral asynchrony will occur simultaneously. In this general situation, the simulations suggest that Ψ(*z*) provides the δ-estimator with the smallest variance and MSE, even if it may suffer somewhat higher bias in cases of strongly asynchronous flowering. The TwoGener method, although not as accurate in terms of δ, has the virtue of providing approximate estimates of *d*_{e} (under the combined action of unequal male fecundity and asynchronous flowering), which may have biological interest in its own right, since, for instance, it influences the rate of accumulation of beneficial and detrimental mutations in populations (see Oddou-Muratorio *et al*. 2005 for a more complete discussion on that issue).

Even though the biases of the δ-estimates caused by asynchronous flowering are difficult to deal with, our simulations suggest three reasons why we can expect to obtain reliable estimates of δ under floral asynchrony in many practical situations. First, for the biases of the estimators to become large (>10%), the mean flowering overlap among individuals must drop substantially ( ∼ 55%). Empirical studies suggest that flowering synchrony is often higher than this at the within-population level for many species [*e.g.*, 80% in *Pinus sylvestris* (Gömöry *et al*. 2003), mean 88% across three Eucalyptus species (Keatley *et al*. 2004), mean 74% across six Neotropical shrubs (Augspurger 1983), and mean 61% across seven Rubiaceae species (SanMartín-Gajardo and Morellato 2003)]. Second, the estimation bias caused by asynchronous flowering seems to be attenuated by the effect of unequal male fecundity (Tables 6 and 7). And third, variation in the timing of pollen shedding across males has a minor effect on the estimators if the female receptivity periods of the sampled mothers are synchronized. If it was feasible to perform the analyses on a sample of selected mothers with overlapping receptivity periods, we could anticipate that Ψ(*z*) would be virtually insensitive to the combined effect of unequal male fecundity and asynchronous flowering.

Irrespective of fecundity and phenological factors, high-resolution (EP > 0.99) genetic batteries and abundant replication (at least a total of 800 seeds) will be necessary to reduce the relative of any of the pairwise estimators of δ to acceptable (<20%) levels. For a fixed total number of seeds, accuracy will be improved more by increasing the number of mothers than by collecting more seeds per mother, as already shown by Austerlitz and Smouse (2002). Our results also indicate, however, that a minimum of 10 seeds/mother (for Kindist) or 20 seeds/mother (for TwoGener) should be analyzed to avoid inflated estimation errors (Table 2). The distribution of the sampled mothers should cover as many pairwise-distance classes as possible, from neighboring to long-distance pairs, the latter being essential for normalizing Ψ(*z*)-values. Especially intense sampling efforts will be required in cases of low pollen-pool genetic structure, *i.e*., large *N*_{ep}, translating into higher estimation errors. In any event, if no decrease in Ψ(*z*) (increase in pairwise Φ_{ft}) is detected with distance, application of any of these pairwise estimation procedures becomes meaningless.

An additional important result of this study is that, even with intense sampling effort, estimating the shape parameter (*i.e.*, the fatness of the tail) of a two-parameter pollen dispersal distribution is much more difficult than estimating the average distance of dispersal (see Austerlitz *et al*. 2004), even though the new procedure performs better in almost all cases. Indirect methods can provide no more than a hint of whether the dispersal curve is fat tailed or not. For instance, in our simulations, Kindist yielded estimated values of the shape parameter (*b*) that were always <1.0 when the true value was 0.5 (*vs.* 9 of 10 cases for TwoGener). When the true *b* = 2, the corresponding estimates obtained with Kindist were >1.0 in 8 of 10 cases (*vs.* all cases for TwoGener). It is likely that more accurate estimates of the shape parameter will require paternity-based methods (Burczyk *et al*. 2002; Oddou-Muratorio *et al*. 2005), requiring a substantial sampling effort to identify and map potential pollen donors. Our results also suggest, however, that even without *a priori* assumptions about the shape parameter of the dispersal function, the new Kindist approach may provide estimates of δ that have a low bias and are fairly robust to variation in male fecundity and floral asynchrony.

It must be noted that, although we have focused on the average distance of pollen dispersal (δ) in this study, the same methodology can be used to estimate other moments of the pollen dispersal distribution (Austerlitz *et al*. 2004). For instance, one might be interested in estimating the axial variance of pollen dispersal (), a parameter of evolutionary significance determining the process of genetic structuring under isolation by distance (Wright 1943; Rousset 1997). Assuming an exponential-power dispersal function with (*a*, *b*) parameters, we have = (1/2) · *a*^{2} · Γ(4/*b*)/Γ(2/*b*) (Austerlitz *et al*. 2004), and thus we can easily derive from the estimates of *a* and *b* obtained using Equation 4 or 6. For any given shape parameter *b*, there is a linear relationship between σ_{p} and δ, and the estimators of σ_{p} and δ will thus have identical relative bias and .

There are a variety of factors, other than restricted dispersal, determining the genetic structure of the effective pollen pool, whose combined effects may result in biased estimates of dispersal distances. We have considered flowering phenology and male fecundity variation, leaving aside factors such as nonrandom selfing, genetic structure among pollen donors, and genetic incompatibility systems. In natural plant populations, the rate of selfing (*s*) depends on a variety of biological and environmental determinants and may be poorly predicted by the probability of random pollination at distance zero. A possible solution is to obtain a maximum-likelihood estimate of *s* from the same data set used for the TwoGener or Kindist analyses (Ritland 2002), use this estimate to adjust the observed probabilities of paternal identity, and proceed to make subsequent dispersal estimations on the basis of outcrossed matings only (Burczyk and Koralewski 2005). There are also methods available to correct the expected probabilities of paternal identity within and among maternal sibships for the observed values of inbreeding and genetic structure among the adults, although they require an independent genetic survey of the latter (Austerlitz and Smouse 2001b). Finally, the presence of a genetic incompatibility system can be expected to reduce *d*_{e} proportionally to the fraction of compatible mating pairs (Hardy *et al*. 2004), although testing the precise impact on indirect estimates of dispersal will require future work.

Indirect estimation of real-time pollen flow is an appealing approach because of its sampling economy. It nevertheless poses some methodological challenges, because it relies on our ability to isolate the imprint of restricted dispersal on synthetic measures of pollen-pool genetic structure that are sensitive to a variety of environmental and biological factors, translating into *d*_{e} < *d*. Of course, the *d*_{e} parameter might itself be the focus of interest in particular situations, in which case the TwoGener model can provide an approximate estimate of *d*_{e} from mother–offspring genotypic data. When the primary interest is on the average pollen flow distance, however, an ideal method should provide dispersal estimates that are independent of the demographic parameter *d*_{e}. The new Kindist procedure represents a significant advance in this direction: it is not explicitly dependent on *d*_{e} and it yields a reliable estimate of the average distance of pollen dispersal under unequal male fecundity and asynchronous flowering.

## Acknowledgments

The authors thank Etienne K. Klein, Sylvie Oddou-Muratorio, and two anonymous reviewers for constructive comments on the manuscript. J.J.R.-A. was supported by a postdoctoral fellowship from the Spanish Secretaría de Estado de Educación y Universidades, financed in part by the European Social Fund. P.E.S. and J.J.R.-A. were supported by the National Science Foundation (NSF-DEB-0211430).

## Footnotes

Communicating editor: M. K. Uyenoyama

- Received October 5, 2005.
- Accepted March 20, 2006.

- Copyright © 2006 by the Genetics Society of America