Abstract
The distance of pollen movement is an important determinant of the neighborhood area of plant populations. In earlier studies, we designed a method for estimating the distance of pollen dispersal, on the basis of the analysis of the differentiation among the pollen clouds of a sample of females, spaced across the landscape. The method was based solely on an estimate of the global level of differentiation among the pollen clouds of the total array of sampled females. Here, we develop novel estimators, on the basis of the divergence of pollen clouds for all pairs of females, assuming that an independent estimate of adult population density is available. A simulation study shows that the estimators are all slightly biased, but that most have enough precision to be useful, at least with adequate sample sizes. We show that one of the novel pairwise methods provides estimates that are slightly better than the best global estimate, especially when the markers used have low exclusion probability. The new method can also be generalized to the case where there is no prior information on the density of reproductive adults. In that case, we can jointly estimate the density itself and the pollen dispersal distance, given sufficient sample sizes. The bias of this last estimator is larger and the precision is lower than for those estimates based on independent estimates of density, but the estimate is of some interest, because a meaningful independent estimate of the density of reproducing individuals is difficult to obtain in most cases.
Both evolutionary and conservation biologists are interested in the distance of pollen movement because of its role in the establishment of neighborhoods and in connectivity among populations (for a review, see Sork et al. 1999). We have elsewhere proposed that one should use measurable differentiation among the inferred pollen clouds of widely spaced females as an assay of the distribution of pollination distances across existing landscapes (Smouse et al. 2001), using a model we refer to as TwoGener. We propose this model as an alternative to both the current practice of extracting an estimate indirectly from F_{ST} (assuming evolutionary equilibrium) and more direct (but laborious) estimates from parentage analysis, both of which have limitations (Sork et al. 1999). We have modeled the intraclass correlation of maternal pollen pools, Φ_{ft}, as a function of average pollen dispersal distance (δ), the spatial density (d) of reproductive adults, and the average distance between females (z̄) (Austerlitz and Smouse 2001a). We have also (Austerlitz and Smouse 2001b) examined the impact of spatially organized genetic structure among the adults on Φ_{ft}. These two efforts have yielded the possibility of extracting an estimate of the average pollen dispersal distance on the basis of an estimate of Φ_{ft} that is derived from the inferred pollen pool divergence among the complete pairwise array of sampled females.
The TwoGener approach shares a great deal in common with the analysis of adult genetic diversity among populations. In fact, Φ_{ft} is directly analogous to Wright’s (1951) F_{st}. With the island model, it is sufficient to obtain a global estimate of F_{st}, from which we can extract an estimate of N_{e}m, assuming evolutionary equilibrium. However, for an “isolation by distance” model (Wright 1943, 1946; Malécot 1948, pp. 54–63; Kimura and Weiss 1964), the differentiation observed between two populations, gauged by their pairwise F_{st} estimate, is expected to be an increasing function of the distance between them (Slatkin 1991, 1993). Thus, it is possible to obtain an estimate of both the average and variance of long-term (evolutionary time) dispersal rate, as a function of the distance between pairs of populations (Rousset 1997) or individuals (Rousset 2000).
Here, we develop a similar estimation procedure for the estimation of contemporaneous pollen flow distance. As we have shown in previous work, there is a direct relation between Φ_{ft} and the average physical distance between pairs of females (z̄), and we can use the same theory to relate a Φ_{ft} estimate for any particular pair of females to the physical distance between them. In this article, we present several estimators that use either the global value of Φ_{ft}, estimated over all sampled females in the population, or the pairwise Φ_{ft} estimates to gauge the mean pollen dispersal distance.
We first develop estimates that assume that the adult density is independently known in the population. Then, we develop an additional method that allows joint estimation of the density of reproducing adults and the dispersal distance. One might argue that since density can be measured independently in the field, there is no necessity to estimate density from the genetic data, but the real issue is whether we can reliably estimate the true pollination density by direct field observation.
Using computer simulation, we first perform a comparative study on those estimators that assume that adult density is independently known, which is designed to answer several questions. (i) What are the best estimators in terms of bias, standard deviation, and mean squared error; (ii) what is the best strategy for the allocation of experimental effort, in terms of the numbers of mothers and progeny per mother; and (iii) what is the sensitivity of the estimates to the exclusion probability of the set of markers used? We then explore joint estimation, using the same simulation approach. The main question is whether (and how much) we gain by inferring density from the genetic data.
METHODS
The estimators: Global estimators: Assume that we have a sample of females from a given population and a sample of offspring from each female, so that we can infer the pollen cloud sampled by each female, using methods provided in Smouse et al. (2001). Our estimators assume that the pollen dispersal distribution is a bivariate (isotropic) normal distribution, which is the only distribution for which one can obtain analytic solutions (Austerlitz and Smouse 2001a). The model links the estimated differentiation among pollen clouds (Φ_{ft}), the average pairwise distance between females, and the dispersal parameter (σ). This bivariate normal distribution is defined as
p(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right), where (x, y) is the displacement from the pollen source and σ^{2} is the axial variance.
The first estimator that can be designed uses the approximate relation between the global Φ_{ft} and σ,
However, if the average distance between mothers is <5δ, the relationship between global Φ_{ft} and σ requires refinement (Austerlitz and Smouse 2001a):
Pairwise estimators: Equation 6 can also be used for Φ_{ft}(z) between any pair of females, as a function of the distance (z) between them. Denote the pairwise Φ_{ft} observed between the ith and jth females by
Now, σ^{2} appears both on the left-hand side of this equation and in the denominator of the exponent on the right-hand side, and it is not entirely obvious whether it would be better to estimate σ or 1/σ, so we have also evaluated a second pairwise estimator, defining β = 1/σ, and then estimating
Still another method consists of performing a nonlinear regression to estimate the best fit for σ, using a least-squares criterion. Given any particular estimate of σ, we can predict each of the theoretically expected values ϕ_{ij}^{th} = Φ_{ft}(z_{ij}) from the equation
Although the true value of Φ_{ft} is nonnegative,
For all the methods above, we have to impose an external estimate of d that is obtained from field measures of the density of reproducing individuals. We can also obtain such an estimate for d from the genetic data themselves, by optimizing (11) simultaneously for σ and d, providing a final set of estimates,
Simulations: We assessed bias, standard deviation, and square root of mean-squared error
We drew the sample of n_{m} mother plants on a square grid with spacing 1.0 between consecutive points. Thus the grid was approximately of size
We used the bivariate normal pollen dispersal distribution for simulations, and we tested the impact of various design criteria, conducting 1000 simulation runs for each parameter set and computing the mean, standard deviation, and square root of the mean-squared error for all estimators. Our reference setting included density d = 1, axial dispersal variance σ = 1, number of sampled mothers n_{m} = 20, number of other adults sampled to estimate allelic frequencies n_{s} = 30, number of offspring per mother n_{o} = 20. For the genetic markers, we chose a setting that would correspond to microsatellite markers: number of loci studied n_{L} = 5, number of alleles per locus n_{A} = 10. We then varied each parameter separately, to assess its impact on the various estimators.
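As a minimal sketch of how the three performance measures named above can be computed from replicate simulation runs, consider the following Python function. It is not the code used for the simulations; the function name `summarize_estimates` and the use of the sample (n − 1) standard deviation are assumptions made for illustration.

```python
import math
import statistics


def summarize_estimates(estimates, true_value):
    """Return (bias, standard deviation, root-mean-squared error).

    `estimates` holds one estimate of a parameter per simulation run;
    `true_value` is the parametric value used to generate the data.
    """
    estimates = list(estimates)
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - true_value
    # Sample standard deviation (n - 1 denominator); an assumption here.
    sd = statistics.stdev(estimates)
    # RMSE measures spread around the true value, so it absorbs the bias.
    rmse = math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))
    return bias, sd, rmse


# Hypothetical replicate estimates of sigma when the true value is 1.0.
bias, sd, rmse = summarize_estimates([0.9, 1.1, 1.2, 1.0], true_value=1.0)
```

Because the root-mean-squared error combines bias and dispersion, it serves as a single criterion for ranking estimators that trade one against the other.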
RESULTS
Best estimators: A plot of the pairwise Φ_{ft}’s against the distance between the sampled mothers shows clearly that these pairwise values are dispersed around the theoretical curve as expected and that this dispersion decreases when more offspring per mother are sampled (Figure 2). Table 1 shows that two of the estimators are strongly biased, the global estimator (
Number of mothers vs. number of progeny: The better global estimator (
Increasing the number of sampled mothers (n_{m}) has a stronger impact on the estimators than increasing the number of offspring per mother, yielding a greater increase of precision and a greater reduction of bias (Table 2). Since the product N = n_{m}n_{o} is the size of the experimental effort invested in the study, it is generally better to increase the number of mothers, rather than the number of offspring per mother. Smouse et al. (2001) showed that for any fixed number of mothers, n_{m}, the optimal value of n_{o} is 1/Φ, suggesting a strategy of setting n_{o} ≈ 1/Φ_{ft} and increasing the number of mothers, n_{m}, as much as possible. Under these conditions,
Number of loci vs. number of alleles: The best allocation of laboratory effort is a matter of some concern, given the substantial cost of lab assay. One cannot design the loci to order, of course, but some loci are more polymorphic than others, and one can choose among those available. The MSE decreases with an increase in the exclusion probability (E). As we pointed out in Smouse et al. (2001), one needs enough exclusion probability from the assay battery to make the enterprise profitable, but beyond a certain level of exclusionary information (E > 0.99), greater genomic sampling is not very helpful: an asymptotic value is reached for MSE when E becomes very high (see Table 3). Given the cost realities of laboratory assay, the best strategy would seem to be five loci with 10–20 alleles each. The pairwise estimates are considerably better than the global estimate for genetic batteries with low exclusion probability (for which case, the global estimate Φ_{ft} can be negative, which precludes estimation of
Joint estimation of σ and d: The estimate
Any estimate of σ that is based on an extraneous estimate of density, d, will be biased by error in that estimate of d. For instance, if parametric d and σ are both 1.0, a biased field estimate of d, say d̂_{f} = 0.8, will yield an estimate of σ that has expectation (0.8)^{−1/2} = 1.118, producing a bias of +0.118. If d̂_{f} is 0.5, this bias will be +0.414, much higher than the bias of
DISCUSSION
We describe an effective means for estimating the pollen distribution function, assuming a bivariate normal distribution. Provided that density is known independently, this study shows that it is possible to design estimators that are minimally biased and that have enough precision to provide a trustworthy estimator of the average distance of pollen dispersal. Concerning global estimates, we have shown that it is important to take into account the average pairwise distance between mothers, as a way of removing the potential bias that occurs when the mothers are sampled at distances that are too close. Among the pairwise estimates, the one that exhibits the lowest MSE is the nonlinear regression. The behavior of both global and pairwise estimates is satisfactory even with 20 mothers and 20 offspring per mother, but the best way to decrease mean-squared error is to increase the number of sampled mothers, rather than the number of offspring per mother. Increasing exclusion probability (E), either by increasing the number of loci or by choosing more polymorphic loci, also decreases MSE, but there is nothing much to be gained by increasing this exclusion probability above 0.99, and for many purposes, a genetic battery that yields E > 0.90 is quite adequate to the task.
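The allocation rule reported in the Results (set n_{o} ≈ 1/Φ_{ft}, then put the remaining effort into mothers) can be sketched in Python as follows. The function name `allocate_effort` is hypothetical, and the sketch assumes that a pilot value of Φ_{ft} is available before the sampling design is fixed.

```python
def allocate_effort(total_offspring, phi_ft):
    """Split a budget of N = n_m * n_o genotyped offspring.

    Follows the rule of thumb n_o ~ 1/phi_ft and spends whatever is left
    on additional mothers. `phi_ft` is a pilot value (an assumption).
    Returns (n_m, n_o).
    """
    if not 0.0 < phi_ft <= 1.0:
        raise ValueError("phi_ft must lie in (0, 1]")
    n_o = max(1, round(1.0 / phi_ft))   # offspring per mother
    n_m = total_offspring // n_o        # mothers affordable at that n_o
    return n_m, n_o


# A budget of 400 offspring with a pilot phi_ft of 0.05 gives 20 mothers
# with 20 offspring each, which matches the reference simulation setting.
n_m, n_o = allocate_effort(400, 0.05)
```

With a larger pilot Φ_{ft} the same budget shifts toward more mothers, in line with the finding that additional mothers reduce mean-squared error more than additional offspring per mother.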
This study was motivated by the idea that pairwise analysis could be used to extract better information from the genetic data. The results suggest that this is indeed the case, especially for lower values of E, for which pairwise estimates yield an appreciable reduction in MSE; for very high values of E, the reduction is minimal. The failure to achieve greater gains may be traceable to a feature that is common to all pairwise analyses. The availability of multiple measures provides the impression that they should increase resolution, but each of those measures is extracted from a much smaller sample. Moreover, the collection of all pairwise ϕ_{ij} values is far from being an independent set of estimates (ϕ_{jk} is not independent of ϕ_{ij} and ϕ_{ik}). In practice, the collection of pairwise information provides a modest improvement on a single global average, which ignores the detail but has the virtue of being based on a substantially larger sample size; the pairwise strategy is better, but only mildly so. Since an iterative pairwise estimate is neither time- nor labor-intensive to obtain, particularly in view of the field and laboratory scope of such studies, it would always seem preferable to compute one.
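To illustrate how little computation an iterative pairwise estimate requires, the following Python sketch fits σ by least squares to a set of pairwise (z_{ij}, ϕ_{ij}) values, given a fixed density d. It is a sketch under stated assumptions, not the procedure used in this study: the expected-value function `phi_expected` is an assumed form for normal dispersal, Φ_{ft}(z) = (1 − e)/(8πσ²d − e) with e = exp(−z²/4σ²), and it should be replaced by whatever model equation one actually adopts. The function names and the search strategy (a coarse log-spaced grid refined by golden-section search) are likewise illustrative choices.

```python
import math


def phi_expected(z, sigma, d):
    """Assumed pairwise model for normal dispersal (illustrative only)."""
    e = math.exp(-z * z / (4.0 * sigma * sigma))
    return (1.0 - e) / (8.0 * math.pi * sigma * sigma * d - e)


def sse(sigma, pairs, d):
    """Sum of squared deviations between observed and expected values."""
    return sum((phi - phi_expected(z, sigma, d)) ** 2 for z, phi in pairs)


def fit_sigma(pairs, d, lo=0.3, hi=10.0, grid=200, iters=60):
    """Least-squares estimate of sigma from (distance, phi) pairs."""
    # The assumed model needs 8*pi*sigma^2*d > 1 over the whole range.
    if 8.0 * math.pi * lo * lo * d <= 1.0:
        raise ValueError("lo is too small for this density")
    # Step 1: coarse scan on a log-spaced grid to bracket the minimum.
    sigmas = [lo * (hi / lo) ** (k / (grid - 1)) for k in range(grid)]
    best = min(range(grid), key=lambda k: sse(sigmas[k], pairs, d))
    a = sigmas[max(best - 1, 0)]
    b = sigmas[min(best + 1, grid - 1)]
    # Step 2: golden-section search inside the bracket.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, e = b - g * (b - a), a + g * (b - a)
        if sse(c, pairs, d) < sse(e, pairs, d):
            b = e
        else:
            a = c
    return 0.5 * (a + b)


# Noise-free check: data generated at sigma = 1.5 should be recovered.
pairs = [(z, phi_expected(z, 1.5, 1.0)) for z in (0.5, 1.0, 2.0, 3.0, 5.0, 8.0)]
sigma_hat = fit_sigma(pairs, d=1.0)
```

With real data the ϕ_{ij} values are noisy and mutually dependent, so the fit is best read as a point estimate; the same sum of squares could in principle be minimized over σ and d together when no field estimate of density is available.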
In practice, estimating adult reproductive density is a serious problem. For example, not all adults reproduce during a given year, and phenology is variable even within a year, so that not all parents can mate at the same time. There is also variation in male fecundity within a population, as a function of age and size differences, genetic differences, and microenvironmental factors. All of these factors contribute to a reduction in the effective density of reproducing individuals. In some cases, it can be nearly impossible to count all adult plants belonging to a given species, in particular in tropical forests. Finally, and especially in the case of forest trees, populations can cover very wide areas, and density can be nonhomogeneous across the landscape, making any estimation of the effective density approximate and subject to uncertainty. Simply using the number of individuals above a given size class can yield a serious overestimate of d and a serious underestimate of σ. The estimator d̂_{1} can be contrasted with the more usual stem count; any discrepancy becomes an indicator of the extent to which various forms of heterogeneity have impacted on “effective male reproductive density,” d_{e}, which is likely to be less than the actual stem count. We must keep in mind, however, that very large data sets are going to be required for reliable joint estimation: large numbers of adults and offspring per adult, along with a high-resolution (exclusion probability) genetic battery. The increasing availability of numerous highly polymorphic loci at reasonable cost, however, provides some hope that we can apply the method effectively in real situations.
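The sensitivity of σ to a misjudged density can be made concrete with a short Python sketch. It assumes that Φ_{ft} depends on σ and d only through the product σ²d, which is what the numerical example in the Results implies (a field density of 0.8 in place of 1.0 inflates the expected estimate of σ to 1.118). The function name is hypothetical.

```python
import math


def sigma_under_misjudged_density(sigma_true, d_true, d_field):
    """Expected sigma estimate when d_field is used in place of d_true.

    Assumes the estimate scales as (d_field / d_true) ** -0.5, that is,
    that only the product sigma^2 * d is identifiable from phi_ft.
    """
    return sigma_true * math.sqrt(d_true / d_field)


# Undercounting reproductive adults inflates sigma ...
bias_08 = sigma_under_misjudged_density(1.0, 1.0, 0.8) - 1.0   # about +0.118
bias_05 = sigma_under_misjudged_density(1.0, 1.0, 0.5) - 1.0   # about +0.414
# ... while a stem count twice the effective density deflates it.
bias_20 = sigma_under_misjudged_density(1.0, 1.0, 2.0) - 1.0   # about -0.293
```

The last case corresponds to the stem-count scenario described above: counting every individual above a size threshold overstates d and therefore understates σ.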
This study shows that, at least for bivariate normal pollen dispersion, the relationship between pollen dispersal distance and Φ_{ft} can be used to extract a useful estimate of the decay parameter, σ. The fact that we can establish at least a numerical relation between Φ_{ft} and dispersal distance for any dispersal function (Austerlitz and Smouse 2001a) suggests that this estimation procedure can be extended to a wider array of dispersal functions. Since we can also account for genetic structure among adults in the population (Austerlitz and Smouse 2001b), we will also be able to design estimates that take that information into account. All such extensions will also require extensive testing. These are matters that we will leave for future work.
This approach matters because a good estimate of contemporary gene flow is essential to understanding the evolutionary processes that occur at the scale of a landscape (Sork et al. 1999). It is the only way to infer the consequences of various processes, which are often recent and man-induced: fragmentation, loss of pollinators, and extinction of local populations. Thus, only reliable inference of this instantaneous gene flow will yield the possibility of predicting future changes for many species. TwoGener estimation should be useful in that context, because it allows us to gauge pollen flow without typing all potential fathers in the population and even without having an estimate of adult density. This makes the method less labor-intensive than alternative methods such as paternity analysis, and it can be carried out over a broader stretch of landscape and during several years.
Acknowledgments
The authors thank A. J. Irwin, A. Kremer, V. L. Sork, and S. Oddou-Muratorio for helpful commentary on the manuscript. P.E.S. was supported by the New Jersey Agricultural Experiment Station (USDA/NJAES17106), McIntire-Stennis grant USDA/NJAES17309, as well as by National Science Foundation grant BSR0089238. Most simulations were performed on the UNIX machines of the Institut National de la Recherche Agronomique (Bordeaux and Jouy-en-Josas, France).
Footnotes

Communicating editor: O. Savolainen
 Received November 13, 2001.
 Accepted February 20, 2002.
 Copyright © 2002 by the Genetics Society of America