Genetics, Vol. 149, 1099-1103, June 1998, Copyright © 1998

Properties of Maximum Likelihood Male Fertility Estimation in Plant Populations

M. T. Morgan
Department of Botany, Department of Genetics and Cell Biology, Washington State University, Pullman, Washington 99164-4238

Corresponding author: M. T. Morgan, Department of Botany, Department of Genetics and Cell Biology, Washington State University, Pullman, Washington 99164-4238. E-mail: mmorgan@wsu.edu

Communicating editor: A. H. D. BROWN


*  ABSTRACT

Computer simulations are used to evaluate maximum likelihood methods for inferring male fertility in plant populations. The maximum likelihood method can provide substantial power to characterize male fertilities at the population level. Results emphasize, however, the importance of adequate experimental design and evaluation of fertility estimates, as well as limitations to inference (e.g., about the variance in male fertility or the correlation between fertility and phenotypic trait value) that can be reasonably drawn.


One half of the nuclear genes in most plants pass through the male reproductive pathway, yet estimates of male fertility based on ecological observations such as dispersal distances of pollen analogues or observed pollinator movements can be "disappointingly crude" (SNOW and LEWIS 1993, p. 332): any one of a large number of individuals capable of producing male gametes may potentially sire a particular offspring. This situation is attributable to unique features of plant biology, particularly the difficulty of reliably circumscribing the pool of potential fathers.

Genetic markers can assist male fertility estimation. The most powerful marker-based methods (DEVLIN et al. 1988; ROEDER et al. 1989; BROWN 1990; ADAMS et al. 1992) partition paternity among genetically possible fathers using a maximum likelihood argument (ROEDER et al. 1989; SMOUSE and MEAGHER 1994). Estimated fertilities may be used to evaluate specific hypotheses (e.g., that all males have equal fertility) and to describe patterns such as variation in male fertility (e.g., DEVLIN and ELLSTRAND 1990; DEVLIN et al. 1992; SMOUSE and MEAGHER 1994) or the relationship between male trait value and fertility as a measure of selection (e.g., SCHOEN and STEWART 1986; BROYLES and WYATT 1990; CONNER et al. 1996).

Here I use computer simulation to document statistical power of maximum likelihood methods and to identify conditions under which reasonable insight into male fertility variation can be obtained. The focus is on allozyme data, where factors contributing to manageable experimental designs are well understood; speculation on possible results from highly variable markers is presented in DISCUSSION. Results indicate the importance of genetic exclusion probability (ε, see CHAKRABORTY et al. 1988; DEVLIN et al. 1988), number and size of maternal progeny arrays, and estimation of a limited number of fertilities. Future paternity studies require further mathematical analysis of maximum likelihood methods, or extensive computer simulation, to adequately evaluate the accuracy of inferences made.


*  MATERIALS AND METHODS

Maximum likelihood estimation:
SMOUSE and MEAGHER (1994), following ROEDER et al. (1989), develop a maximum likelihood estimator of male fertility for use in conjunction with electrophoretic or other genetic marker data. The problem is to estimate a vector λ of male fertilities, using a matrix X of genetic data. Each element λ_j of the fertility vector corresponds to the fertility of the jth unique male genotype, while the matrix entry X_ij is the probability of observing offspring genotype i given the genotypes of the maternal parent and the jth putative paternal parent (DEVLIN et al. 1988; ROEDER et al. 1989). The likelihood of a vector of male fertilities, given observed offspring genotypes, is

$$L(\lambda) = \prod_i \sum_j X_{ij}\,\lambda_j \qquad (1)$$

The goal is to identify the vector of male fertilities maximizing this likelihood.

A maximum of the likelihood can be found using the expectation maximization algorithm (ROEDER et al. 1989, p. 373). One iteration of this algorithm transforms a value of male fertility λ_j to a value λ'_j using the formula

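$$\lambda'_j = \frac{1}{n} \sum_i \frac{X_{ij}\,\lambda_j}{\sum_k X_{ik}\,\lambda_k} \qquad (2)$$

where n is the total number of offspring genotypes.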

The product X_ij λ_j in the numerator represents the expectation step, while the division and outer sum correspond to maximization. The algorithm used here starts with an initial vector of male fertilities λ in which elements are equal and sum to one, Σ_j λ_j = 1. Iteration proceeds until the change in the log of the likelihood is less than 10^-5 per iteration.
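The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the simulations: the function names (`log_likelihood`, `em_fertilities`), the `max_iter` safeguard, and the toy matrix are assumptions introduced here, and every offspring is assumed to be compatible with at least one candidate male.

```python
import math

def log_likelihood(X, lam):
    """Log of the likelihood: sum over offspring i of log(sum_j X[i][j] * lam[j])."""
    return sum(math.log(sum(x * l for x, l in zip(row, lam))) for row in X)

def em_fertilities(X, tol=1e-5, max_iter=10000):
    """EM estimate of male fertilities, starting from equal fertilities."""
    n_offspring, n_males = len(X), len(X[0])
    lam = [1.0 / n_males] * n_males          # equal start, sums to one
    log_l = log_likelihood(X, lam)
    for _ in range(max_iter):                # max_iter is a safeguard assumed here
        new_lam = [0.0] * n_males
        for row in X:
            # Expectation: fractional paternity of this offspring across males
            weighted = [x * l for x, l in zip(row, lam)]
            total = sum(weighted)
            for j, w in enumerate(weighted):
                new_lam[j] += w / total
        # Maximization: average the fractional paternities over all offspring
        lam = [v / n_offspring for v in new_lam]
        new_log_l = log_likelihood(X, lam)
        converged = new_log_l - log_l < tol  # stop when the log likelihood gain is small
        log_l = new_log_l
        if converged:
            break
    return lam, log_l

# Hypothetical toy data: 4 offspring (rows), 2 candidate males (columns).
X = [[0.5, 0.0], [0.5, 0.0], [0.5, 0.25], [0.0, 0.5]]
lam_hat, log_l = em_fertilities(X)
```

Because each offspring contributes fractional paternities that sum to one, the updated fertilities sum to one at every iteration without an explicit renormalization step.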

Simulation methodology:
Simulation was used to evaluate the statistical power of the estimation procedure and to evaluate inference about male fertility. Simulations centered on a "standard" parameter set. The standard set assumed a dioecious population of 25 male and 25 female parents, with 20 progeny assayed per maternal family. Genetic data in the standard set consist of eight loci, each with two equally frequent alleles (expected exclusion probability ε = 0.81; observed exclusions in simulations, e.g., in Figure 1, are less than this because of the finite number of paternal parents). This parameter set involves assaying a reasonable number of progeny for a combination of loci with exclusion probabilities toward the high end of that attainable with allozyme markers. Natural populations are likely to have more than 25 potential males, but the analyses presented below suggest that this realistic situation results in poor statistical properties. Loci are in Hardy-Weinberg and linkage equilibrium and are inherited in a Mendelian fashion. Parental genotypes are known without error. Expected male fertilities were chosen from a Gaussian distribution with mean equal to the number of progeny simulated and coefficient of variation equal to CVg; zero fertility was assigned when negative deviates were drawn. The actual fertility coefficient of variation CVm (i.e., variation in male fertility realized in a simulation) includes this source of variation and an additional multinomial component associated with sampling. Numbers of male and female parents, progeny array size, and number of loci were varied one at a time, with CVg ranging between zero and one (with CVg < 0.7, virtually all males sire some offspring, whereas for CVg = 1, the distribution of male fertilities is nearly Poisson and ~35% of males sire no offspring). Each parameter combination involved 500 replicates.
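The two sources of fertility variation (Gaussian expected fertilities, then multinomial sampling of sires) might be sketched as follows. The function names and defaults are hypothetical, and the Gaussian mean is set to 1 as a simplifying assumption: only relative fertilities enter the multinomial draw, so the scale of the mean does not affect CVg or the truncation at zero. Genotypes are not simulated here.

```python
import random
import statistics

def simulate_male_fertility(n_males=25, n_progeny=500, cv_g=0.5, seed=None):
    """Draw expected fertilities, then sample a sire for each progeny."""
    rng = random.Random(seed)
    mean = 1.0  # assumed scale; only relative fertility matters below
    # Gaussian component: negative deviates are assigned zero fertility
    expected = [max(0.0, rng.gauss(mean, cv_g * mean)) for _ in range(n_males)]
    # Multinomial component: each progeny picks a sire in proportion to expected fertility
    sires = rng.choices(range(n_males), weights=expected, k=n_progeny)
    realized = [sires.count(j) for j in range(n_males)]
    return expected, realized

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

# Standard-sized example: 25 males, 25 females x 20 progeny = 500 progeny.
expected, realized = simulate_male_fertility(cv_g=0.5, seed=1)
cv_m = coefficient_of_variation(realized)  # realized (actual) fertility variation
```

With `cv_g=0.0` every male has the same expected fertility, so any variation in `realized` comes from multinomial sampling alone.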



Figure 1. Statistical power to reject the hypothesis of equal male fertility. Each panel shows the effect of one factor (number of loci with two equally frequent alleles, progeny array size, number of potential male parents, number of maternal progeny arrays) on power, when the Gaussian component of fertility variation, CVg, is altered. The heavy, solid line in each panel represents standard parameter values (25 male and female parents, 20 progeny per female, eight loci with two equally frequent alleles). Observed exclusion probabilities for the standard parameters, but with different numbers of loci, are shown as ε in the upper left panel.

Statistical power was evaluated using the likelihood ratio statistic suggested by ROEDER et al. (1989). The test asks whether estimated male fertilities significantly improve the likelihood of the data when compared with the initial equal fertility vector. The test statistic, symbolized as Δ log L, is the log of the likelihood in Equation 1 calculated with the estimated fertilities minus the log of the likelihood with equal fertilities, so that larger values indicate stronger evidence against equal fertility. For each statistical test, 500 data sets were simulated assuming equal male fertility, CVg = 0. The Δ log L values from these simulations represent the null distribution against which fertility distributions with CVg > 0 are to be compared. Statistical power for each scenario with CVg > 0 is determined as the proportion of Δ log L values more extreme (larger) than 95% of the values under the assumption of equal expected fertility.
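The power calculation from simulated null and alternative statistics can be illustrated with a short sketch. The function name `power_from_null` and the way the 95% cutoff is indexed are assumptions made for this example; the inputs are lists of log likelihood improvements from replicate simulations.

```python
import math

def power_from_null(null_stats, alt_stats, alpha=0.05):
    """Proportion of alternative statistics larger than the null critical value."""
    ordered = sorted(null_stats)
    # Critical value: smallest null value that at least (1 - alpha) of null values do not exceed
    k = math.ceil((1 - alpha) * len(ordered)) - 1
    critical = ordered[k]
    # Power: fraction of alternative replicates more extreme (larger) than the cutoff
    return sum(s > critical for s in alt_stats) / len(alt_stats)
```

In use, `null_stats` would hold the 500 values simulated with CVg = 0 and `alt_stats` the 500 values for one scenario with CVg > 0.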

Two measures were used to characterize estimated vs. actual fertilities. The first, ĈVm/CVm, compared the estimated to actual male fertility coefficient of variation (this is also the ratio of estimated and actual male fertility standard deviations because the mean estimated and actual male fertility is the same). The fertility coefficient of variation represents the opportunity for selection (CROW 1958; ARNOLD and WADE 1984, p. 710), and ĈVm/CVm provides an indication of whether this opportunity will be over- or underestimated in paternity analyses. The second measure, ρ, is the correlation between estimated and actual fertilities. This correlation is important in analyses of selection attempting to correlate phenotypic trait value with a measure of fitness (LANDE 1976; LANDE and ARNOLD 1983) because ρ determines the maximum possible correlation between trait and fitness (LI 1955, p. 151). The variance of individual fertility estimates provides an important method of assessing accuracy (ROEDER et al. 1989), but is not reported here because of its indirect relation to population fertility variation or selection analysis.
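Both summary measures, the ratio of estimated to actual coefficient of variation and the correlation between estimated and actual fertilities, are simple to compute for one replicate. The sketch below uses hypothetical function names and assumes the actual fertilities are not all identical (otherwise both measures are undefined).

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fertility_summary(estimated, actual):
    """Return (CV ratio, correlation) for estimated vs. actual fertilities."""
    cv_ratio = coefficient_of_variation(estimated) / coefficient_of_variation(actual)
    return cv_ratio, pearson(estimated, actual)

# Hypothetical example: estimates spread twice as widely as the actual values.
cv_ratio, rho = fertility_summary([0, 20, 40], [10, 20, 30])
```

A ratio above one indicates that fertility variation, and hence the opportunity for selection, is overestimated even when the ranking of males is recovered perfectly.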


*  RESULTS

Simulation results in Figure 1 indicate that statistical power to reject the null hypothesis of equal male fertility can be high, provided that male fertility is not too uniformly distributed. Paternity analyses benefit from large progeny sizes, many maternal progeny arrays, many loci (high exclusion probabilities), and few paternal parents. The lower panels of Figure 1 suggest that the total number of progeny assayed is important because similar curves result when comparable total progeny are assayed (e.g., 10 progeny from 25 mothers = 250 total progeny vs. 20 progeny from 12 mothers = 240 total progeny).

Estimation of the male fertility variance may be biased, and there may not be a strong correlation between actual and estimated fertility (Table 1). These difficulties are particularly apparent when the actual variance is limited or when many male fertilities are estimated. Even in scenarios with 12 loci and, hence, extraordinary exclusion probability (expected ε = 0.92), the maximum likelihood method overestimates variance in male fertility by 1.5- to 2-fold. With eight loci and moderate exclusion probability (expected ε = 0.81), the correlation between actual and estimated fertility ranges from 0.25, when there are many males with limited fertility variation, to 0.65, when substantial fertility variation among relatively few males is estimated using many or large maternal families. With the exclusion probability offered by 12 loci, the correlation between actual and estimated fertility can rise to 0.83. When males have equal expected fertility, replicates with 50 females or 40 progeny per female show a slight decrease in performance of the estimators compared with standard parameter values involving fewer females or progeny. A similar pattern is observed when male fertility variation is summarized as a ratio of expected values, rather than as the expected value of ratios, so that the difference is not likely to result from uncertainty in the denominator of ĈVm/CVm (the ratio of estimated to actual fertility coefficient of variation). Instead, this result may reflect an underlying bias in the imperfectly estimated fertilities, reinforced by larger sample sizes.
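The expected exclusion probabilities quoted for eight and 12 loci can be reproduced with a small calculation. The sketch assumes the standard single-locus paternity exclusion probability for a codominant two-allele locus with the mother known, pq(1 - pq), and independent loci; the function names are introduced here for illustration.

```python
def locus_exclusion(p):
    """Exclusion probability at a codominant two-allele locus with allele frequency p."""
    pq = p * (1 - p)
    return pq * (1 - pq)

def combined_exclusion(per_locus):
    """Probability that at least one of several independent loci excludes a random male."""
    not_excluded = 1.0
    for e in per_locus:
        not_excluded *= 1 - e
    return 1 - not_excluded

eight_loci = combined_exclusion([locus_exclusion(0.5)] * 8)    # about 0.81
twelve_loci = combined_exclusion([locus_exclusion(0.5)] * 12)  # about 0.92
```

Each locus with two equally frequent alleles excludes a random nonfather with probability 0.1875, so the gain from each additional locus shrinks as loci accumulate.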


 
 
Table 1. Characterization of male fertility with allozyme markers


*  DISCUSSION

Maximum likelihood methods can detect significant male fertility variation when applied to appropriate data sets (ROEDER et al. 1989). However, low statistical power (Figure 1), biased estimates of fertility variation, and low correlation between actual and estimated fertility (Table 1) occur with few loci, few maternal progeny arrays, few progeny per maternal family, or many potential fathers. The fertility coefficient of variation, and hence opportunity for selection, can be substantially overestimated, even with 12 loci and exclusion probability ε = 0.92. The imperfect correlation between estimated and actual fertility can reduce the correlation between trait value and relative fertility in a selection analysis by 50% or more (Table 1). These results suggest how experimental design can enhance statistical power, and they indicate limits to inference drawn from such experiments.
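The size of this reduction can be illustrated with a simple attenuation argument. The derivation below is a sketch under an assumption not made in the text: that estimated fertility equals actual fertility plus an error uncorrelated with both actual fertility and the trait. Here z is the trait value, w the actual fertility, and ŵ the estimated fertility; these symbols are introduced for illustration.

```latex
\hat{w} = w + e, \qquad \operatorname{Cov}(e, w) = \operatorname{Cov}(e, z) = 0
\;\Longrightarrow\;
\rho = \operatorname{Corr}(w, \hat{w}) = \frac{\sigma_w}{\sigma_{\hat{w}}},
\qquad
\operatorname{Corr}(z, \hat{w})
  = \frac{\operatorname{Cov}(z, w)}{\sigma_z\,\sigma_{\hat{w}}}
  = \rho \,\operatorname{Corr}(z, w)
```

Under this assumption, the correlations of 0.25 to 0.65 reported with eight loci would leave only 25% to 65% of the true trait-fertility correlation detectable, and the value of 0.83 attainable with 12 loci would leave 83%.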

Experimental populations are well suited to inference of male fertility (DEVLIN and ELLSTRAND 1990; DEVLIN et al. 1992; KOHN and BARRETT 1992; CONNER et al. 1996), although some care must be taken in evaluating male fertility in natural populations. In experimental populations, the number of male fertilities requiring estimation can be small, and genotypes represented in the population can be chosen to ensure high exclusion probability. The most ambitious experimental study to date (CONNER et al. 1996) involves 60 hermaphroditic plants, ~35 progeny per maternal parent, and exclusion probability between 0.85 and 0.89. Analysis by CONNER et al. shows that the coefficient of variation of estimated individual male fertilities in this study is small (<5%). The results in Table 1 suggest that even in this data set, male fertility variation will be moderately overestimated, and the ability to detect selection on reproductive traits will be diminished by the imperfect correlation between estimated and actual fertility. Nonetheless, there is reasonable promise for application of paternity estimation techniques in populations of 25 possible paternal parents with substantial fertility and allozyme variation present. Clearly excluded as candidates for fertility estimation in nature are populations with large numbers of males (including species with extensive gene flow), populations with limited or moderate allozyme variation, or species with small progeny array sizes.

Genetic information (exclusion probability ε) plays a prominent but not exclusive role in male fertility estimation. For instance, all parameter sets involving eight loci in Figure 1 have the same exclusion probability, yet statistical power varies from near zero to one, depending on other aspects of experimental design and the actual amount of fertility variation. The results of Table 1 similarly show the importance of factors other than exclusion probability in characterizing fertility variation. Even if exclusion were complete and fertility assigned without error, under the hypothesis of uniform expected male fertility, the error of individual fertility estimates follows a multinomial distribution with sampling variance inversely proportional to the total number of progeny surveyed (ROEDER et al. 1989). Thus, the best strategy for increasing accuracy of fertility estimates may not be maximizing genetic exclusion (e.g., through use of hypervariable markers). Perhaps the most encouraging result is the benefit of increasing the number of progeny sampled for statistical power (either sampling more progeny per mother or more maternal parents, see Figure 1) because assaying additional progeny is the factor most easily manipulated by the investigator interested in natural populations. Admittedly, Table 1 shows that increasing progeny sampled may only modestly increase the precision of estimated male fertility parameters.
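The multinomial sampling variance mentioned above can be made concrete with a worked example. The calculation assumes complete exclusion, so that the fertility of male j is estimated as the fraction of the n progeny he sired, and uses the standard parameter set (25 males with equal expected fertility, 25 females with 20 progeny each, n = 500); the notation is introduced here for illustration.

```latex
\operatorname{Var}(\hat{\lambda}_j) = \frac{\lambda_j (1 - \lambda_j)}{n},
\qquad
\frac{\operatorname{SD}(\hat{\lambda}_j)}{\lambda_j}
  = \sqrt{\frac{1 - \lambda_j}{n\,\lambda_j}}
  = \sqrt{\frac{0.96}{500 \times 0.04}} \approx 0.22
```

Doubling the sample to n = 1000 progeny lowers this relative error only to about 0.15, so even with perfect genetic information a substantial share of observed fertility variation among 25 males reflects sampling alone.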

Modern molecular markers may substantially expand the applicability of paternity analyses, although available data sets only hint at appropriate parameters for further investigation. Simple sequence repeats (SSRs) are one promising genetic marker with abundant polymorphism and codominant expression. Although many SSR loci are found in rice (CHEN et al. 1997) or maize (SMITH et al. 1997), published studies of natural plant populations document SSR variants at relatively few loci. For instance, four polymorphic loci with effective number of alleles (HARTL and CLARK 1989, p. 126) between 1.9 and 5.24 were found in Pithecellobium elegans (Mimosoideae; CHASE et al. 1996), while a single locus with six alleles was identified in the tropical tree Gliricidia sepium (DAWSON et al. 1997). Table 2 shows simulation results when highly polymorphic loci are assayed in 250 progeny (10 offspring from 25 maternal parents) with between 25 and 200 potential male parents and male fertility differences resulting entirely from sampling (i.e., CVg = 0). Variation similar to that reported from natural populations (e.g., four alleles at four loci) continues to provide biased estimates of male fertility variation and low correlation between actual and estimated fertility, even with only 25 potential male parents. A greater number of alleles per locus results in very favorable prospects for paternity analysis, but observation of many alleles per locus may be precluded by genetic drift in the small populations assumed here. Investing in development of additional loci offers very effective paternity analysis, even in moderate-sized populations.


 
 
Table 2. Characterization of male fertility with highly polymorphic markers

Computer simulation and resampling techniques may continue to play an important part in paternity studies. Preliminary analysis, using knowledge of marker variation, population structure, and proposed experimental design, might help to determine whether a full-scale study will be informative (ROEDER et al. 1989) and to identify an appropriate sampling strategy (e.g., polymorphism such as that in Table 2 suggests few progeny per maternal parent compared with that in Table 1). Interpretation of hypothesis tests and inferences from a paternity study also requires investigation of statistical properties of the inference to determine the expected bias in estimates of male fertility variation or the expected correlation between estimated and actual fertility. Computer simulation also offers the opportunity to incorporate idiosyncrasies of the data set under investigation. For instance, using many marker loci increases the likelihood of linkage, parental genotypes may not be in Hardy-Weinberg proportions, and markers may violate Mendelian patterns of segregation.

Finally, the method of estimating paternity used here represents only one form of analysis. ADAMS and co-workers (ADAMS and BIRKES 1991; ADAMS 1992; BURCZYK et al. 1996) use electrophoretic data to estimate the fraction of self-fertilizations, matings between neighboring individuals, and mating between individuals outside the local neighborhood. Matings between neighboring individuals are further estimated as a function of plant or population attributes (e.g., size of putative paternal parent, distance between maternal and putative paternal parent). This procedure has much to recommend it, because it restricts the pool of potential fathers (through estimation of neighborhood size) and directly estimates a small number of biologically interesting parameters (e.g., relationship between plant size and fertility) rather than relying on intermediary estimates of a large number of male fertilities. These methods were developed for seed orchards with relatively few maternal parents and well-defined populations, so their application to natural populations should be approached with caution.


*  ACKNOWLEDGMENTS

This research was supported by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship. DANIEL SCHOEN, PETER SMOUSE, and anonymous reviewers provided many helpful comments on earlier versions.

Manuscript received November 18, 1997; Accepted for publication February 27, 1998.


*  LITERATURE CITED

ADAMS, W. T., 1992  Gene dispersal within forest tree populations. New Forests 6:217-240.

ADAMS, W. T., and D. S. BIRKES, 1991 Estimating mating patterns in forest tree populations, pp. 157–172 in Biochemical Markers in the Population Genetics of Forest Trees, edited by S. FINESCHI, M. E. MALVOLTI, F. CANNATA and H. H. HATTEMER. SPB Academic Publishing, The Hague.

ADAMS, W. T., D. S. BIRKES and V. J. ERICKSON, 1992 Using genetic markers to measure gene flow and pollen dispersal in forest tree seed orchards, pp. 37–61 in Ecology and Evolution of Plant Reproduction, edited by R. WYATT. Chapman & Hall, New York.

ARNOLD, S. J. and M. J. WADE, 1984  On the measurement of natural and sexual selection: theory. Evolution 38:709-719.

BROWN, A. H. D., 1990 Genetic characterization of plant mating systems, pp. 145–162 in Plant Population Genetics, Breeding, and Genetic Resources, edited by A. H. D. BROWN, M. T. CLEGG, A. L. KAHLER and B. S. WEIR. Sinauer Associates, Sunderland, MA.

BROYLES, S. B. and R. WYATT, 1990  Paternity analysis in a natural population of Asclepias exaltata: multiple paternity, functional gender, and the `pollen-donation' hypothesis. Evolution 44:1454-1468.

BURCZYK, J., W. T. ADAMS, and J. Y. SHIMIZU, 1996  Mating patterns and pollen dispersal in a natural knobcone pine (Pinus attenuata Lemmon.) stand. Heredity 77:251-260.

CHAKRABORTY, R., P. E. SMOUSE, and T. R. MEAGHER, 1988  Parentage analysis with genetic markers in natural populations. I. The expected proportion of offspring with unambiguous paternity. Genetics 118:527-536.

CHASE, M., R. KESSELI, and K. BAWA, 1996  Microsatellite markers for population and conservation genetics. Am. J. Bot. 83:51-57.

CHEN, X., S. TEMNYKH, Y. XU, Y. G. CHO, and S. R. MCCOUCH, 1997  Development of a microsatellite framework map providing genome-wide coverage in rice (Oryza sativa L.). Theor. Appl. Genet. 95:553-567.

CONNER, J. K., S. RUSH, S. KERCHER, and P. JENNETTEN, 1996  Measurements of natural selection on floral traits in wild radish (Raphanus raphanistrum). 2. Selection through lifetime male and total fitness. Evolution 50:1137-1146.

CROW, J. F., 1958  Some possibilities for measuring selection intensities in man. Hum. Biol. 30:1-13.

DAWSON, I. K., R. WAUGH, A. J. SIMONS, and W. POWELL, 1997  Simple sequence repeats provide a direct estimate of pollen-mediated gene dispersal in the tropical tree Gliricidia sepium. Mol. Ecol. 6:179-183.

DEVLIN, B. and N. C. ELLSTRAND, 1990  Male and female fertility variation in wild radish, a hermaphrodite. Am. Nat. 136:87-107.

DEVLIN, B., K. ROEDER, and N. C. ELLSTRAND, 1988  Fractional paternity assignment: theoretical development and comparison to other methods. Theor. Appl. Genet. 76:369-380.

DEVLIN, B., J. CLEGG, and N. C. ELLSTRAND, 1992  The effect of flower production on male reproductive success in wild radish populations. Evolution 46:1030-1042.

HARTL, D. L., and A. G. CLARK, 1989 Principles of Population Genetics. Sinauer Associates, Sunderland, MA.

KOHN, J. R. and S. C. H. BARRETT, 1992  Experimental studies on the functional significance of heterostyly. Evolution 46:43-55.

LANDE, R., 1976  Natural selection and random genetic drift in phenotypic evolution. Evolution 30:314-334.

LANDE, R. and S. J. ARNOLD, 1983  The measurement of selection on correlated characters. Evolution 37:1210-1226.

LI, C. C., 1955 Population Genetics. The University of Chicago Press, Chicago.

ROEDER, K., B. DEVLIN, and B. G. LINDSAY, 1989  Application of maximum likelihood methods to population genetic data for the estimation of individual fertilities. Biometrics 45:363-379.

SCHOEN, D. J. and S. C. STEWART, 1986  Variation in male reproductive investment and male reproductive success in white spruce. Evolution 40:1109-1120.

SMITH, J. S. C., E. C. L. CHIN, H. SHU, O. S. SMITH, and S. J. WALL et al., 1997  An evaluation of the utility of SSR loci as molecular markers in maize (Zea mays): comparisons with data from RFLPS and pedigree. Theor. Appl. Genet. 95:163-173.

SMOUSE, P. E. and T. R. MEAGHER, 1994  Genetic analysis of male reproductive contributions in Chamaelirium luteum (L.) Gray (Liliaceae). Genetics 136:313-322.

SNOW, A. A. and P. O. LEWIS, 1993  Reproductive traits and male fertility in plants—empirical approaches. Annu. Rev. Ecol. Syst. 24:331-351.