Abstract
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
VALID and efficient statistical inferences are often difficult to achieve in population genetics problems, because data sets are large and complex, and because even the simplest models typically have many nuisance parameters, which often include the entire genealogical tree underlying the observations. Until recently, the only feasible approach to statistical inference proceeded by comparing summary statistics with their null distribution under a simplified model. This approach is statistically inefficient and inflexible, the results can be difficult to interpret, and quantitative model comparison is usually not possible.
In recent years, advances in methods of stochastic simulation have begun to permit likelihood-based statistical inference in population genetics problems. In particular, the Bayesian paradigm has many advantages in this setting (Shoemaker et al. 1999). In addition to conveying statistical efficiency, Bayesian methods have advantages of interpretation since they provide probability distributions for the unknown(s) of interest, either singly or jointly. Perhaps most importantly, the Bayesian approach resolves, via integration, the theoretical problems caused by the presence of many nuisance parameters. In many scenarios, however, the large number of nuisance parameters means that computational problems continue to limit the practicality of Bayesian methods.
Tavaré et al. (1997) pioneered a rejection-sampling method for simulating an approximate posterior random sample. Any properties of the posterior distribution, such as 95% intervals, can then be approximated by corresponding properties of the sample. Under the method, a candidate value, ϕ′, for the parameter of interest, ϕ, is simulated from its prior distribution. Ideally, the next step would be to accept ϕ′ with probability proportional to its likelihood, otherwise ϕ′ is rejected. However, because the likelihood is difficult to compute, Tavaré et al. (1997) replaced the full data with a summary statistic S and accepted ϕ′ with probability proportional to P(S = s | ϕ′). This is implemented by accepting ϕ′ if and only if u ≤ P(S = s | ϕ′)/max_ϕ P(S = s | ϕ), where u is drawn from the uniform distribution on (0, 1).
Although a useful advance, the approach of Tavaré et al. (1997) is limited to relatively simple settings in which P(S = s | ϕ) can readily be computed and maximized over ϕ. Fu and Li (1997) opened the way to greater generality of this approach by, after simulation of ϕ′, replacing the computation of P(S = s | ϕ′) with a further simulation step: They simulate a data set under their model and accept ϕ′ if the observed summary statistic s matches the simulated value s′. For Fu and Li (1997), the unknown of interest ϕ was the time since the most recent common ancestor of the sample, for which a standard coalescent prior distribution was assumed, and the statistic S was the maximum over haplotypes of the number of nucleotide differences.
This Monte Carlo likelihood approximation was extended by Weiss and von Haeseler (1998) to multiple summary statistics, some with near-continuous distributions, and multiple parameters. Instead of requiring an exact match, they accept ϕ′ whenever ‖s′ − s‖ ≤ δ, for some appropriate metric ‖·‖ and tolerance δ, where s′ and s are vectors of summary statistics calculated at, respectively, simulated and observed data sets. Rather than simulate ϕ′ from a prior distribution, they employed a grid of ϕ values, which is equivalent to assuming a uniform prior.
Pritchard et al. (1999) adopted a rejection-sampling method similar to that of Weiss and von Haeseler (1998) but with simulation from a prior. Their investigation of mutation and demographic parameters, based on a sample of human Y chromosome data, is discussed further below. Similar methods have been adopted by Wall (2000), Tishkoff et al. (2001), and Estoup et al. (2002). These rejection-sampling methods combine the computational convenience of summary statistics with the advantages of the Bayesian paradigm. A key feature of the approach is that it can handle complex models with many nuisance parameters, provided only that simulation of data under the model is feasible. Moreover, the ratio of acceptances under two models approximates the Bayes factor, and hence quantitative model comparison is possible.
A crucial limitation of the rejection-sampling method is that only a small number of summary statistics can usually be handled. Otherwise, either acceptance rates become prohibitively low or the tolerance δ must be increased, which can distort the approximation, because the s′ are treated equally whenever ‖s′ − s‖ ≤ δ, irrespective of the precise value of ‖s′ − s‖. Here, we introduce two improvements to existing rejection-sampling methods, smooth weighting and regression adjustment, described further below. The key benefit is insensitivity of the approximation to δ. This insensitivity permits increasing the number of summary statistics, thus potentially increasing the information extracted from the data. An additional feature of our study is that it is the first to compare the inferences obtained using summary statistics with those obtained by full-data Markov chain Monte Carlo (MCMC) methods. Given the potential that the summary-statistic methods have for substantially widening the access to and scope of Bayesian inference in population genetics, it is important to illustrate the relative efficiency of both methods.
METHODS
Rejection-based approximate Bayesian inference: Initially we assume that there is a single parameter of interest, ϕ; the case of vector-valued ϕ is discussed below. The basic rejection-sampling algorithm is: (1) choose a summary statistic S and calculate its value s for the observed data set; (2) choose a tolerance δ; (3) simulate ϕ′ from the prior distribution for ϕ; (4) simulate a genealogical tree under the chosen model, such as a coalescent model (see, e.g., Nordborg 2001); (5) simulate ancestral allelic types at the root of the tree, and then mutation events along the tree to generate a data set at the leaves; (6) compute s′, the value of S for the simulated data set; (7) if ‖s′ − s‖ ≤ δ, then accept ϕ′, otherwise reject; and (8) repeat steps 3 to 7 until k acceptances have been obtained.
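As a concrete illustration, the rejection algorithm above can be sketched in a few lines of Python. The coalescent simulation of steps 4 and 5 is replaced here by a toy normal model (purely an assumption for illustration), since any simulator of data given ϕ can be dropped in:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Stand-in for steps 4-5: simulate a data set under the model for a
    given theta.  A toy normal model replaces the coalescent simulation."""
    return rng.normal(theta, 1.0, size=n)

def summary(data):
    """Step 6: a one-dimensional summary statistic (the sample mean)."""
    return data.mean()

def abc_rejection(s_obs, prior_draw, delta, k):
    """Steps 3-8: draw theta' from the prior, simulate, and accept theta'
    whenever |s' - s| <= delta, until k acceptances are obtained."""
    accepted = []
    while len(accepted) < k:
        theta = prior_draw()
        s_sim = summary(simulate(theta))
        if abs(s_sim - s_obs) <= delta:
            accepted.append(theta)
    return np.array(accepted)

# Observed data generated with true theta = 2.0; uniform(0, 5) prior.
s_obs = summary(simulate(2.0))
post = abc_rejection(s_obs, lambda: rng.uniform(0.0, 5.0), delta=0.1, k=500)
print(post.mean())  # posterior mean, roughly near 2
```

Decreasing `delta` sharpens the approximation at the cost of a lower acceptance rate, which is exactly the trade-off the weighting and regression-adjustment steps below are designed to soften.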
Regression adjustment and weighting: Our new algorithm mimics the above up to and including step 6, except that whereas Pritchard et al. (1999) used a rectangular acceptance region, we take ‖·‖ to be the Euclidean norm, ‖v‖ = (Σ_j v_j^2)^{1/2}, applied after appropriate scaling of the summary statistics, for example, to equalize variances.
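The scaling step mentioned above — putting the summary statistics on comparable scales before taking Euclidean distances — might be sketched as follows. Standardizing each statistic by the standard deviation of its simulated values is an assumption of this sketch:

```python
import numpy as np

def scaled_distance(S, s_obs):
    """Scale each summary statistic to unit variance, using the spread of
    the simulated values, then return Euclidean distances from s_obs."""
    sd = S.std(axis=0)  # one standard deviation per summary statistic
    return np.linalg.norm((S - s_obs) / sd, axis=1)

# Two summaries on very different scales become comparable after scaling.
S = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 20.0], [2.0, 20.0]])
print(scaled_distance(S, np.array([0.0, 0.0])))
```

Without the scaling, the second statistic (spread ten times larger) would dominate every distance.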
We propose two innovations at step 7: smooth weighting and regression adjustment. In steps 1–6 we have simulated independent pairs (ϕ_{i}, s_{i}), i = 1, 2, · · ·, m, where each ϕ_{i} is an independent draw from the prior distribution for ϕ, and the s_{i} are simulated values of S with ϕ = ϕ_{i}. Under the Bayesian paradigm P(ϕ | S) = P(S | ϕ)P(ϕ)/P(S) = P(S, ϕ)/P(S). The posterior distribution is a conditional density that could be estimated by first estimating the joint density P(S, ϕ) and dividing by an estimate of the marginal density P(S) evaluated at S = s. The (ϕ_{i}, s_{i}) are random draws from the joint density, and the rejection method is just one of many possible methods for estimating the conditional density when S = s. It is based on the idea that the ϕ_{i} for which ‖s_{i} − s‖ is small form an approximate posterior random sample. Our idea is to improve the approximation by (1) weighting the ϕ_{i} according to the value of ‖s_{i} − s‖ and (2) adjusting the ϕ_{i} using local-linear regression to weaken the effect of the discrepancy between s_{i} and s.
It is convenient to start with standard linear regression, but this is only to explain ideas: Our recommended method uses local-linear regression, described subsequently. Here, we assume that the conditional density that we are trying to estimate can be described by the following regression model, for some intercept α and vector of regression coefficients β: ϕ_{i} = α + (s_{i} − s)^{T}β + ε_{i}, (2) where the ε_{i} are uncorrelated errors with mean zero and common variance.
Local-linear regression: The linearity and additivity assumptions underpinning (2) will often be implausible, but may apply locally in the vicinity of s. To implement local-linear regression, we replace the least-squares minimization (3), of Σ_{i} {ϕ_{i} − α − (s_{i} − s)^{T}β}^{2}, with the kernel-weighted minimization of Σ_{i} {ϕ_{i} − α − (s_{i} − s)^{T}β}^{2} K_{δ}(‖s_{i} − s‖), (4) where the kernel K_{δ} assigns weight zero to points with ‖s_{i} − s‖ > δ.
The solution to (4) is (α̂, β̂^{T})^{T} = (X^{T}WX)^{−1}X^{T}Wϕ, (5) where X is the design matrix with ith row (1, (s_{i} − s)^{T}), W is the diagonal matrix of weights K_{δ}(‖s_{i} − s‖), and ϕ = (ϕ_{1}, · · ·, ϕ_{m})^{T}. The adjusted values ϕ*_{i} = ϕ_{i} − (s_{i} − s)^{T}β̂, taken with weights K_{δ}(‖s_{i} − s‖), then form an approximate weighted sample from the posterior distribution.
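The kernel-weighted fit and the regression adjustment of the simulated parameter values can be sketched as below. The Epanechnikov form of the weights and the use of a plain weighted-least-squares solve are assumptions of this minimal illustration:

```python
import numpy as np

def regression_adjust(theta, S, s_obs, delta):
    """Local-linear regression adjustment of ABC samples.

    theta : (m,) simulated parameter values
    S     : (m, d) simulated summary statistics (already scaled)
    s_obs : (d,) observed summary statistics
    delta : tolerance; points with ||s_i - s|| > delta get zero weight

    Returns the adjusted values theta_i - (s_i - s)^T beta_hat together
    with their weights.  The Epanechnikov kernel here is an assumption.
    """
    dist = np.linalg.norm(S - s_obs, axis=1)
    w = np.where(dist <= delta, 1.0 - (dist / delta) ** 2, 0.0)
    keep = w > 0
    # Design matrix: intercept column plus centred summary statistics.
    X = np.column_stack([np.ones(keep.sum()), S[keep] - s_obs])
    W = np.diag(w[keep])
    # Weighted least squares: (alpha_hat, beta_hat) = (X'WX)^{-1} X'W theta.
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ theta[keep])
    beta_hat = coef[1:]
    adjusted = theta[keep] - (S[keep] - s_obs) @ beta_hat
    return adjusted, w[keep]

# Toy demonstration: theta drawn from its prior, one noisy summary statistic.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 1.0, 1000)
S = (theta + rng.normal(0.0, 0.05, 1000)).reshape(-1, 1)
adj, w = regression_adjust(theta, S, np.array([0.5]), delta=0.2)
print(adj.mean())  # close to 0.5, with less spread than the raw accepted theta
```

Because the relationship is fitted only locally (points beyond `delta` receive zero weight), departures from global linearity matter much less than they would for an ordinary regression on all m points.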
The regression approach can be extended to adjust multiple parameters simultaneously, using multivariate regression, in which case β is a matrix and α and ϕ_{i} are vectors. Examples of the approximation of joint posterior densities for pairs of parameters are given below.
Choice of tolerance, δ: For both rejection and regression methods we set δ to be a quantile, P_{δ}, of the empirical distribution function of the simulated ‖s_{i} − s‖. For example, P_{δ} = 0.01 means that the 1% of simulated s_{i} that are closest to s are assigned a nonzero weight. Choice of δ involves a bias-variance trade-off: Increasing δ reduces variance thanks to a larger sample size for fitting the regression, but also increases bias arising from uncorrected departures from additivity and linearity.
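Expressed in code, choosing δ as the P_δ quantile of the simulated distances is a one-liner (this sketch assumes NumPy's `quantile`):

```python
import numpy as np

def delta_from_quantile(distances, p_delta):
    """Tolerance delta such that a fraction p_delta of the simulated points,
    namely those closest to the observed summaries, get nonzero weight."""
    return np.quantile(distances, p_delta)

# Example: distances from 100,000 simulated data sets; P_delta = 0.01.
dist = np.abs(np.random.default_rng(2).normal(0.0, 1.0, 100_000))
delta = delta_from_quantile(dist, 0.01)
print((dist <= delta).mean())  # close to 0.01 by construction
```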
In the limit of increasing δ, all s_{i} are accepted under the rejection method, and so posterior approximations approach prior values. For practical values of δ, the simulation results below suggest that there can be a notable “bias” of the posterior estimate toward the prior. The local-linear regression method approaches simple linear regression in this limit, and so the accuracy of the results for large δ depends on the adequacy of the linearity and additivity assumptions. In the limit as δ tends to zero, the regression and rejection methods are equivalent. Thus, the relative merits of the two methods hinge on their sensitivity to δ in the vicinity of δ = 0, which is explored via simulation studies below.
Posterior density estimation: The posterior density at a candidate value ϕ_{0} for ϕ can be approximated using kernel density estimation applied to the sample of regression-adjusted parameter values, each weighted according to ‖s_{i} − s‖.
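A weighted kernel density estimate of this kind can be sketched as below. The Gaussian kernel and fixed bandwidth are assumptions of this illustration; the analysis in the text uses the Locfit package's local smoothing instead:

```python
import numpy as np

def weighted_kde(phi0, sample, weights, bandwidth):
    """Weighted Gaussian kernel density estimate, evaluated at points phi0.
    `sample` holds the (adjusted) simulated parameter values and `weights`
    their kernel weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    u = (phi0[:, None] - sample[None, :]) / bandwidth
    kern = np.exp(-0.5 * u**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kern @ weights

# Example: density of a standard-normal pseudo-posterior on a grid.
grid = np.linspace(-4.0, 4.0, 81)
sample = np.random.default_rng(3).normal(0.0, 1.0, 5000)
dens = weighted_kde(grid, sample, np.ones_like(sample), bandwidth=0.3)
print(dens.sum() * (grid[1] - grid[0]))  # the estimate integrates to about 1
```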
As noted above, using a local-constant approach with the indicator kernel leads to the usual rejection-method estimate of the posterior density function, i.e., an unweighted kernel density estimate based on the accepted ϕ_{i}.
SIMULATION STUDY
Equation 6 is the standard expression for the fitted value at the intercept in a weighted linear regression and can thus be estimated by any standard method. For the results presented in this article, we used the function lm(), either in the R statistical language (Ihaka and Gentleman 1996; http://cran.r-project.org) or, for multivariate response variables, in the commercial program S-Plus. The density-estimation program Locfit is also implemented in R.
Motivating data set and model: In this section we describe a number of simulation-based tests in which the relative performances of the rejection and regression methods are compared. These tests are centered around the models and data set analyzed by Pritchard et al. (1999). The data set consists of gene frequencies at eight loci on the Y chromosome, surveyed from 445 males taken from a number of different populations around the world, previously published by Perez-Lezaun et al. (1997) and Seielstad et al. (1998). Pritchard et al. (1999) considered a population growth model similar to that of Weiss and von Haeseler (1998) and Beaumont (1999), in which an ancestral population of constant size N_{A} chromosomes begins exponential growth t_{g} generations from the present time, giving a current population size of N_{0} = N_{A}exp(rt_{g}), where r is the population growth rate per generation. Pritchard et al. (1999) extracted three summary statistics from the data: (1) the mean (across loci) of the variance in repeat numbers; (2) the mean effective heterozygosity (i.e., the probability of two randomly drawn chromosomes differing at a particular locus, averaged across loci); and (3) the number of distinct haplotypes in the sample. They simulated data points under a coalescent model and applied the rejection algorithm described above, keeping only points that were within 10% of the observed values of each summary statistic. Data sets were simulated with a number of mutation models, and they analyzed both the combined data set of 445 chromosomes and the data from each population separately. In our comparisons, we consider only the single-step mutation model (Ohta and Kimura 1973), with no limit on the allele sizes, and the combined data set.
The results from the regression and rejection methods are also compared with those obtained by the MCMC method of Wilson and Balding (1998), which has been expanded to include a model of population growth (Wilson et al. 2003; the program BATWING is available at http://www.maths.abdn.ac.uk/~ijw). The growth model is the same as that described above, but with a number of different parameterizations, which are discussed below. Thus we can compare the results from approximate posterior distributions based on summary statistics with those from posterior distributions based on the full data.
Stable population model: The parameter of interest is θ = 2Nμ, where N is the number of chromosomes in the population and μ is the mutation rate. Setting θ = 10, we simulated 100 data sets of 445 chromosomes typed at eight completely linked loci. These data sets were then analyzed with both the regression and the rejection methods, using P_{δ} = 0.00125, 0.0025, 0.005, 0.01, 0.02, 0.04, 0.08, 0.16 and simulation sizes k = 2000, 10,000, and 50,000. For the full-data estimation using MCMC, we ran BATWING for 2 × 10^{6} parameter updates (20 tree updates per parameter update), after a burn-in of 10^{5} parameter updates, thinned every 200, to yield 10,000 points. For the regression and rejection methods we assumed a rectangular prior on θ of (0, 50), and a flat improper prior for the MCMC method. The two priors are comparable because the width of the rectangular prior is large relative to the spread of the posterior density. In particular, although it is likely that the posterior distribution for θ under the improper prior is itself improper (i.e., has infinite area), to the degree of approximation inherent in MCMC, after burn-in the simulated samples are so far from 50 that in practice there would be no difference in behavior whether a rectangular prior was used or not. The results are illustrated in Figure 1, which shows the relative mean square error (RMSE) of the estimates of θ for k = 50,000 plotted against the tolerance, P_{δ}, together with approximate standard errors estimated via a nonparametric bootstrap. The RMSE is calculated as
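The display equation defining the RMSE is not reproduced here; the following sketch assumes it is the mean of the squared relative errors of the per-data-set point estimates about the true parameter value, which matches the name "relative mean square error":

```python
import numpy as np

def relative_mse(estimates, true_value):
    """Relative mean square error of a set of point estimates (e.g. one
    posterior mean per simulated data set) about the known true value."""
    estimates = np.asarray(estimates)
    return np.mean(((estimates - true_value) / true_value) ** 2)

# Example: 100 posterior-mean estimates scattered around the true theta = 10.
est = np.random.default_rng(4).normal(10.0, 1.0, 100)
print(relative_mse(est, 10.0))  # on the order of (1/10)^2 = 0.01
```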
We have also estimated the RMSE for the heterozygosity-based and variance-based estimators of θ (King et al. 2000, Equations 2, 3, 4, 5). For the variance-based estimator the RMSE is 0.46 ± 0.11 and for the heterozygosity-based estimator it is 0.092 ± 0.016. Both estimators have means close to 10, and the cause of the large RMSE is the substantially higher variance compared to the other methods.
Growing population model: Although the model includes four parameters that might be considered estimable—i.e., r, t_{g}, N_{A}, and μ in the model of Pritchard et al. (1999)—only three parameters are identifiable in the likelihood function. Although more parameters can be estimated through the use of informative priors, as discussed in Tavaré et al. (1997) and performed in Pritchard et al. (1999), for the RMSE analysis we restricted ourselves to a three-parameter model. This leads to a choice of parameterizations (e.g., Weiss and von Haeseler 1998; Beaumont 1999) and, for compatibility with the BATWING program, we used
Additional summary statistics: We investigated the mean across loci of the kurtosis in the allele length distribution and the variance of the variances in length among loci (Di Rienzo et al. 1998; Reich et al. 1999), the mean maximum allele frequency, a multivariate equivalent of kurtosis (i.e., based on the fourth power of the Euclidean distance from the centroid of the lengths measured at each locus for each chromosome), and the measure of linkage disequilibrium, Δ^{2} (see, e.g., Hudson 2001), averaged over all pairs of loci. The rationale for the latter stems from the observation by Slatkin (1994) that the degree of linkage disequilibrium is reduced in growing populations, even for completely linked loci. None of these statistics individually led to a marked improvement in RMSE. The latter two statistics appeared to lead to a small, and possibly statistically significant, reduction in the RMSE for ω, and the results from using five summary statistics (the original three plus the multivariate kurtosis and Δ^{2}) are illustrated as shaded lines in Figure 2. The effect of the extra summary statistics is to produce either no change or a small improvement in the RMSE for the regression method for all tolerances, but a substantial worsening for the rejection method (for all tolerances). With five summary statistics, as with three, the RMSE of the regression method begins to rise with small P_{δ}, as shown in Figure 2, but the lines do not cross. By contrast, we found that with k = 2000 the RMSE for the regression method began to be notably larger than that for the rejection method at P_{δ} = 0.02. This is due to the increased variability in the regression estimates with increasing number of summary statistics (the “curse of dimensionality”).
For fixed simulation size k, increasing the number of summary statistics requires an increase in the tolerance δ. For the rejection method, the results of Figure 2 indicate that the bias thus introduced outweighs the benefits of the additional information. This is not so for our regression method, because of its insensitivity to δ, although the improvement is small. Inevitably, there will be a tendency for diminishing returns from each extra summary statistic. Overall, however, there seems to be room for additional improvement in the fit using regression-based methods, such that accuracy close to that of full-data methods may be obtained, but using orders of magnitude fewer computations. The MCMC simulations for the growing population model took 100 processor days on 700-MHz Pentium 3 processors, whereas the summary-statistic analysis took 4 hr for three parameters and 27 hr for five parameters. The latter increase in time does not reflect the consequences of scaling up with more summary statistics, but rather that the computation of Δ^{2} is very time consuming, whereas that for the other summary statistics is generally small compared to the time for generating samples. It should also be noted that in all the simulations described in this article the time spent performing the regression calculations is a negligible proportion of the total time to carry out the coalescent simulations—a few seconds in comparison to several hours. It is conceivable that with a very large number of summary statistics and a large number of points accepted within the tolerance limits, the time spent performing the regression calculations predominates, in which case the rejection method may have an advantage in that more points can be analyzed within a given time.
The performance of RMSE as a measure of accuracy depends on the priors chosen. If the prior mean for a parameter is the same as the value with which the simulations were carried out, then the rejection method will outperform both the MCMC and the regression methods as the tolerance becomes larger. This is because the data will cause the true posterior distribution to fluctuate away from the prior, whereas, with a large tolerance, the rejection method effectively samples from the prior, so that its estimates have the same mean as the prior (here, the true value) and negligible variance. Similarly, because the posterior distribution estimated by the rejection method tends toward the prior when the tolerance is large, if a prior is chosen such that the mean of the posterior distribution tends to be on one side of the true value (for example, when the likelihood is very skewed) and the prior mean is on the other side of the true value, the RMSE for the rejection method can be seen first to decrease and then increase with increasing P_{δ}. A comparison of the mean posterior variances might avoid this problem. However, the RMSE is a useful summary because it combines both the effect of variability in the estimates and bias. Therefore we use RMSE to summarize the relative performance of the different methods and have chosen priors that avoid this problem.
Comparison of posterior densities: We estimated marginal posterior densities for θ, ω, and κ for one of the 100 data sets simulated under the growing-population model. The regression and rejection methods were applied with two different tolerances (P_{δ} = 0.00125 and P_{δ} = 0.16) and k = 500,000, while for the MCMC method densities were estimated using 10,000 BATWING outputs. The results are presented in Figure 3. It can be seen that there is a general tendency for the posterior density estimated by the rejection method to be closer to the prior than that estimated by the other methods, particularly with P_{δ} = 0.16. The densities for the two tolerances are generally different with the rejection method and very similar with the regression method. The posterior density from the MCMC method is generally sharper than those from the other methods and is more likely to be centered around the true value.
The true posterior distribution based on summary statistics need not be very similar to that of the full data, estimated by MCMC—i.e., it could potentially have a different location as well as a larger variance. In comparing the regression- and rejection-based posterior densities there is no independent “true” posterior density based on summary statistics with which to compare the results in Figure 3. However, for P_{δ} = 0.00125 the posterior densities obtained by both the rejection and regression methods are very similar. If it is assumed that the regression method accelerates the rate of convergence to the “true” posterior distribution on the basis of the summary statistics, then similarity of the regression- and rejection-based distributions may indicate convergence, although there is always the possibility that both will continue to change slowly with decreasing P_{δ}.
To estimate joint posterior densities for pairs of parameters, bivariate local-linear regression adjustments were carried out using lm() in S-Plus, which can handle multivariate response variables. Figure 4 shows the 10, 50, and 90% highest posterior density (HPD) contours for pairs of parameters considered jointly. It can be seen that both summary-statistic-based methods tend to have wider joint posterior densities than the MCMC-based method, particularly for (θ, ω). However, the densities estimated by the regression method are much closer to those estimated by MCMC than are those estimated by the rejection method.
ANALYSIS OF HUMAN Y CHROMOSOME DATA
Pritchard et al. (1999) estimated posterior densities for the four “natural” parameters of the growing population model: μ, r, t_{g}, and N_{A}. The BATWING program does not use this parameterization, and we use an alternative, similar parameterization, described below, for comparing posterior distributions from the rejection and regression methods with the full-data likelihoods. However, despite not having the full-data likelihood as a benchmark, it is still useful to compare the performances of the regression and rejection methods with these four parameters, and the results are shown in Table 1. We present results for tolerances P_{δ} = 0.02 and P_{δ} = 0.16. The results for the rejection and regression methods are based on 100,000 simulations. We summarize the posterior distributions by the mean and 95% equal-tail probability intervals, as in Pritchard et al. (1999). In addition, we replicated the analysis of Pritchard et al. (1999), using their definition of tolerance (10% of the observed value) with 10^{6} simulations, in which ∼1600 points were accepted.
The main pattern evident in Table 1 is that all the methods give broadly similar answers irrespective of the tolerance. This appears to be because there is very little information in the data on much of the parameter space. For example, the priors and posteriors for μ are almost identical, and therefore it is not surprising that even with P_{δ} = 0.16 in the rejection method, where the acceptance rate is 100-fold greater than in the study by Pritchard et al. (1999), there is very little difference in the summaries of the posterior distributions. We are able to replicate the results of Pritchard et al. (1999) for their definition of tolerance, and generally there is similarity between their results and those from the rejection method using our definition of tolerance, at least for the narrower tolerances. As with the results above, there is a tendency for the estimates from the rejection method to move closer to the prior with increasing P_{δ}, whereas this effect is not so strong with the regression method. The results from the regression method with P_{δ} = 0.02 tend to differ more from those obtained using the method of Pritchard et al. (1999) than do the results from the rejection method, and it is tempting to conclude that the regression-based results are closer to the true p(θ | s), despite the 12.5-fold greater intensity of sampling with the method of Pritchard et al. (1999).
To compare the results from a four-parameter model, similar to that used by Pritchard et al. (1999), using full-data posterior distributions estimated with BATWING as a benchmark, we have used the following parameterization (and priors): μ (gamma: shape 10, scale 0.00008), N_{A} (lognormal: mean log 8.5, SD log 2), r (exponential: 0.005), and β = t_{g}/N_{A} (gamma: shape 2, scale 1). In 5 individuals we detected changes in microsatellite lengths that were fractions of the repeat length, and we therefore used only 440 individuals in the MCMC estimation. The new summary statistics calculated from this group were variance in allele length, 1.123; heterozygosity, 0.635; and number of distinct haplotypes, 312. For the MCMC simulation we ran 10^{7} parameter updates (40 tree updates per parameter update), after a burn-in of 5 × 10^{5} parameter updates, thinned every 1000, to yield 10,000 points. The results of this analysis are shown in Figure 5 and reflect those in Table 1. The full-data posterior distributions are generally very broad and similar to the priors, with the exception of values of N_{A}, r, and β close to 0, which are clearly rejected by the data. Not surprisingly, therefore, the rejection method generally appears to perform well. For N_{A} and r, where the prior is more different from the posterior, the usual pattern is observed whereby the distributions estimated by the rejection method move toward the prior for large P_{δ}, and this effect is weaker in the regression method. For μ and β, where the posteriors and priors are very similar, no such effect is observed, and in fact, for β the regression-based density appears to change more with P_{δ} than the rejection-based density. It should be borne in mind that although a minimum of 1000 independent points are used in the density estimates, there will still be some sampling error in the estimates (and also, of course, sampling error associated with the regression itself).
A further point to note is that for β, where the posterior mass lies close to 0, the regression method gives some negative values, which have been truncated in Figure 5. This could be avoided by the use of transformations or a generalized linear model in the regression.
As with the results in Figure 3, the rejection-based densities tend to be similar to the regression-based densities for P_{δ} = 0.02, suggesting a degree of convergence. The density for r is the most different. However, tests with P_{δ} = 0.002 indicate that the rejection-based density does indeed converge to that estimated by the regression method. The results suggest that the posterior distributions for r and β, given the summary statistics, are notably different from the full-data posterior distributions.
CONCLUSIONS
There are two principal advantages of our approach over the rejection method: Simulated ϕ′ are assigned a weight that decreases with ‖s′ − s‖, and local-linear regression corrects for the difference between E[ϕ | S = s_{i}] and E[ϕ | S = s]. We have illustrated with examples, where we have the full-data posterior distributions, that these innovations lead to substantially improved accuracy over earlier methods. We also illustrate the relative accuracy of MCMC-based and summary-statistic-based methods for inferring past population growth. The MCMC-based method is consistently superior to the summary-statistic-based methods, which highlights that it is well worth making the effort to obtain full-data inferences where possible. However, there are undoubtedly advantages to the use of summary statistics, both in ease of implementation and in the time taken to obtain results, and the approach appears to be a viable initial strategy for applying Bayesian methods to some population genetics problems. Because of the curse of dimensionality there are limits to the number of summary statistics that can be handled with a reasonable number of simulations (otherwise the problem approaches that of MCMC in computational cost). It remains to be seen which population genetics problems can be summarized by a small enough number of summary statistics for this approach to be competitive with MCMC. Further research is needed to find more rigorous ways of choosing summary statistics, including the use of orthogonalization and “projection-pursuit” methods. A problem that needs to be considered is the potential for the regression method to adjust the simulated points so that they fall outside the support of the prior; for example, if the posterior density is large near 0, negative values can be generated. This can be addressed by the use of transformations.
In addition, improved regression methods, or other methods of conditional-density estimation, may overcome this problem and allow wider tolerances to be used, thereby decreasing the number of simulations that are needed and increasing the number of summary statistics that can be accommodated. Overall, however, the general approach presented here should allow for a greatly expanded use of approximate Bayesian methods in population genetic analysis.
Acknowledgments
We are grateful to Claire Calmet and two anonymous referees for their helpful comments on the manuscript. This work was supported by BBSRC/EPSRC grant E13397 awarded to M.A.B. and D.J.B.
Footnotes

Communicating editor: W. Stephan
 Received March 22, 2002.
 Accepted October 2, 2002.
 Copyright © 2002 by the Genetics Society of America