## Abstract

The distribution of fitness effects (DFE) has considerable importance in population genetics. To date, estimates of the DFE come from studies using a small number of individuals. Thus, estimates of the proportion of moderately to strongly deleterious new mutations may be unreliable because such variants are unlikely to be segregating in the data. Additionally, the true functional form of the DFE is unknown, and estimates of the DFE differ significantly between studies. Here we present a flexible and computationally tractable method, called Fit∂a∂i, to estimate the DFE of new mutations using the site frequency spectrum from a large number of individuals. We apply our approach to the frequency spectrum of 1300 Europeans from the Exome Sequencing Project ESP6400 data set, 1298 Danes from the LuCamp data set, and 432 Europeans from the 1000 Genomes Project to estimate the DFE of deleterious nonsynonymous mutations. We infer significantly fewer (0.38–0.84 fold) strongly deleterious mutations with selection coefficient |*s*| > 0.01 and more (1.24–1.43 fold) weakly deleterious mutations with selection coefficient |*s*| < 0.001 compared to previous estimates. Furthermore, a DFE that is a mixture distribution of a point mass at neutrality plus a gamma distribution fits better than a gamma distribution in two of the three data sets. Our results suggest that nearly neutral forces play a larger role in human evolution than previously thought.

A fundamental concept in population genetics is the distribution of fitness effects (DFE) of new mutations. The DFE refers to the proportion of new mutations that have particular effects on fitness. The DFE is a crucial quantity in evolutionary genetics because it determines how selection affects genetic variation (Eyre-Walker and Keightley 2007), the conditions under which recombination could evolve (Keightley and Otto 2006), and the spectrum of mutations potentially involved in genetic diseases (Eyre-Walker 2010). Further, an accurate DFE is required for robust inference of the amount of adaptive evolution across taxa (Boyko *et al.* 2008; Gossmann *et al.* 2012; Castellano *et al.* 2016; Galtier 2016); a topic of widespread interest. Because this distribution is so important, considerable effort has been put forth toward estimating it in several species.

In organisms that allow experimental manipulation, the DFE can be directly estimated. Here, the DFE is either derived from direct measurements of fitness from a collection of single-step mutants, or indirectly inferred from observed changes in population fitness in mutation accumulation (MA) experiments (Eyre-Walker and Keightley 2007; Bataillon and Bailey 2014). The first approach, in combination with high-throughput methods, has been successfully applied to examine the full spectrum of (even weak) selection coefficients of mutations in small mutational target regions in a number of viral, bacterial, and yeast systems (Fowler *et al.* 2010; Hietpas *et al.* 2011; Boucher *et al.* 2014). They frequently report a gamma or unimodal, similarly shaped distribution of fitness effects (Bataillon and Bailey 2014), or a bimodal distribution with a second cluster of highly deleterious mutations (Acevedo *et al.* 2014; Bank *et al.* 2014; Boucher *et al.* 2014). The second approach infers the DFE from fitness trajectories of a collection of populations over time in MA experiments, without directly identifying the mutations involved. Assuming that the true DFE is gamma distributed, they estimate the parameters of a gamma DFE that best fit to the observed changes in the mean and variance of fitness among replicate populations (Halligan and Keightley 2009). These studies point to a shape of the DFE that is less leptokurtic than an exponential distribution, with the mode different from zero. This could indicate that the true underlying DFE is more complex than the gamma distribution (Halligan and Keightley 2009), or reflect a bias of MA-based methods toward mutations with large fitness effects (Eyre-Walker and Keightley 2007). In summary, experimental estimates of the DFE suggest that a substantial proportion of new mutations are strongly deleterious. However, due to the inherent limitations of these methods, inference of the exact functional form of the genome-wide DFE is challenging.

A second category of methods to infer the DFE involves examining patterns of neutral and putatively deleterious genetic variation in natural populations, and finding the model of demographic history and purifying selection that can match the observed level of variation. This framework has been applied to many species including humans (Eyre-Walker *et al.* 2006; Keightley and Eyre-Walker 2007; Boyko *et al.* 2008; Li *et al.* 2010), *Drosophila* (Keightley and Eyre-Walker 2007; Kousathanas and Keightley 2013), yeast (Koufopanou *et al.* 2015), orangutans (Ma *et al.* 2013), gorillas (McManus *et al.* 2015), and mice (Halligan *et al.* 2013). Many of these studies suggest that the DFE has a strongly leptokurtic distribution, conflicting with the MA-based estimates. Consistent with the bimodal DFE found by many site-directed mutagenesis studies (Bataillon and Bailey 2014; Boucher *et al.* 2014), they find a large proportion of nearly neutral mutations, as well as many strongly deleterious mutations. For example, previous studies in humans (Eyre-Walker *et al.* 2006; Boyko *et al.* 2008) have estimated the parameters of a gamma distribution for the DFE of new nonsynonymous mutations. These studies have found ∼57–61% of new nonsynonymous mutations to be moderately to strongly deleterious (|*s*| ≥ 10^{−3}), about 15–16% to be weakly deleterious (10^{−4} ≤ |*s*| < 10^{−3}), and the remainder (24–28%) to be nearly neutral (Figure 1).

The estimates of the DFE from genetic variation data for humans by Eyre-Walker *et al.* (2006) and Boyko *et al.* (2008) have been widely used in human population genetic studies. For example, these DFEs were used to estimate differences in the genetic load across human populations (Henn *et al.* 2016), to model the ancient introgression of Neanderthal alleles into humans (Harris and Nielsen 2016), as a model for the frequency spectrum of deleterious polymorphisms in simulating data for disease studies (Uricchio *et al.* 2016), to evaluate the contribution of background selection to diversity on the *Y* chromosome in humans (Wilson Sayres *et al.* 2014), and to estimate the strength of selection acting on disease genes (Moon and Akey 2016). While the Boyko *et al.* (2008) study has had considerable impact in the field, it is important to appreciate that the estimates of the DFE were made using a sample of a small number of individuals. As such, most of the variation segregating in those samples is likely to be neutral or nearly neutral. Inferences about the proportion of moderately and strongly deleterious mutations largely come from assuming the gamma distribution approximates the DFE of new mutations well, and then tabulating the proportion in those categories predicted by the gamma distribution. In other words, the second mode of strongly deleterious and lethal mutations observed by experimental studies is unlikely to be directly observed in polymorphism data sets, and these proportions are extrapolated from the long tail of the DFE.

This extrapolation of the proportion of strongly deleterious mutations may not be accurate. A more recent study using exome sequencing data from 200 Danish individuals (Li *et al.* 2010) estimated a DFE that differs considerably from that inferred in Boyko *et al.* (2008) or from the experimental estimates in lower organisms. Specifically, Li *et al.* (2010) found a mixture distribution consisting of a neutral point mass and gamma distribution fit best to their data (Figure 1). Additionally, they estimated that only 1% of new mutations have |*s*| > 10^{−4} (compared to 57% in Boyko *et al.* 2008), and 78% of new mutations fall in the 10^{−4} ≤ |*s*| < 10^{−3} range (compared to 15% in Boyko *et al.* 2008). Li *et al.* (2010) attributed this difference in the DFEs to their study considering a larger sample of individuals. As such, they surmised that they sampled more weakly deleterious variants, allowing more accurate inferences. However, this explanation has not been tested using simulations or larger data sets. Thus, the proportions of moderately *vs.* strongly deleterious nonsynonymous mutations in humans, as well as the functional form of the DFE, remain elusive.

Due to large-scale genome and exome sequencing projects, samples of hundreds to thousands of individuals are available (Tennessen *et al.* 2012; Fu *et al.* 2013; Lohmueller *et al.* 2013; The 1000 Genomes Project Consortium 2015). These large data sets should yield more reliable inferences of the DFE because moderately deleterious polymorphisms should be segregating, albeit at low frequency, in these samples (Supplemental Material, Figure S1 in File S1). As such, it should be possible to determine the functional form of the DFE and directly estimate the proportion of moderately and strongly deleterious mutations.

However, a major roadblock to using these new data sets for inference of the DFE is a lack of suitable software for inference from large samples. Generally, methods to infer the DFE summarize the allele frequency information of two classes of sites, one assumed to be neutral and the other selected by the site frequency spectrum (SFS). Then, they find the DFE that, under the inferred model of demography fit to the SFS from neutral sites, fits the observed SFS from selected sites. The method of Keightley and Eyre-Walker (2007), DFE-alpha, models demography using a Wright–Fisher transition matrix. It can only consider demographic models with one or two size changes due to computational complexity, and it can be slow for the two-size-change model. This is particularly limiting in large samples of human genetic variation since a single-size-change demographic model is insufficient for capturing the excess of rare variation in human populations (Keightley and Eyre-Walker 2007; Kousathanas and Keightley 2013). Another class of methods to infer the DFE uses the Poisson random field (PRF) approach (Sawyer and Hartl 1992; Hartl *et al.* 1994; Williamson *et al.* 2005; Eyre-Walker *et al.* 2006; Boyko *et al.* 2008; Li *et al.* 2010). This approach has been implemented in the program PRFREQ (Boyko *et al.* 2008), but that implementation becomes numerically unstable when applied to samples larger than a few hundred individuals. The program ∂a∂i (Gutenkunst *et al.* 2009) uses a similar framework, but implementing a DFE is slow due to the way that the DFE is repeatedly integrated (Figure S2 in File S1). Thus, there is a need for a new software tool to infer the DFE from large samples.

In this study, we first extend the program ∂a∂i to analyze arbitrary DFEs in a computationally efficient manner. We implement these features in a module for ∂a∂i, which we call Fit∂a∂i. We then use this approach to estimate the DFE of deleterious, nonsynonymous mutations from multiple large human data sets. We consider several different functional forms for the DFE. We find that across the multiple data sets, a mixture distribution where a proportion of mutations are neutral and the remainder are gamma distributed fits best. Analysis of multiple data sets suggests there are fewer strongly deleterious mutations where |*s*| > 10^{−2} (0.38–0.84 fold) than previously estimated in Boyko *et al.* (2008) (35%), regardless of the functional form of the DFE. Further, our results are not consistent with a model where 99% of new mutations have a selection coefficient weaker than 10^{−3}, as suggested by Li *et al.* (2010). Because we anticipate that our estimates of the DFE will be useful in subsequent simulation studies, we provide SFS_CODE (Hernandez 2008) and SLiM (Messer 2013) commands to simulate data where mutations have fitness effects from these DFEs.

## Materials and Methods

### Fit∂a∂i: software to infer the DFE

Here we present our new software, Fit∂a∂i, to infer distributions of selection coefficients of new mutations under the PRF model using the SFS. Fit∂a∂i is a module that extends the functionality of the Python package, ∂a∂i (Gutenkunst *et al.* 2009). Specifically, ∂a∂i uses diffusion theory to compute the expected SFS for a set of demographic parameters and selection coefficients. Fit∂a∂i offers a substantial computational improvement over the existing implementation of ∂a∂i for models involving more than a single selection coefficient. To do this, Fit∂a∂i computes SFSs for a range of selection coefficients and saves each SFS into an array. Then, subsequent integrations of the DFE are done using the array of precomputed SFSs. This process results in a substantial improvement in computational speed compared to the existing implementation of ∂a∂i, which recomputes the SFS for each selection coefficient in each step of the optimization process (Figure S2 in File S1). Fit∂a∂i also allows parallel computation of the SFS by using multiple cores or a cluster. Importantly, Fit∂a∂i leverages the modular nature of ∂a∂i to allow the user to define any arbitrary demographic model and DFE, including DFEs that are complex mixture distributions. Lastly, we incorporated an optimization routine that allows for constrained optimization of complex mixture distributions. Below we describe our inference procedure in greater detail, starting with inference of demography, followed by the details of Fit∂a∂i. We then discuss a simulation study to assess its performance, both under the assumptions of the PRF model as well as when some are violated, and then the data sets that we use to infer the DFE in humans.

### Inference using the SFS

We inferred demography and selection from segregating sites in a maximum likelihood framework (Williamson *et al.* 2005, Boyko *et al.* 2008, Gutenkunst *et al.* 2009). Because both demography and selection affect patterns of deleterious mutations, our inference of the DFE begins (as done in Williamson *et al.* 2005 and Boyko *et al.* 2008) by first estimating a demographic model for putatively neutral, synonymous sites. Then, conditional on the demographic parameter estimates, we infer the parameters for the DFE of nonsynonymous mutations.

To do this, we summarized synonymous and nonsynonymous sites with the SFS. The SFS can be described as a vector, *X***=** **[***X*_{1}, *X*_{2}, …, *X _{n}*

_{−1}

**]**, in which each entry

*X*describes the number of SNPs at frequency

_{i}*i*in a data set of size

*n*chromosomes. In the PRF framework, each entry in the SFS is assumed to be comprised of independent sites (Sawyer and Hartl 1992; Hartl

*et al.*1994).

Additionally, we folded the SFS to avoid difficulties with misidentification of the ancestral state (Hernandez *et al.* 2007). This form of the SFS counts the number of SNPs of minor allele frequencies (MAFs) 1 to *n*/2 without taking the ancestral state into account. The folded SFS has been shown to perform well at inferring the DFE of deleterious mutations (Keightley and Eyre-Walker 2007; Boyko *et al.* 2008; Tataru *et al.* 2016).

### Inference of demography

A demographic model, the parameters of which are denoted as Θ_{D}, was fit to the SFS of synonymous sites with ∂a∂i (Gutenkunst *et al.* 2009). Here, ∂a∂i uses a diffusion approximation to compute the distribution of allele frequencies given some demographic model, *f*(*x*; Θ_{D}). Then, the expected number of SNPs at frequency *i* in a sample of size *n* chromosomes can be written as:(1)The multinomial likelihood, computed with the folded SFS, is maximized to estimate the demographic parameters:(2)where denotes the observed count of SNPs at frequency *i* in the folded SFS. The multinomial likelihood uses the proportions of SNPs at a particular frequency in the sample rather than the counts from the model. Therefore, it does not require an *a priori* assumption about the mutation rate or ancestral population size. The mutation rate of synonymous sites, denoted *θ*_{S}, was then computed as the scaling factor difference between the optimized SFS and the data.

When fitting models incorporating periods of rapid exponential growth with ∂a∂i, we set the program parameter *dadi.Inference.set_timescale_factor=*10^{−}^{6}, in contrast to the default setting of 10^{−}^{3}. In ∂a∂i, periods of exponential growth are approximated with a series of instantaneous size changes and, if the time steps are not small enough, parameters related to exponential growth will not be inferred correctly. This causes the demographic model to inaccurately predict the expected numbers of rare variants, biasing downstream inference of selection.

### Inference of selection

To infer the DFE, we developed the Fit∂a∂i module, which uses ∂a∂i and some of the methodological improvements of Ragsdale *et al.* (2016). First, we condition on the demographic model that was fit to synonymous sites using the procedure described above. Given that demography, ∂a∂i is used to compute a distribution of allele frequencies *f*(*x*; Θ_{D}, *γ*), where Θ_{D} is a vector containing the demographic parameters inferred from synonymous sites and *γ* is a single population-scaled selection coefficient. Specifically, *γ* = 2*N*_{A}*s*, where *N*_{A} is the ancestral population size, but it is rescaled in each time period of the demographic model by the fold size change relative to the ancestral population. A DFE, denoted *g*(*γ*), can be incorporated by generating *f*(*x*; Θ_{D}, *γ*) for a range of *γ*, then weighting the contribution of each of these frequency spectra by *g*(*γ*) (Boyko *et al.* 2008):(3)In the standard implementation of ∂a∂i, this process is time consuming because the SFS must be computed repeatedly during each step of optimization. In other words, *f*(*x*; Θ_{D}, *γ*) is computed each time a given value of *γ* is evaluated in a DFE. This process can be especially slow for large ranges of *γ* and for large sample sizes. Therefore, similar to Ragsdale *et al.* (2016), we initially computed the SFS for a range of selection coefficients, and then cached these results to avoid recomputing the frequency spectra (Figure S2 in File S1). In addition, we computed many frequency spectra in parallel to save time, added compatibility for user-defined DFEs, modified the optimization routines available in ∂a∂i to work with cached spectra, and added the option to use constrained optimization for the inference of complex mixture distributions. These extensions are part of the Fit∂a∂i module.

To infer selection, we fixed the demographic parameters to the maximum-likelihood estimates (MLEs) inferred from synonymous sites, Then, we fit a DFE, the parameters of which are denoted as Θ_{DFE}, to the folded SFS of nonsynonymous sites by maximizing the Poisson likelihood:(4)Unlike the multinomial likelihood, the Poisson likelihood requires an *a priori* assumption about the mutation rate for nonsynonymous sites, *θ*_{NS}. To obtain this, we multiplied our estimate of *θ*_{S} by an assumed ratio of the nonsynonymous to synonymous mutation rate, *θ*_{NS}/*θ*_{S,} to obtain *θ*_{NS}. Specifically, we assumed the ratio to be *θ*_{NS}/*θ*_{S} = 2.31 (Huber *et al.* 2016), but also estimated the DFE using *θ*_{NS}/*θ*_{S} = 2.5 to provide a fair comparison to Boyko *et al.* (2008).

Each DFE is defined as an integrable function over a log-spaced range of 600 selection coefficients over intervals between |*s*| = [10^{−8}, 0.5]. We considered any portion of the DFE smaller than |*s*| = 10^{−8} to be effectively neutral (|*s*| = 0), and any variants of |*s*| > 0.5 to have negligible probability of being found in polymorphism data (*i.e.*, not found in the data). Note that here we only consider the deleterious DFE but this function can easily be extended to incorporate positive selection (Huber *et al.* 2016).

Fit∂a∂i includes many of the standard DFEs (Boyko *et al.* 2008; Kousathanas and Keightley 2013), such as a gamma distribution and several mixture distributions. Specifically, we investigated mixture distributions where a proportion of mutations are neutral; with the rest following a gamma distribution as well as a mixture distribution where a fraction is neutral, a fraction is lethal, and the remainder follows an exponential distribution of fitness effects. Lastly, Fit∂a∂i includes arbitrary mixture distributions with a fixed number of fitness classes, or bins, where each bin can have its own range of selection coefficients (called the “discrete DFE”). Fit∂a∂i infers the proportion of new mutations in each fitness class. For mixture distributions incorporating a point mass at neutrality or lethality, we define the DFE so it can be treated as a single integrable function. We add the area of the point mass to a part of the distribution that is assumed to be neutral or lethal. For example, to add a point mass of neutral mutations to the “neutral+gamma DFE,” we add the probability mass of neutral mutations, *p*_{neu}, to the |*s*| = [0, 10^{−8}] portion of the distribution. Then, we integrate the gamma DFE between |*s*| = [0, 10^{−8}] and sum it with *p*_{neu} to obtain the total mass of neutral mutations. Additionally, we used the SLSQP algorithm (Kraft 1988) as implemented in SciPy 0.17.1 to perform constrained optimization for mixture distributions incorporating more than two components. Throughout, we will use α and β to denote the shape and scale parameters of the gamma distribution, respectively, and λ to denote the rate parameter of the exponential distribution. The DFEs we report will be scaled by the ancestral population size. To estimate confidence intervals (C.I.’s) for our data, we Poisson resampled the nonsynonymous SFS and refit the DFE to the resampled data (Boyko *et al.* 2008). We note these C.I.’s are likely too narrow because they assume independence between all sites and do not account for the uncertainty in the demographic inference.

### Simulations

To assess the performance of Fit∂a∂i, we performed forward-in-time simulations under different models of selection and demography. Simulations of independent sites were done using the program PReFerSIM (Ortega-Del Vecchyo *et al.* 2016), which simulates unlinked SNPs under the PRF model. We simulated synonymous sites separately with a population-scaled mutation rate of *θ*_{S} = 4000 to approximately match the amount of synonymous genetic diversity in our data sets. We simulated nonsynonymous sites at a ratio of 2.5 nonsynonymous to 1 synonymous site, in other words using *L*_{NS}/*L*_{S} = 2.5. These simulations included sample sizes of *n* = 24 and *n* = 2596 chromosomes using a demographic model of constant size, a twofold instantaneous size change, and the demography inferred from the LuCamp data. We considered a variety of DFEs, which are described in more detail in specific instances below.

Because the PRF model makes several restrictive assumptions that are likely to not apply to real data sets, we performed an additional set of forward simulations violating these assumptions, and assessed the effect on inferences using Fit∂a∂i. Specifically, we investigated the effects that unmodeled linkage, background selection, and population structure might have on our inference of the DFE. To do this, we simulated 100–150 Mb regions using the recombination rate and arrangement of functional elements on human chromosome *1* (Huber *et al.* 2016) using the forward simulation program SLiM (Messer 2013). We assumed a gamma DFE for both nonsynonymous (α = 0.2, β = 200) and conserved noncoding sites (α = 0.0415, β = 50; see Torgerson *et al.* 2009). We assume that 400 generations ago, the ancestral population expanded eightfold and split into eight genetically isolated populations. This population size increase reflects the Neolithic expansion into Europe under the demic diffusion model (Chikhi *et al.* 2002; Gazave *et al.* 2014). We then sampled 100 chromosomes equally across the eight populations, combined them together, and analyzed them as though they were from a single population. The ancestral population was simulated for a burn-in period of 10*N* generations. To avoid prohibitively slow forward simulations, we simulated with an ancestral effective population size of 1000 and scaled mutation rate, recombination rate, selection coefficients, and demographic parameters accordingly (Aberer and Stamatakis 2013). We then fit a single population demographic model (which is the incorrect model) to the synonymous SNPs in each simulated data set using ∂a∂i. Then, conditional on these demographic parameters, we inferred the DFE using Fit∂a∂i. Our goal with these simulations is to mimic what researchers do in practice; where they do not know the true demographic model, but try to fit a simplified model to the data.

### Data

We downloaded SNP data for 432 unrelated European (EUR) individuals from the 1000 Genomes Project phase 3 release (The 1000 Genomes Project Consortium 2015); 6503 individuals from the National Heart, Lung, and Blood Institute ESP6500SI-V2 European American (EUR) data set (Tennessen *et al.* 2012; Fu *et al.* 2013); and 2000 Danish individuals from the LuCamp project (Lohmueller *et al.* 2013). The 1000 Genomes phase 3 data were downloaded from the European Bioinformatics Institute file transfer protocol site http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. Related individuals were removed by sampling only mothers and fathers from trios or extended families. Only SNPs from the phase 3, exome-targeted sequencing which passed the strict mask filter were used. The total length of sites considered in the analysis, *L*_{S} + *L*_{NS}, was computed by taking this filtering into account. Variants were annotated using the 1000 Genomes Project-filtered annotations. The Exome Sequencing Project (ESP) ESP6400 data set was downloaded from the Exome Variant Server (http://evs.gs.washington.edu/EVS/). Only sites with 1800 or more European individuals sequenced, according to the site-by-site annotations, were used for the analysis. The LuCamp data were obtained from Lohmueller *et al.* (2013). For computational tractability, a hypergeometric distribution was used to project the LuCamp and ESP data sets down to sample sizes of 1298 and 1300 diploids (Gutenkunst *et al.* 2009), respectively, after filtering problematic individuals. All 432 unrelated European individuals from the 1000 Genomes Project were used. From these data, we assembled the folded SFS of synonymous and nonsynonymous sites, which were used for subsequent inference. To examine the effect of a smaller sample size on inference of demography and selection, we also projected the data down to a sample size of 24 chromosomes.

### Estimating *s* from *2Ns*

The DFEs inferred using the approach described above were for the population-scaled selection coefficient, scaled by twice the ancestral population size (*γ* = 2*N*_{A}*s*). Because we were interested in the distribution of *s*, we needed to estimate *N*_{A}. We computed *N*_{A} from the value of *θ*_{S} inferred from synonymous sites (Table S1 in File S1) using the equation *θ*_{S} = 4*N*_{A}*μL*_{S}. Detailed information about these parameters used for our analysis can be found in Table S2 in File S1. However, this value of *N*_{A} depends on assumptions about the per-base-pair mutation rate and the ratio of possible nonsynonymous to synonymous sites, *L*_{NS}/*L*_{S}, since these quantities are computed from the total number of coding sites, *L*_{S} + *L*_{NS}. We assumed the mutation rate to be *μ* = 1.5 × 10^{−8} to reflect estimates of the mutation rate in the human exome (Ségurel *et al.* 2014). For comparison to results from Boyko *et al.* (2008), we assumed the mutation rate to be *μ* = 1.8 × 10^{−8}. For the reanalysis of the Boyko *et al.* (2008) data set, we assumed the same ancestral population size, *N* = 7778 diploids, instead of computing it. To compute the total number of coding sites, *L*_{S} *+* *L*_{NS}, in each data set, we intersected the coding exons from the University of California Santa Cruz canonical transcript with the relevant filters for each data set. For the 1000 Genomes data, we intersected the phase 3 strict mask, the exome targets, and the hg19 coding exons. For the analysis of the ESP data set, we used the intersection of the hg19 coding exons and the site-by-site annotations to count the total number of coding sites for which *n* ≥ 2600 alleles had been sequenced. For the LuCamp data, we obtained the value of *L*_{S} + *L*_{NS} from Lohmueller *et al.* (2013).

### Data availability

This research uses previously published data sets obtained as previously described. The Fit∂a∂i software is available at https://github.com/LohmuellerLab/fitdadi.

## Results

### Validation of Fit∂a∂i by comparison to previous analyses

We first examined the performance of Fit∂a∂i by fitting a demographic model and DFE to the African-American SFS from Boyko *et al.* (2008). Fit∂a∂i produces similar estimates of the shape and scale parameters of the gamma distribution compared to Boyko *et al.* (2008) (Boyko: α = 0.184, β = 2488; Fit∂a∂i: α = 0.179, β = 3161). Additionally, Fit∂a∂i produced similar estimates of the proportions of mutations in different bins of the DFE (Table S3 in File S1).

### Performance on simulated data

We further investigated the performance of Fit∂a∂i by performing forward simulations under the PRF model (see *Materials and Methods*). We first considered the best-fit DFE of Boyko *et al.* (2008), rescaled to have an ancestral population size of *N* = 10,085 diploids (α = 0.184, β = 3226). Fit∂a∂i is able to accurately infer the DFE from our simulated data sets (Table 1). Predictably, the variance of our estimates of the most deleterious portion of the DFE (|*s*| > 10^{−2}) is five- to sixfold greater for the small (*n* = 24) samples. However, for the samples of size *n* = 2596, the variance in the estimates of this portion of the DFE is significantly reduced and overall estimates of the proportions of the DFE where |*s*| > 10^{−3} are accurate. Therefore, this sample size should allow us to accurately infer the most deleterious portions of the DFE.

Because it is not certain that the DFE is truly gamma distributed, we simulated data sets of 2596 chromosomes with other DFEs. Again, we scaled these DFEs to an ancestral population size of 10,085 diploids. We considered the mixture distribution of Li *et al.* (2010), which consists of 20% neutral and 80% gamma-distributed (α = 4, β = 2.148) selection coefficients (the neutral+gamma DFE). We also considered a mixture distribution consisting of five bins, (the discrete DFE) with breaks at |*s*| = [0, 10^{−5}, 10^{−4}, 10^{−3}, 10^{−2}, 1]. Within each bin, the values of *s* were uniformly distributed. We examined three weighting schemes for this distribution. First, we computed the probability mass in each bin from the mixture distribution of Li *et al.* (2010). Then, we computed the probability mass in each bin from a gamma distribution with parameters α = 0.203 and β = 1082.1, but where the mass in the |*s*| = [10^{−2}, 1] bin was placed into the |*s*| = [10^{−3}, 10^{−2}) bin, and the opposite case where the mass in the |*s*| = [10^{−3}, 10^{−2}) bin was placed into the |*s*| = [10^{−2}, 1] bin. This was done to evaluate whether we could distinguish between these discrete DFEs and to evaluate our ability to distinguish strongly deleterious mutations from moderately deleterious mutations in a large sample.

We find that if the true underlying DFE is distributed according to the Li *et al.* (2010) neutral+gamma distribution, the discrete DFE is able to estimate the true DFE, albeit with some limitations (Figure 2, A and B). For example, when the true DFE follows Li *et al.* (2010), our inference under the discrete DFE correctly estimates a negligible fraction of moderately or strongly deleterious new mutations (|*s*| > 10^{−3}), and correctly infers a mode of weakly deleterious mutations (10^{−4} ≤ |*s*| < 10^{−3}). However, estimates of the proportion of nearly neutral and neutral mutations (|*s*| < 10^{−4}) are less accurate (Figure 2A). When we simulate with the discretized distribution of Li *et al.* (2010), our estimates of the proportions of the discrete DFE are unbiased (Figure 2B). Additionally, we can distinguish between DFEs with varying proportions of moderately and strongly deleterious mutations (Figure 2, C and D). Although it is unlikely that the DFE of any natural population is discretized in this manner, these results show that the discrete DFE can help to approximate the general form of the underlying DFE, even if the true DFE is multimodal. This mimics what would be done in practice, where the precise functional form of the DFE is not known *a priori*. Therefore, fitting the discrete DFE should provide a general idea of the true DFE, especially if the true DFE is significantly multimodal. Notably, the discrete distribution can distinguish between strongly and moderately deleterious mutations at our sample size of 2596 chromosomes.

### Simulations with linkage and population structure

The procedure of first inferring demography from the synonymous SFS and then selection from the nonsynonymous SFS provides unbiased estimates of selection, even in the presence of linkage (Boyko *et al.* 2008; Messer and Petrov 2013; Comeron 2014). In other words, this methodology controls for the effects of selection at linked sites. However, it is unclear what effect population structure might have on inference of the DFE. It is well known that such cryptic structure affects the SFS and may bias demographic inference (Ptak and Przeworski 2002; Gazave *et al.* 2014). Further, large human resequencing data sets may contain cryptic population structure (Novembre and Ramachandran 2011). For example, the 1000 Genomes European sample is comprised of five different subpopulations. To examine the performance of Fit∂a∂i when applied to data sets where the assumptions of the PRF model and a single, unstructured population are violated, we performed forward simulations including background selection and population structure (see *Materials and Methods*). We fit a single population, single-size-change demographic model to synonymous sites; and then, conditional on the size-change demographic model, a gamma DFE to nonsynonymous sites for each simulation replicate. Even when using the incorrect demographic model, we accurately infer selection from simulated data in the presence of linkage and population structure (Figure 3). Importantly, the single-size-change demographic model provides a reasonable approximation to the SFS when there are both population expansions and structure (Figure 3A). This in turn allows for the accurate estimation of both the shape and scale parameters of the gamma DFE (Figure 3B).

Therefore, simulations and a comparison to existing empirical data suggest that Fit∂a∂i can reliably infer the DFE in the presence of complex demographic scenarios. Below we present additional simulation scenarios to examine the performance of Fit∂a∂i with varying sample sizes and when the assumed demography and DFE are misspecified.

### Demographic inference

We begin by fitting a demographic model to the synonymous SFS of each of the three data sets (LuCamp, ESP, and 1000 Genomes) using ∂a∂i. Briefly, this demographic model incorporates an out-of-Africa bottleneck, a recovery period, and recent exponential population growth (Figures S3 and S4 in File S1). Our estimates of demography as well as the inferred population sizes are presented in Tables S1 and S2 in File S1. Predictably, the parameter describing the magnitude of recent population expansion is harder to infer when using a sample size of 24 chromosomes than when using the larger sample sizes (*n* = 2596 chromosomes). Although the demographic model we infer is biased by linked selection, this step controls for these effects when inferring selection (Boyko *et al.* 2008; Kousathanas and Keightley 2013; Messer and Petrov 2013; Huber *et al.* 2016).

### Inference of the DFE from large data sets

Here we estimate the DFE for new nonsynonymous mutations using large samples. First, like previous studies, we fit a gamma distribution to the DFE (Table S4 in File S1). We infer a strongly leptokurtic distribution where there are many neutral and nearly neutral mutations (*i.e.*, 34–37% of new mutations have |*s*| < 10^{−4}), as well as a class of strongly deleterious mutations (*i.e.*, 15–22% of new mutations have |*s*| > 10^{−2}). Interestingly, the estimates from the three different data sets are generally concordant, though the 95% C.I.’s sometimes do not overlap. While this may suggest that the differences cannot be attributable to limited amounts of data in the SFS, we caution that these C.I.’s are likely too narrow because they do not account for the nonindependence of SNPs or the uncertainty of demographic estimates.

When compared directly to Boyko *et al.* (2008), the best-fit gamma DFEs inferred from all three data sets are generally shifted toward neutrality (Table 2 and Tables S4 and S5 in File S1), even when matching the mutation rates to those of Boyko *et al.* (2008) (*μ* = 1.8 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.5). We infer 19.2–22.9% of new mutations have a selection coefficient |*s*| < 10^{−5}, compared to the 18.3% observed by Boyko *et al.* (2008). This corresponds to a 1.05- to 1.25-fold increase. Additionally, we infer 24.5–29.8% of new mutations are strongly deleterious (|*s*| > 10^{−2}), which corresponds to a 0.69- to 0.84-fold decrease of the 35.5% inferred by Boyko *et al.* (2008). Taken together, when assuming a gamma distribution for the DFE, all three data sets suggest fewer strongly deleterious mutations than seen in Boyko *et al.* (2008).

Next, we explored the fit of complex DFEs to these large samples. Using the same combination of mutation rates as with the gamma, we fit the neutral+gamma mixture distribution; a mixture distribution of a point mass of neutral, a point mass of lethal, and exponentially distributed new mutations (the “neutral+exp+lethal” DFE); and the discrete DFE described previously. The MLEs, as well as the proportion of mutations with varying selection coefficients predicted by these distributions, are depicted in Table 2, Table 3, and Table S4 in File S1.

When we assume *μ* = 1.5 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.31, the neutral+gamma DFE fit best to the LuCamp and ESP data sets as reflected by the highest log-likelihood and Akaike information criterion (AIC) score (Table 3 and Table S4 in File S1). The gamma still fit best to the 1000 Genomes data set. Compared to the gamma DFEs inferred previously for two data sets, our best-fitting DFEs predict slightly fewer (0.92–0.98 fold) new mutations with |*s*| > 10^{−2}, and slightly more (1.06–1.18 fold) new mutations of |*s*| < 10^{−5}. When we matched the mutation rates of Boyko *et al.* (2008) with *μ* = 1.8 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.5, the discrete DFE fit best to the LuCamp data set (Table 3 and Table S4 in File S1). However, the gamma DFE continued to fit best to the 1000 Genomes and ESP data sets under these assumptions. The best fitting DFEs are depicted in Figure 4 and Figure S5 in File S1, and a comparison of the model to the SFS of the data are shown in Figure S6 in File S1. When using the larger mutation rate, we find the discrete DFE to fit best to the LuCamp data set, which predicts significantly fewer (0.54 fold) new mutations of |*s*| < 10^{−2} than the gamma DFE fit using the same mutation rate.

One concern is that biases in SNP calling may affect these inferences. One way to test for this is by masking the singletons in the analysis, since singletons may be enriched for false SNPs due to sequencing errors (Boyko *et al.* 2008). We fit the gamma and neutral+gamma DFEs while masking the singleton category in the SFS and find little difference in the inferred DFEs (Table S6 in File S1). This finding suggests our inferences are robust to potential errors in SNP calling in these data sets.

The DFEs we have inferred thus far differ from that inferred in Boyko *et al.* (2008). In that study, 35.5% of new nonsynonymous mutations were inferred to be strongly deleterious in African-Americans, and 37.9% in Europeans. We infer fewer new strongly deleterious nonsynonymous mutations, even when matching the mutation rates used in Boyko *et al.* (2008) (Figure 4 and Figure S5 in File S1). Furthermore, the distribution of 2*Ns* also shows fewer strongly deleterious mutations (27.1–36.9% of mutations with 2*Ns* > 100) than seen in Boyko *et al.* (2008) (40.4% of mutations with 2*Ns* > 100; Figure S5 in File S1). Our results remain consistent across data sets and assumed forms of the DFE.

Additionally, our estimates of the DFE differ substantially from the estimates provided by Li *et al.* (2010). Specifically, Li *et al.* (2010) infer almost no strongly or moderately deleterious new nonsynonymous mutations. That is, 1% of new nonsynonymous mutations have selection coefficients of 10^{−3} < |*s*| < 10^{−2} and 0% have a selection coefficient |*s*| > 10^{−2} (Figure 1). All of our estimates infer that at least ∼30% of new nonsynonymous mutations have a selection coefficient |*s*| > 10^{−3}, even when the assumed mutation rate is small (Figure 4 and Table 3). Our simulations suggest if the true underlying DFE follows that suggested by Li *et al.* (2010), we should be able to estimate those proportions using both the neutral+gamma and the discrete DFE (Figure 2, A and B). The fact that our inferences did not show similar estimates to those inferred by Li *et al.* (2010) suggests that our data and analyses are not consistent with the distribution inferred by Li *et al.* (2010) (Table S5 in File S1). In the following sections, we explore several reasons why the different studies infer different DFEs.

### Assessing the role of sample size and model misspecification using simulations

One possibility for the distinct estimates of the DFE is that the three studies used different sample sizes. Larger samples will have more moderately and strongly deleterious variants segregating than will smaller data sets (Figure S1 in File S1). To investigate the effect of sample size on our ability to infer the DFE, we simulated 200 data sets, without linkage, of sample sizes *n* = 12, 24, 100, 250, and 500 chromosomes. Each data set was simulated using the demographic model and gamma DFE inferred from the LuCamp data set.

First, we simulated neutral synonymous sites and inferred the demographic parameters from each data set. This was done in two ways. First, we estimated the parameters from the full demographic model that was used to generate the data (herein the “full” model). Second, we inferred the parameters in a demographic model with only three instantaneous size changes (herein the “three-epoch” model). This is meant to mimic the situation in Boyko *et al.* (2008), where the true demography of the European population was likely complex, but simpler three-epoch demographic models could accurately fit the synonymous SFS. Next, as done in our inference and in previous studies, we estimated the parameters of a gamma distribution for the DFE of nonsynonymous mutations, conditioning separately upon the two demographic models.

When the full demographic model was fit to the simulated data sets, we found the variance of our estimates, both of demography and selection, decreased as sample size increased (Figure S7 in File S1). We were unable to correctly infer the magnitude of recent population growth with small sample sizes, consistent with previous work (Keinan and Clark 2012; Nelson *et al.* 2012; Tennessen *et al.* 2012; Fu *et al.* 2013). However, this did not affect the inference of selection as long as the demographic model fit reasonably well to synonymous sites (Figure S7 in File S1). At small sample sizes, the three-epoch model could fit the synonymous SFS well and thus estimates of selection were also unbiased. However, we found that for sample sizes >100 chromosomes, the three-epoch model increasingly became unable to account for the excess of rare variants caused by recent growth. The inability to account for the rare variants in the sample then biased the estimates of both the shape and scale parameters of the gamma distribution. However, this effect seems to be negligible at a sample size of 24 chromosomes (Figure S7 in File S1).

As long as the demographic model fits the observed SFS of synonymous sites, small sample sizes can estimate the parameters in a gamma distributed DFE, even when the demographic model is not the correct one. The accuracy of the estimates increases with sample size, especially for the scale parameter, and notably provides better estimates of the strongly deleterious portion of the DFE (Figure S7B in File S1 and Table 1). Thus, the results of Boyko *et al.* (2008) are unlikely to be affected by misspecification of demography due to small sample size.

Another possibility for the varying estimates of the DFE is that the DFE itself may be misspecified. Although parametric distributions are convenient for approximating the DFE, the true form of the DFE is unknown. Additionally, we have shown that the neutral+gamma DFE and the discrete DFE can fit large data sets better than the gamma DFE. To investigate an example of what would happen if the DFE is misspecified, we simulated 100 data sets without linkage for the best-fit neutral+gamma DFE inferred from the LuCamp data set, scaled to an ancestral population size of 10,085 diploids (*p*_{neu} = 0.164, α = 0.338, β = 358.8). We also downsampled each data set to *n* = 24 chromosomes. Then, we fit a gamma and neutral+gamma DFE to each full and downsampled data set.

When the true DFE is neutral+gamma distributed, inference of the DFE from small samples overestimates the proportion of strongly deleterious mutations (Figure 5). When the DFE is correctly specified, we obtain unbiased estimates of the DFE even from small samples. However, at a sample size of *n* = 24 chromosomes, both the gamma and neutral+gamma distributions have a similar log-likelihood (Figure 5A). This was also observed in Boyko *et al.* (2008). Then, the extra parameter in the neutral+gamma distribution penalizes the true DFE when choosing the best-fit DFE by AIC score. This leads one to choose the gamma distribution as the best-fit DFE to the small sample, even when the true DFE follows a neutral+gamma distribution. Fitting the gamma distribution yields a DFE with more new mutations with |*s*| > 10^{−2} and a decrease in the proportion of new mutations with |*s*| < 10^{−5} compared to the true DFE (Figure 5B).

### Assessing the role of sample size using real data

Next, we investigated the role that sample size has on inference of the DFE from real data. To do this, we projected our synonymous and nonsynonymous frequency spectra down to a sample size of *n* = 24 chromosomes to match the sample size of Boyko *et al.* (2008), then fit a demographic model and DFEs as previously described. Here we used the mutation rate assumptions *μ* = 1.5 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.31, but also matched the mutation rate of Boyko *et al.* (2008) (*μ* = 1.8 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.5). Then, we fit the gamma, neutral+gamma, and discrete DFEs—which were the best-fitting distributions to the full data—to the downsampled data sets.

As predicted by our simulations, there is generally little difference in the fit of the different DFEs to the downsampled data sets in terms of log-likelihood (Table S5 in File S1). The neutral+gamma and discrete DFEs often fit better than the gamma, but the difference in log-likelihood is small (0.1–0.6 log-likelihood units). Thus, the gamma DFE is selected as the best-fit DFE for all downsampled data sets by AIC. These results mimic what was observed in our simulations, although the pattern is not wholly consistent across data sets and mutation rates. When we assume *μ* = 1.8 × 10^{−5} and *L*_{NS}/*L*_{S} = 2.5, the gamma DFE fits best to both the full and downsampled 1000 Genomes and ESP data sets (Figure 4 and Table S5 in File S1). There also appears to be little difference between the gamma DFE fit to the full and downsampled data. By contrast, the discrete DFE fits best to the LuCamp data under these mutation rates. Additionally, the neutral+gamma fit best to the full ESP and LuCamp data when we assume *μ* = 1.5 × 10^{−5} and *L*_{NS}/*L*_{S} = 2.31. The gamma DFE fit to the downsampled versions of these data sets predicts more strongly deleterious (|*s*| > 10^{−2}) and more nearly neutral (|*s*| < 10^{−5}) new mutations (Figure 4 and Table S5 in File S1). The DFE fit to the 1000 Genomes data using the lower mutation rates does not follow this pattern. The gamma DFE fits best to both versions of the data set, yet the estimates from the small data set still predict more strongly deleterious new mutations. These results seem to corroborate the results from our simulations. That is, fitting a DFE using a small sample may result in misspecification of the DFE, which, in turn, may cause inaccuracies in the inferred proportions of the DFE. We believe this may explain some of the differences between the findings of Boyko *et al.* (2008) and the findings in this study.

### Assessing the role of the likelihood function using simulations

Next, we investigated the performance of the multinomial *vs.* Poisson likelihoods at inferring the DFE. In this study, as well as in Boyko *et al.* (2008), we fit the DFE using the Poisson likelihood, which uses an *a priori* estimate of the population-scaled mutation rate, *θ*, to fit the curvature of the SFS as well as the total number of SNPs. Too few segregating variants would suggest the presence of strongly deleterious variants that are not segregating in the sample (Boyko *et al.* 2008).

The multinomial likelihood fits the curvature of the SFS while conditioning on the total number of SNPs. As such, the number of SNPs provides no additional information. The multinomial inference is similar to how the DFE was inferred by Li *et al.* (2010) in that they only used information from the curvature of the SFS. Note, however, Li *et al.* (2010) fit the population frequency spectrum using a least-squares approach while the multinomial likelihood fits the sample frequency spectrum. As such, the multinomial likelihood function does not strictly mirror the procedure of Li *et al.* (2010). Using the multinomial likelihood is convenient because it does not require any prior assumptions about the population scaled mutation rate, *θ*, yet may be underpowered when trying to identify the proportion of strongly deleterious mutations, unless such variants are segregating in the sample.

To compare the two likelihood methods at varying sample sizes, we fit the full model to simulated data sets of *n* = 12, 24, 50, 100, 150, 200, and 250 chromosomes using both the multinomial and Poisson likelihoods (Figure S8 in File S1). Again, we simulated 200 data sets at each sample size with the LuCamp demography and a gamma DFE with parameters α = 0.203 and β = 1082.1. In general, the accuracy of our shape parameter estimate improves as the sample size increases, and we find the multinomial and Poisson likelihoods can both be used to reliably estimate the shape parameter. While this trend holds true for the scale parameter using the Poisson likelihood, we find that we are unable to accurately infer the scale parameter using the multinomial likelihood, even for a sample of *n* = 250 chromosomes. For example, nearly 50% of all the parameter estimates lie close to the maximum bound and 25% lie close to the minimum bound allowed during optimization. We found that this result can be explained by the likelihood surface being exceptionally flat with respect to the scale parameter. In other words, we cannot estimate the strength of purifying selection using only the curvature of the SFS with these sample sizes. Therefore, because Li *et al.* (2010) fit only the curvature of the SFS and excluded rare variants (<2% MAF) in a sample of size of *n* = 400 chromosomes, the power to detect strongly deleterious variants may be quite low, resulting in different parameter estimates from those in Boyko *et al.* (2008) and our present work.

## Discussion

We developed a computational method to infer the DFE of new mutations from large data sets, and then estimated the DFEs for nonsynonymous mutations using the SFS of 432 Europeans from the 1000 Genomes Project, 1300 Europeans from the ESP, and 1298 Europeans from the LuCamp project. The new DFEs suggest there are fewer strongly deleterious mutations than previously estimated (Figure 4). Although we find a neutral+gamma mixture DFE fits best to the ESP and LuCamp data sets, a gamma DFE seems to be a better fit to the 1000 Genomes data (Table 3). Nevertheless, our best-fit DFEs predict 0.38 to 0.84 as many strongly deleterious new mutations compared to the current, widely used estimates of Boyko *et al.* (2008). Additionally, these findings are robust to assumptions about the mutation rate or the assumed functional form of the DFE. We show small sample size can lead to incorrect estimates of the DFE, specifically when the DFE is approximated with a parametric distribution that is not the true distribution (Figure 5). Therefore, our estimates provide an important update to previous studies of the DFE that used smaller sample sizes. Our current estimates of the DFE, particularly the estimates of the proportion of moderately and strongly deleterious mutations, should be more reliable and precise than previous estimates. To facilitate their utility for future researchers, we provide scripts for implementing these models on GitHub (see *Materials and Methods*).

Our results suggest misspecification of the DFE may explain some of the differences in the DFEs we infer from small and large data sets. This is particularly relevant because the true DFE is almost certainly not a parametric distribution. At small sample sizes, different forms of the DFE tended to similarly fit the data in terms of log-likelihood (Table S5 in File S1). Therefore, the DFE that had fewer parameters (*i.e.*, gamma) was selected as the best-fit DFE. Additionally, we infer more strongly deleterious (|*s*| > 10^{−2}) new mutations from the downsampled data sets. We showed through simulations that even if the true DFE is neutral+gamma distributed, a gamma DFE is selected as the best fit to a small sample. Furthermore, this leads to inaccuracy in recovering the true proportions of the DFE (Figure 5). While the neutral+gamma distribution is also unlikely to be the true DFE, our simulations reproduce the patterns observed when downsampling the real data. Therefore, large sample size not only helps to improve the precision of the estimated DFEs, but also helps to approximate the correct form of the DFE. We expect this question to be better resolved as additional and larger sequencing data sets continue to be generated in the future.

A gamma or similarly shaped distribution of deleterious mutations is well supported by experimental estimates of the DFE in laboratory organisms (Martin and Lenormand 2006; Bataillon and Bailey 2014), although some studies suggest more complex distributions (Halligan and Keightley 2009; Hietpas *et al.* 2011; Jacquier *et al.* 2013). A number of theoretical models also predict the functional form of the DFE (Huber *et al.* 2016). Most progress in this regard comes from phenotype fitness-landscape models such as Fisher’s geometric model (FGM) (Martin and Lenormand 2006; Chevin *et al.* 2010; Lourenço *et al.* 2011; Tenaillon 2014; Huber *et al.* 2016) and biophysical models of protein stability (Cherry 1998; Goldstein 2013; Serohijos and Shakhnovich 2014). Under fairly general assumptions, the predicted DFE under these models for a perfectly adapted population is gamma distributed (Martin and Lenormand 2006; Martin 2014; Serohijos and Shakhnovich 2014), and a strongly leptokurtic shape would suggest that most mutations have low pleiotropy (Martin and Lenormand 2006; Lourenço *et al.* 2011). However, our finding of a neutral+gamma distribution suggests that the general FGM is inadequate, since it does not predict the neutral point mass. Alternatively, our support for a neutral point mass might not reflect truly neutral mutations, but rather slightly beneficial mutations that behave effectively neutral under the historically small human population size (Huber *et al.* 2016). Since these mutations would segregate at frequencies similar to neutral mutations, and since we do not explicitly model the effect of beneficial mutations on the SFS, our method would place these mutations at the neutral point mass. Such slightly beneficial mutations are predicted under FGM when deleterious mutations fix and move the population away from the optimal phenotype (Lourenço *et al.* 2011). Slightly deleterious mutations can fix in the population when the effect of drift is large, *i.e.*, the effective population size is small. Thus, our support for the neutral+gamma distribution might be consistent with a large effect of drift in the relatively small human population (Huber *et al.* 2016). Alternatively, a recent change in the selective environment could have moved the human population away from the phenotypic optimum at many genes, leading to a similar increase of the proportion of slightly beneficial mutations (Martin and Lenormand 2006; Chevin *et al.* 2010; Lourenço *et al.* 2011; Bank *et al.* 2014).

Additionally, our results show that estimates of the DFE are sensitive to the mutation rate. For any given data set, assuming a higher nonsynonymous mutation rate will result in the inference of stronger purifying selection due to the increased number of SNPs expected (but not observed) relative to the nonsynonymous mutation rate. There are two assumptions that factor into the calculation of the nonsynonymous mutation rate: First, it is unclear what the true mutation rate is. Whole genome, pedigree-based estimates suggest a mutation rate of about 10^{−8} per base pair per generation, exome-based estimates suggest rates of 1.5 × 10^{−8}, and phylogenetic estimates suggest a mutation rate in the range of 2.0–2.5 × 10^{−8} (Ségurel *et al.* 2014). Second, we infer a mutation rate from synonymous sites, but use that mutation rate to make an *a priori* assumption about the nonsynonymous mutation rate. Many studies have the nonsynonymous mutation rate at 2.5 times the synonymous mutation rate, but we believe 2.31 to be a more accurate estimate, as this takes into account the CpG mutational bias and a 3:1 transition:transversion ratio (Huber *et al.* 2016). These two factors combined can result in large differences in the DFE. For example, the gamma DFE fit to the LuCamp data predicts 15% of mutations to be strongly deleterious (|*s*| > 10^{−2}) when assuming *θ*_{NS}/*θ*_{S} = 2.31 and *μ* = 1.5 × 10^{−8}, but 25% of new mutations to be strongly deleterious when assuming *θ*_{NS}/*θ*_{S} = 2.5 and *μ* = 2.5×10^{−8}. Although our results remain qualitatively consistent across the range of mutation rates, uncertainty about the true rate leads to uncertainty in estimating the DFE.

Another important aspect of our results is the consistency of our estimates of the DFE between data sets. Our estimates of the DFE suggest a skew toward neutrality compared to previous studies, and we infer a consistent range of neutral (|*s*| < 10^{−5}, 24–26%), moderately deleterious (10^{−3} < |*s*| < 10^{−2}, 25–33%), and strongly deleterious (|*s*| < 10^{−2}, 14–22%) new mutations between the three data sets. The consistency of our results across data sets suggests our inferences are accurate and robust to sampling from different populations, sequencing, bioinformatic processing, and sample size. This suggests the DFEs we have inferred are reliable updates to the DFEs inferred by Eyre-Walker *et al.* (2006) and Boyko *et al.* (2008).

It is also worth noting that our methodology has key differences from that of Li *et al.* (2010). Li *et al.* (2010) estimated the DFE using the population frequency spectrum excluding rare variants (MAF < 2%), under a constant size demographic model using a least-squares method, and fit the curvature of the SFS while not considering the total number of SNPs in the sample. The extent to which these methodological differences as well as differences in sequencing or bioinformatic processing of the data between their study and our present study contribute to the different estimated DFEs remains unclear. However, we have shown that for small and moderately sized samples, fitting only the curvature of the SFS is insufficient for estimating the scale parameter of the DFE. In other words, for smaller samples, the number of SNPs in the data must be considered to estimate the proportions of moderately and strongly deleterious new mutations, since moderately to strongly deleterious mutations are unlikely to be found in the sample.

Fit∂a∂i infers the DFE for new mutations rather than segregating variants. Interestingly, even inference using the multinomial likelihood function, which only uses the frequencies of segregating variants, still infers the DFE for new mutations. We used simulations to compare the DFE of new mutations to that of segregating variants (Figure S9 and Table S7 in File S1). The DFE of segregating variants depends on the sample size and is shifted to be more neutral than that of new mutations. Approximating the DFE of segregating variants using a gamma distribution reveals a different shape parameter than that of new mutations (Table S7 in File S1); confirming that Fit∂a∂i, when applied to segregating variants, estimates the DFE of new mutations. While long-tailed distributions such as the gamma do not directly capture the mode of very strongly deleterious new mutations observed by many experimental studies (Eyre-Walker and Keightley 2007), the proportion could be extrapolated from the DFEs we inferred. For example, for a gamma-distributed DFE, the proportion of new mutations that is lethal would be computed as the mass of the distribution greater than |*s*| > 1.

Our results argue that, provided the demographic model fit to the data can adequately match the SFS of neutral synonymous sites, inference of the DFE should be robust to cryptic, unmodeled, population structure. In other words, the skew in the frequency spectrum due to demography must be reproduced accurately, but inference of selection is relatively robust to the way the skew is modeled. This result is consistent with the work of Ma *et al.* (2013). Alternately, other studies rescale the entries of the nonsynonymous SFS independently to match the skew of the synonymous SFS from the standard neutral model (Eyre-Walker *et al.* 2006; Galtier 2016; Tataru *et al.* 2016). However, this method is not always accurate for demographies including recent, rapid expansions since the skew on neutral and selected sites may differ (Eyre-Walker *et al.* 2006). Further, fitting many independent scaling parameters to large samples can be problematic (Tataru *et al.* 2016). Thus, Fit∂a∂i offers an advantage over the rescaling methods in these contexts.

Although Fit∂a∂i was developed to work with large sequencing data sets, it still has several limitations. The inference framework we use becomes increasingly slower for larger samples and requires significant computational resources and time to initially generate the SFS for the range of selection coefficients. Additionally, the frequency spectrum becomes difficult to compute for larger selection coefficients (2*Ns* > 10,000). This is mainly because finer integration grids must be used to accurately estimate low frequency variants. Also, like the method of Boyko *et al.* (2008), our method assumes additive selective effects and should be interpreted as averaging of selection over all heterozygotes and genetic backgrounds. Nevertheless, we anticipate that our method will be useful for estimating the DFE across the tree of life as polymorphism data sets from different species continue to accumulate.

Our results suggest that there may be more weakly and moderately deleterious nonsynonymous mutations than previously appreciated. This has a number of important implications for medical genetic studies. These variants could possibly contribute to disease risk. However, these mutations could also confound statistical tests that compare observed levels of variation to those predicted by population genetic models. For example, using the DFE of Boyko *et al.* (2008) would predict fewer segregating deleterious variants because more new mutations were estimated to be strongly deleterious and would not segregate in the sample. However, if those mutations were instead only moderately deleterious, some could drift up in frequency and actually segregate in the sample. Further, a common approach to modeling how deleterious variants affect complex traits (Eyre-Walker 2010) assigns mutational effects on a trait as a function of their effects on fitness. This approach has been widely used to quantify the architecture of complex traits (Morris *et al.* 2012; Mancuso *et al.* 2016), to study the effects of demography on traits (Lohmueller 2014a; Simons *et al.* 2014), and to assess the power of rare variant association tests (Uricchio *et al.* 2016). The accuracy and realism of these models depend on having accurate estimates of the DFE.

Additionally, the DFE determines the extent to which background selection affects patterns of neutral variation. Accurately characterizing the reduction in diversity (*i.e.*, effective population size *N*_{e}) should reduce bias when trying to learn the true demography of a population using sites linked to selected variants (Ewing and Jensen 2016; Schrider *et al.* 2016). We used a deterministic approximation (Nicolaisen and Desai 2013) of the models of Zeng and Charlesworth (2011) to contrast the effects of background selection predicted from the DFE of Boyko *et al.* (2008) and the DFEs we inferred in our study (Figure S10 in File S1). We computed the reduction in *N*_{e} (*i.e.*, increase in the rate of coalescence) as a function of time due to background selection for the two different mutation rates used in our inferences (*μ* = 1.5 × 10^{−8}, *L*_{NS}/*L*_{S} = 2.31; *μ* = 1.8 × 10^{−8} and *L*_{NS}/*L*_{S} = 2.5) as well as the higher deleterious mutation rate of McVicker *et al.* (2009): *μ* = 7.4 × 10^{−8}, also assuming *L*_{NS}/*L*_{S} = 2.5. Importantly, all the DFEs predict that background selection will reduce diversity and skew the SFS toward an excess of rare variants compared to models of constant population size (Figure S10 in File S1). However, DFEs with fewer strongly deleterious mutations, like the best fit DFEs to the ESP EUR and LuCamp data sets, predict less of an overall reduction in neutral diversity compared to Boyko *et al.* (2008). Further, the change in coalescent rates over time varies across DFEs, suggesting that the degree to which background selection affects the curvature of the SFS does depend on the specific DFE.

More broadly, our results have important implications for understanding and quantifying deleterious variants across human populations (Lohmueller *et al.* 2008; Lohmueller 2014b; Simons *et al.* 2014; Do *et al.* 2015). Specifically, the fate of strongly deleterious mutations is relatively insensitive to population demography. The fate of weakly and moderately deleterious mutations, however, is linked more closely with effective population size (Henn *et al.* 2016). Human evolution in particular is influenced by nearly neutral processes due to relatively small effective population sizes. Then, a DFE containing fewer strongly deleterious new mutations suggests the nature of purifying selection in humans may be different from what is currently understood. For example, larger proportions of moderately and weakly deleterious mutations may suggest greater differences in the proportion of segregating deleterious mutations and genetic load between human populations (Henn *et al.* 2016). Accurate inferences of the DFE are critical in this regard as researchers begin to use explicit models of demography and selection to quantify differences in the amounts of deleterious variants across populations (Brandvain and Wright 2016; Gravel 2016).

## Acknowledgments

We thank Emilia Huerta-Sanchez, Diego Ortega-Del Vecchyo, and Noah Rosenberg, as well as two reviewers, for constructive comments on the manuscript. This work was supported by a Searle Scholars Fellowship, an Alfred P. Sloan Research Fellowship in Computational and Molecular Biology, and National Institutes of Health grant R35 GM-119856 to K.E.L.

## Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.197145/-/DC1.

*Communicating editor: N. A. Rosenberg*

- Received October 23, 2016.
- Accepted February 14, 2017.

- Copyright © 2017 by the Genetics Society of America

Available freely online through the author-supported open access option.