## Abstract

We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; *i.e.*, genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao’s efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao’s efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [*i.e.*, minor allele frequencies (MAF) < 0.03], the Rao’s efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao’s efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (*P* < 3.10 × 10^{−6}), tentative association for 2 genes (*HHEX* and *HMGA2*; *P* ≈ 10^{−5}), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene *HHEX*. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.

FOR association studies of many complex traits, multiple studies may have been conducted that have collected the same phenotypic traits. For example, a large number of studies of type 2 diabetes (T2D) have been conducted to evaluate the relationship between single-nucleotide polymorphisms (SNPs) and T2D (Morris *et al.* 2012; Scott *et al.* 2012; Li *et al.* 2014). The sample size of an individual study can be small or moderate and may not always lead to a significant association signal at a genome-wide requirement. It is desirable to combine multiple studies for a unified meta-analysis in order to reach rigorous significant threshold levels (Zeggini and Ioannidis 2009; Evangelou and Ioannidis 2013; Liu *et al.* 2014). By combining multiple studies together, one can get a sample with a large sample size, and it is more likely to produce significant results. However, different studies may contain different genetic data or covariates, which make analysis of the combined data difficult. It is important to develop statistical methods that analyze the combined data of multiple studies.

To perform an association meta-analysis for complex traits, one may take two strategies: (1) single-genetic-variant-based approaches and (2) gene-based-variant-analysis approaches. The single-genetic-variant approaches use only one genetic variant at a time and are useful to analyze common variants (Zeggini *et al.* 2008; Hindorff *et al.* 2009; Stahl *et al.* 2010). Gene-based association analysis uses multiple genetic variants to detect an association. In recent years, there has been a great deal of interest in developing statistical methods and tests for gene-based association analysis of complex traits (Hu *et al.* 2013; Liu *et al.* 2014). Gene-based analysis can lead to higher power and improve multiple-comparison problems compared to single-marker analysis because fewer tests are required. More important, gene-based analysis can be the only way to analyze rare variants that have minor allele frequencies (MAFs) < 0.01–0.05 because it could be powerless to use a single rare variant in an analysis.

Burden tests and kernel-based test methods are popular approaches to performing rare variant-gene-based association analyses. Burden tests collapse rare variants into a single variable to test for an association with a complex trait and to reduce the high dimensionality of genetic data (Li and Leal 2008; Madsen and Browning 2009; Han and Pan 2010; Morris and Zeggini 2010; Price *et al.* 2010; Neale *et al.* 2011). Kernel-based test methods are based on mixed-effect models in which the regression coefficients of multiple genetic variants are random with means of zero and constant variance. The association is tested by testing a null hypothesis of zero variance by a sequence kernel association test (SKAT). The SKAT and its optimal unified test (SKAT-O) were found to have higher power than burden tests (Wu *et al.* 2011; Lee *et al.* 2012). By extending SKAT and SKAT-O to perform meta-analyses, Lee *et al.* (2013) developed the meta-analysis for sequence kernel association test (MetaSKAT) and its optimal unified test (MetaSKAT-O) to carry out meta-analyseis.

The regression coefficients of genetic terms in the models of SKAT and MetaSKAT were assumed to be random because the number of genetic variants is usually large for modern genetic data. In population genetics, however, the genetic effects of major gene loci are usually assumed to be fixed, while the contributions of polygenic loci are modeled as a random term (Fisher 1918). The high dimensionality of modern genetic data does not necessarily imply that traditional population genetics theory is not correct because the number of causal variants may not be large. A fixed model should be fine to analyze the major gene locus data in most cases if the dimension of the genetic data can be properly reduced.

By viewing genetic variant data as realizations of an underlying stochastic process, functional regression models were proposed to reduce the dimensionality and to perform a gene-based association analysis of quantitative, qualitative, and survival traits (Luo *et al.* 2011, 2012, 2013; Fan *et al.* 2013, 2014, 2015, 2016; Vsevolozhskaya *et al.* 2014; Zhang *et al.* 2014; Wang *et al.* 2015). For quantitative traits, functional linear models lead to both *F*- and chi-squared-distributed test statistics that are almost always more powerful than SKAT and SKAT-O (Luo *et al.* 2012; Fan *et al.* 2013, 2015; Wang *et al.* 2015). For dichotomous and survival traits, functional regression models lead to test statistics that are more powerful than SKAT and SKAT-O except in some cases where the causal variants are all rare (Luo *et al.* 2011, 2013; Fan *et al.* 2014, 2016; Vsevolozhskaya *et al.* 2014). Therefore, functional regression models are found to outperform other methods and potentially to be useful in gene-based association analysis of complex traits.

In our functional regression models, the genetic effects are treated as a function of the physical position, and the genetic-variant data are viewed as stochastic functions of the physical position, so any orders of linkage disequilibrium (LD) are taken care of in the models (Ross 1996). The regression coefficients of genetic terms in the SKAT and MetaSKAT models do no depend on the physical position, while our genetic-effect function depends on the physical position and is actually a function of physical position. Hence, the functional regression models can fully use LD and physical position information. The functional regression models are a natural extension of traditional population genetics because we model the genetic effects of major gene loci as fixed functions.

In this paper, generalized functional linear models (GFLMs) are developed for a meta-analysis of multiple studies. GFLMs can analyze rare or common variants or a combination of the two. Both chi-squared-distributed Rao’s efficient score test statistics and likelihood-ratio test (LRT) statistics are introduced to test for an association between disease traits and multiple genetic variants. Extensive simulations are performed to evaluate the type I error rates and power performance of the GFLMs and tests. The proposed methods were applied to analyze T2D data from a meta-analysis of eight European studies.

## Materials and Methods

Consider a meta-analysis with *L* case-control studies in a genomic region. For the *ℓ*th study, we assume that there are individuals who are sequenced in the genomic region at variants. We assume that the variants are located with ordered physical positions . To make the notation simpler, we normalized the region to be [0, 1]. For the *i*th individual in the *ℓ*th study, let denote his or her dichotomous trait (here indicates that the individual is an affected case of the disease of interest, indicates that the individual is a normal control individual), denotes his or her genotypes of the variants, and denotes his or her covariates. Hereafter in this paper, a prime denotes the transpose of a vector or matrix. For the genotypes, we assume that (= 1, 2, 3) is the number of minor alleles of the individual at the *j*th variant located at position .

### Traditional additive-effect models

By using logistic regression, an additive-effect model (AEM) can be used to analyze the relation between the disease trait and the variants in the *ℓ*th study as (Cordell and Clayton 2002) (1)where is the disease probability, is the regression intercept, is a column vector of regression coefficients of covariates, and is the additive genetic effect of variant *j* for the *ℓ*th study. The number of the parameters of the model (1) can be large, so it may not be powerful. Despite the potential drawbacks, the model (1) can be easily implemented by standard statistical software such as R. If the number of genetic variants is large, one may decompose the genotype matrix into the product of an orthogonal matrix **Q** and a triangular matrix **R** via Gram-Schmidt process to remove the redundancy to facilitate computation in applications, *i.e.*, the **QR** decomposition.

### β-Smooth-only GFLMs

To model the relation between the disease trait and the variants, we propose the following functional logistic regression model: (2)where is the genetic effect of the variant at position , and the other terms are similar to those in the AEM (1). Note that we have *L* studies, so the effect of a common covariate can be either the same or different across the studies: (1) heterogeneous: we treat as different for different studies, *i.e.*, are all different; and (2) homogeneous: if a covariate is present in different studies, we model its regression coefficient by one common coefficient.

In the model (2), is introduced as the genetic effect of the variant at position . We assume that is a continuous function of the physical position *t*. One may expand it by B-spline or Fourier or linear spline basis functions. Formally, let us expand the genetic-effect function by a series of basis functions as , where is a vector of coefficients . We consider two types of basis functions: (1) the B-spline basis and (2) the Fourier basis , and . Here, for Fourier basis, is taken as a positive odd integer (de Boor 2001; Ramsay and Silverman 2005; Ramsay *et al.* 2009; Ferraty and Romain 2010; Horváth and Kokoszka 2012). Replacing by the expansion, the model (2) can be revised as (3)where . In the model (2) and its revised version (3), we use the raw genotype data directly in the analysis.

### General GFLM

In this subsection we view the *i*th individual’s genotype data as a genetic variant function (GVF) in addition to treating the genetic effects as functions . To relate the GVF to the phenotypic traits adjusting for covariates, we consider the following functional logistic regression model: (4)where is the genetic effect of GVF at position *t*, and the other terms are similar to those in the β-smooth-only model (2). In this model, the integration term is used to replace the summation term in the β-smooth-only model (2).

#### Estimation of GVF:

Let be a series of *K* basis functions, such as the B-spline basis and Fourier basis functions. Let denote the × *K* matrix containing the values , where . Denote . Using the discrete realizations , we may estimate the GVF using an ordinary linear square smoother as follows (Ramsay and Silverman 2005, Chapter 4):

#### Revised GFLM:

As in the β-smooth-only case, the genetic effect is expanded by a series of basis functions . Replacing in (4) by in (5) and by the expansion, we have the following revised logistic regression model: (6)where . In this revised regression model, one needs to calculate and in order to get . In the statistical package R, there are readily available codes to calculate them (Ramsay *et al.* 2009).

### Test statistics of association

We consider the revised regression models [(3) and (6)] as usual logistic regressions that model the genetic effect of GVFs adjusted for covariates. First, assume that the genetic effects among the *L* studies are heterogeneous. To test for an association between the genetic variants and the disease trait, the null hypothesis is . We may test the null hypothesis by a chi-squared-distributed Rao’s efficient score statistic with a degree of freedom of . The Rao’s efficient score statistic is denoted by GFLM Het-Rao. An alternative approach is to use a LRT statistic to test for association, which is also chi-squared distributed with degrees of freedom and is denoted by GFLM Het-LRT.

If the genetic effects are homogeneous, *i.e.*, , we may test for association by testing a simplified null hypothesis . Again, one may use a chi-squared-distributed Rao’s efficient score statistic and a chi-squared-distributed LRT statistic to test the null hypothesis. Both the chi-squared-distributed Rao’s efficient score statistic and the LRT statistic have a degree of freedom of and are denoted by GFLM Hom-Rao and GFLM Hom-LRT, respectively.

For the AEM (1), the null hypothesis is , under an assumption of heterogeneous genetic effect. The corresponding chi-squared-distributed Rao’s efficient score test and LRT statistics are chi-squared distributed with degrees of freedom. The tests are denoted as AEM Het-Rao and AEM Het-LRT, respecyively. Assume that each individual of the *L* studies is sequenced at the same variants at and so . In addition, assume that the genetic effects are homogeneous, *i.e.*, . Then the AEM (1) is simplified as(7)The null hypothesis of no association between the genetic variants and the disease trait is . The corresponding Rao and LRT statistics are chi-squared distributed with *m* degrees of freedom. The tests are denoted as AEM Hom-Rao and AEM Hom-LRT, respectively.

### Parameters of functional data analysis

In the data analysis and simulations, we used a functional data analysis procedure in the statistical package R. We use two functions in library fda of R package as follows to create basis:basis = create.bspline.basis(norder = order, nbasis = bbasis)basis = create.fourier.basis(c(0,1), nbasis = fbasis)

The three parameters were taken as *order = 4*, *bbasis = 10*, *fbasis = 11* for the heterogeneous genetic-effect model and as *order = 4*, *bbasis = 12*, *fbasis = 13* for the homogeneous genetic-effect model in all data analyses and simulations. To make sure that the results are valid and stable, we tried a wide range of parameters: (1) for the heterogeneous genetic-effect model and (2) for the homogeneous genetic-effect model. The results are similar to each other (data not shown).

### Data availability

Computer Program: The methods proposed in this paper are implemented by using the procedure of functional data analysis (fda) in the statistical package R. The R codes for data analysis and simulations are available from http://www.nichd.nih.gov/about/org/diphr/bbb/software/fan/Pages/default.aspx.

## Results

### Meta-analysis of T2D in eight European cohorts

The proposed methods were applied to analyze a set of studies investigating T2D that includes eight European cohorts: the FIN-D2D 2007 study (D2D2007), the Diabetes Genetic study (DIAGEN), the Finnish Diabetes Prevention Study (DPS), the Finland–United States Investigation of NIDDM Genetics study (FUSION Stage 2), the Nord-Trøndelag Health Study 2 (HUNT), the Metabolic Syndrome in Men study (METSIM), and the Tromsø study (TROMSO). The sample sizes of cases and controls for each study are provided in Supporting Information, Table S1. For each cohort, 54,741 genetic variants are genotyped and are located in 97 genetic regions across the 22 autosomes. For our analysis, we used the literature on T2D as a reference for gene selection and found that 22 gene regions were fine mapped (Zeggini *et al.* 2008; Voight *et al.* 2010; Morris *et al.* 2012; Scott *et al.* 2012; Li *et al.* 2014; Liu *et al.* 2014). We used Builder Mar. 2006 (NCBI36/hg18) to determine gene positions, and 5 kb was used to extend the gene region on each side of a gene. A summary of the 22 genes and the number of genetic variants in each gene region are given in Table S2.

Association analysis between T2D status and each of the 22 genes was performed by the proposed methods and MetaSKAT. Except for METSIM, age and sex were used as covariates. For METSIM, age was used as a covariate because no females were included in the study. A significance threshold of *P* < 3.1 × 10^{−6} was taken from Liu *et al.* (2014). If a *P*-value is around 10^{−5} but larger than 3.1 × 10^{−6}, we call it a “tentative significant association signal.”

Table 1 reports the results of association analysis of the eight European cohorts by heterogeneous Rao’s efficient score test (Het-Rao), Het-MetaSKAT-O, and Het-MetaSKAT, and Table 2 reports the results by homogeneous Rao’s efficient score test (Hom-Rao), Hom-MetaSKAT-O, and Hom-MetaSKAT. The results of Het-LRT and Hom-LRT are reported in Table S3 and Table S4, respectively. At the significance threshold of *P* < 3.1 × 10 ^{−6}, we observe associations for 17 genes, *PCSK9*, *APOB*, *IGF2BP2*, *CDKAL1*, *JAZF1I CDKN2B*, *CDC123*, *IDE*, *KIF11*, *TCF7L2*, *KCNQ1*, *MTNR1B*, *TSPAN8*, *HNF1A*, *OASL*, *FTO*, and *APOE*, by both Het-Rao and Het-LRT in both the revised β-smooth-only GFLM (3) and the revised general GFLM (6) for both B-spline and Fourier basis functions in Table 1 and Table S3. For the *LPL* gene, a significant association signal is observed by both Het-Rao and Het-LRT in both the β-smooth-only GFLM (3) and the revised general GFLM (6) for Fourier basis functions, while B-spline basis functions lead to tentative association signals. Tentative association signals are observed for two genes, *HHEX* and *HMGA2*, in Table 1 and Table S3, respectively. Only two genes, *LDLR* and *GIPR*, show no association signal.

By both Hom-Rao and Hom-LRT in both the revised β-smooth-only GFLM (3) and the revised general GFLM (6) for both B-spline and Fourier basis functions in Table 2 and Table S4, association is observed for gene *TCF7L2* at the significance threshold of *P* < 3.1 × 10^{−6}. By both Hom-Rao and Hom-LRT in the β-smooth-only GFLM (3) for Fourier basis functions, significant association signals are observed for two genes, *MTNR1B* and *FTO*, in Table 2 and Table S4. Tentative association signals are observed for two genes, *CDKN2B* and *TSPAN8*, by both Hom-Rao and Hom-LRT in Table 2 and Table S4 for Fourier basis functions in the revised general GFLM (6), respectively.

The *P*-values of Hom-LRT in Table S4 are very similar to those of Hom-Rao in Table 2, and the *P*-values of Het-LRT in Table S3 are slightly smaller than those of Het-Rao in Table 1. Hence, the LRT statistics can be slightly more powerful than the Rao’s efficient score test statistics. It is noteworthy that most association signals are detected by Het-LRT and Het-Rao, but Hom-LRT and Hom-Rao only detect association signals for three genes, *TCF7L2*, *MTNR1B*, and *FTO*, reflecting the presence of heterogeneity of the genetic effects.

In addition to the results of GFLMs 3 and 6, MetaSKAT, and MetaSKAT-O, Table 1, Table 2, Table S3, and Table S4 report the results of traditional additive-effect models 1 and 7. Additive-effect models 1 and 7 detect most association signals of GFLMs 3 and 6 in Table 1 and Table 2. In particular, the Het-Rao and Het-LRT of the AEM (1) detect association for *HHEX* in Table 1 and Table S3.

It is noteworthy that Het-MetaSKAT-O, Het-MetaSKAT, Hom-MetaSKAT-O, and Hom-MetaSKAT do not detect any significant signals in any of the 22 genes. The 22 genes are from the literature on T2D, and each of them contains SNPs that are associated with T2D. Thus, significant association signals for T2D are expected for most of the 22 genes if a gene-based method is sensitive. However, MetaSKAT detected no associations for the 22 genes, although our GFLMs and AEM detect associations for 19 genes. Therefore, MetaSKAT is less sensitive than the proposed LRT and Rao’s efficient score test statistics for the T2D data in the European cohorts. In Table S5, Table S6, Table S7, and Table S8, we report the results of Rao’s efficient score tests by dividing the data between rare and common variants based on a cutoff of 0.03. It is worth noting that the 22 genes contain both rare and common variants and that the associations are mainly from common variants. SKAT and MetaSKAT are designed to analyze rare variants, while the GFLMs and the AEM can analyze rare or common variants or a combination of the two.

When we analyze the data sets separately for each study by the method proposed in Fan *et al.* (2014), significant association is only detected at *TCF7L2* in the study of Norway by Rao’s efficient score test and the LRT (data not shown). Thus, it is advantageous to perform a meta-analysis of multiple studies.

### A simulation study

Simulations were performed to evaluate the performance of the proposed methods for two cases: (1) all causal variants are rare, and (2) some causal variants are rare and some are common. Three scenarios listed in Table 3 were considered for the simulations. Scenarios 1 and 2 used the European-like (EUR) sequence data, which are the same as those in Lee *et al.* (2012). Scenario 3 used both the EUR and African-American-like (AA) sequence data. The EUR sequence data were generated using COSI’s calibrated best-fit models, and the generated European haplotypes mimick CEPH Utah individuals with ancestry from northern and western Europe in terms of site-frequency spectrum and LD pattern (Schaffner *et al.* 2005, Figure 4; International HapMap Consortium 2007). Similarly, the AA sequence data mimick the Yoruba from Ibadan (YRI) (Nigeria in Africa) individuals with a 20:80 mixture of Europeans and Africans, together with parameters calibrated to model realistic demographic history (including bottleneck, population expansion, and migration events). The EUR data included 10,000 chromosomes covering 1-Mb regions, and the AA data included 45,000 chromosomes covering 0.1-Mb regions.

#### Type I error simulations:

To evaluate the type I error rates of the proposed models and tests, we generated phenotype data sets by using the model (8)for scenario 1 in Table 3 and (9)for scenarios 2 and 3 in Table 3, where is a dichotomous covariate taking values 0 and 1 with a probability of 0.5, and are continuous covariates from standard normal distributions , and was chosen to provide a disease prevalence of 0.01 under a null hypothesis . To obtain genotype data, 3-kb subregions were randomly selected in the 1-Mb regions of EUR and AA data. The ordered genotypes were these variants in the 3-kb subregions. Note that the trait values are not related to the genotypes, so the null hypothesis holds. We calculated empirical type I error rates for both Rao’s efficient score test and LRT statistics.

The sample sizes of the data sets were taken as 1600 (study 1), 2200 (study 2), and 3200 (study 3), respectively. For each study, half the sample consists of cases, and the remaining half consists of control individuals. The simulation settings are summarized in Table 3. For each sample-size combination, 10^{6} phenotype-genotype data sets were generated to fit the proposed models and to calculate the test statistics and related *P*-values. Then an empirical type I error rate was calculated as the proportion of 10^{6} *P*-values that were smaller than a given α level (*i.e.*, 0.05, 0.01 and 0.001, 0.0001, respectively).

#### Empirical power simulations:

To evaluate the power performance of the proposed models and tests, we simulated data sets under the alternative hypothesis by randomly selecting 3-kb subregions to obtain causal variants for the disease traits as follows: once a 3-kb subregion was selected, a subset of *p* causal variants located in the 3-kb subregion was then randomly selected to obtain ordered genotypes []. Then we generated the disease traits by (10)for scenario 1 in Table 3 and by (11)for scenarios 2 and 3, where and are the same as in models 8 and 9, and β is as follows: we used , where MAF_{j} is the MAF of the *j*th variant. Three different settings were considered: 5, 10, and 20% of variants in the 3-kb subregion are chosen as causal variants. When 5, 10, and 20% of the variants were causal, two parameter settings of genetic effects were considered for : (1) homogeneous and (2) heterogeneous (Table 4). In the homogeneous case, the genetic effects are the same for the three studies, *i.e.*, . In the heterogeneous case, the genetic effects are different for the three studies, *i.e.*, . For each setting, 1000 data sets were simulated to calculate the empirical power as the proportion of *P*-values that are smaller than an level.

#### Type I error simulation results:

The empirical type I error rates are reported in Table 5 and Table 6. In Table 5, only rare variants were used to generate genotype data, but none of them relates to the trait. In Table 6, all variants were used to generate genotype data. For the GFLMs Hom-Rao and Het-Rao, all empirical type I error rates are below the nominal α levels for both B-spline and Fourier basis functions (columns 4–7 of Table 5 and Table 6). Therefore, the chi-squared-distributed Rao’s efficient score statistics are very conservative and can be useful in whole-genome and whole-exome association studies.

For the GFLM Hom-LRT, all empirical type I error rates are around the nominal α levels for both B-spline and Fourier basis functions when all variants were used to generate genotype data (bottom parts of columns 8–11 of Table 6). For the GFLM Het-LRT, the empirical type I error rates are slightly higher than the nominal α levels when all variants were used to generate genotype data (top parts of columns 8–11 of Table 6), and the GFLM Het-LRT statistics can inflate type I error rates.

When only rare variants were used to generate genotype data, the empirical type I error rates are much higher than the nominal α levels for both B-spline and Fourier basis functions for GFLM Het-LRT statistics (top parts of columns 8–11 of Table 5). Relatively, the empirical type I error rates of GFLM Hom-LRT statistics are only slightly higher than the nominal α levels for both B-spline and Fourier basis functions (bottom parts of columns 8–11 of Table 5).

In Fan *et al.* (2014), it was found that the Rao’s efficient score test statistics are very conservative when the sample is small or moderate from a single study (*i.e.*, the sample ranges from 200 to 2000). Hence, the results of this paper are consistent with those of Fan *et al.* (2014) for the Rao’s efficient score test statistics. When the sample is smaller than or equal to 2000, Fan *et al.* (2014) found that the LRT statistics inflate the type I error rates. In this paper, we have a very big sample size of 7000 by combining three studies for a unified analysis, and the GFLM Hom-LRT controls the type I error rates correctly, but the GFLM Het-LRT still may inflate the type I error rates.

In short, the chi-squared-distributed Rao’s efficient score test statistics of GFLMs Hom-Rao and Het-Rao are very conservative. If the sample size is large, GFLM Hom-LRT statistics control the type I error rates well when all variants were used to generate genotype data and can slightly inflate the type I error rates when only rare variants were used to generate genotype data. The GFLM Het-LRT statistics may inflate the type I error rates, which may be due to the large degrees of freedom.

#### Statistical power results:

We compared the power performance of the proposed tests with MetaSKAT based on the simulated COSI sequence data. The empirical power levels at the level are plotted in Figure 1, Figure 2, Figure 3, Figure 4, Figure S1, Figure S2, Figure S3, and Figure S4. In all theses figures, “GVF&Beta, B-sp” (or “GVF&Beta, F-sp”) means that both GVF and the genetic-effect function were smoothed by B-spline (or Fourier) basis functions, and “Beta, B-sp” (or “Beta, F-sp”) means that only the genetic-effect function was smoothed by B-spline (or Fourier) basis functions (*i.e.*, β-smooth-only). Moreover, the results of Het-MetaSKAT, Het-MetaSKAT-O, Hom-MetaSKAT, and Hom-MetaSKAT-O using the R package MetaSKAT are reported for power comparison (Lee *et al.* 2013).

In Figure 1, Figure 2, Figure 3, and Figure 4, the results of GFLM Hom-Rao are reported, and the Rao’s efficient score test statistics are constructed using the homogeneous effect model. In Figure S1, Figure S2, Figure S3, and Figure S4, the results of GFLM Het-Rao are reported, and the Rao’s efficient score test statistics are constructed using the heterogeneous effect model. Moreover, the results of AEM Het-Rao for the additive-effect model (1) are reported in each figure for a comparison. In Figure 1, Figure 2, Figure S1, and Figure S2, the simulated data are generated under the assumption of homogeneous genetic effect, and in Figure 3, Figure 4, Figure S3, and Figure S4, the simulation data are generated under the assumption of heterogeneous genetic effect (Table 4).

When some causal variants are rare and some are common, the GFLM Hom-Rao has higher power than MetaSKAT and MetaSKAT-O for scenarios 1 and 2 and has similar power as MetaSKAT and MetaSKAT-O for scenario 3 in Figure 1 and Figure 3. The GFLM Het-Rao also has higher or similar power as MetaSKAT and MetaSKAT-O in Figure S1 and Figure S3. Therefore, the proposed Rao’s efficient score test statistics have good power performance when some causal variants are rare and some are common. By a comparison of power levels in Figure 1 *vs.* Figure S1 and Figure 3 *vs.* Figure S3, the power levels of the GFLM Hom-Rao are generally higher than those of GFLM Het-Rao, which may be due to the large degrees of freedom of the GFLM Het-Rao. The AEM Het-Rao has slightly lower power levels than GFLM Hom-Rao and GFLM Het-Rao but performs well.

When the causal variants are all rare, the GFLM Hom-Rao has slightly lower or similar power levels as MetaSKAT and MetaSKAT-O in Figure 2, Figure 4, Figure S2, and Figure S4. Again, the power levels of the GFLM Hom-Rao in Figure 2 and Figure 4 are generally higher than the corresponding power levels of GFLM Het-Rao in Figure S2 and Figure S4. The AEM Het-Rao has low power levels.

In each graph, we compared five Rao’s efficient score test statistics: one is based on the additive-effect model (1), two are based on B-spline basis functions, and two are based on Fourier basis functions. In the two Rao’s efficient score test statistics to use B-spline (or Fourier) basis functions, one is to smooth both the GVFs and the genetic-effect function , and the other is only to smooth the genetic-effect function (*i.e.*, β-smooth-only). The four Rao’s efficient score test statistics of the GFLMs have similar power. The power levels of β-smooth-only are almost identical to those of smoothing both the GVFs and genetic-effect function by B-spline basis (or Fourier basis). Thus, the tests do not strongly depend on whether the genotype data are smoothed or not. In addition, the Rao’s efficient score test statistics do not strongly depend on which basis functions are used. We also calculated the empirical power levels of the LRT statistics, which provide very similar empirical power levels as the Rao’s efficient score test statistics (data not shown).

## Discussion

In this paper, GFLMs are developed to perform a meta-analysis of multiple case-control studies to connect genetic data to dichotomous traits adjusting for covariates. Based on the GFLMs, chi-squared-distributed Rao’s efficient score test and LRT statistics are introduced to test for an association between a complex trait and multiple genetic variants. Extensive simulations are performed to evaluate empirical type I error rates and the power performance of the proposed GFLMs and tests. We show that the proposed Rao’s efficient score test statistics are very conservative. The Rao’s efficient score test statistics have higher power than MetaSKAT when some causal variants are rare and some are common. When the causal variants are all rare (*i.e.*, MAF < 0.03), the Rao’s efficient score test statistics have similar or slightly lower power than MetaSKAT. For homogeneous genetic effect models, the GFLM Hom-LRT generates accurate type I error rates. For heterogeneous genetic models, the GFLM Het-LRT may inflate type I error rates owing to large degrees of freedom. The GFLMs and related test statistics can be useful in whole-genome and whole-exome association studies.

The GFLMs and AEM were applied to analyze the genetic data of 22 gene regions of T2D data from a meta-analysis of eight European studies and detected significant associations for 19 genes (*P* < 3.1 × 10^{−6}), tentative association for 1 gene (*P* ≈ 10^{−5}), and no association for 2 genes, while MetaSKAT detected none. Because the 22 genes are from the literature on T2D showing that each of them contains SNPs that are associated with T2D, the association is confirmed by our fixed models and related tests for the 19 genes, although MetaSKAT failed to confirm any of the associations. One may note that the European cohorts were analyzed by MetaSKAT in Lee *et al.* (2013), but no results were reported for the dichotomous traits of T2D.

Unlike other methods such as SKAT or MetaSKAT, which are based on mixed-effect models, GFLMs are fixed-effect models, and the genetic effects of multiple genetic variants are assumed to be fixed. The formulation of the β-smooth-only model (2) is similar to that of SKAT and MetaSKAT. However, the assumptions are totally different. Specifically, the regression coefficients of genetic variant terms in the models of SKAT and MetaSKAT are random, while the genetic effects in model 2 are fixed unknown functions. Our GFLMs are a natural extension of traditional population genetics without a polygenic term because we consider the population data. By using functional data-analysis techniques, we develop procedures to estimate the genetic-effect functions and introduce test statistics to test for an association.

If the causal genetic variants are all rare, the number of causal rare variants is large, and each contributes a small amount to the trait, it would be reasonable to assume the genetic contribution of major gene loci to be random, and then the mixed models of SKAT and MetaSKAT can be valid. In our power comparison, we found that the proposed Rao’s efficient score test statistics have similar or slightly lower power than MetaSKAT when the causal variants are all rare. However, the proposed Rao’s efficient score test statistics have higher power than MetaSKAT when some causal variants are rare and some are common (in this case, it is likely that the effects of a few genetic variants of the major gene locus are large, so fixed-effect models perform well). It is noteworthy that this paper deals with dichotomous traits. For quantitative traits, it was found that functional linear models lead to both *F*- and chi-squared-distributed score test statistics that are more powerful than SKAT and MetaSKAT (Luo *et al.* 2012; Fan *et al.* 2013).

In the proposed models and tests, we do not make any assumptions about whether the genetic variants are rare or common variants or a combination of the two. The proposed models are very flexible and can analyze rare or common variants or a combination of the two. We do assume that the number of genetic variants in a genetic region is large, which is true for modern genetic data. When a large number of genetic variants are available in a genetic region, estimation of the GVF is accurate, which makes our GFLMs very reliable. In our simulation and data analysis, models 2 and 4 perform very close to each other.

In Fan *et al.* (2013, 2014), we investigated the performance of the mixed models by making the regression coefficients β of genetic-effect function random in the frameworks of our functional regression models. It was found that the mixed models perform well only when the causal genetic variants are all rare and the traits are dichotomous (for rare variants, we used an artificial cutoff of 0.03). For most diseases, the causal variants can be both rare and common. Because the proposed models are very flexible in analyzing rare or common variants, we focus on fixed-effect models in this paper. In our simulations, we treat the regression effect of covariates as heterogeneous. We also investigate the performance of the proposed models by treating the regression effect of covariates as homogeneous, and we find that the results are similar in terms of empirical power performance and type I error rates.

For small- and moderate-sample-size single studies when the sample sizes are smaller than or equal to 2000, the LRT statistics of GFLMs were found to inflate the type I error rates, while chi-squared-distributed Rao’s efficient score test statistics control type I error rates correctly (Fan *et al.* 2014). Hence, Rao’s efficient score test statistics were recommended for small- and moderate-sample-size single studies. In this paper, we show that Rao’s efficient score test statistics control the type I error rates correctly when the sample sizes of combined multiple studies are large. For homogeneous genetic-effect models, the LRT statistics were found to have correct type I error rates; for heterogeneous genetic-effect models, the LRT statistics inflated the type I error rates. Therefore, one needs to be cautious about using LRT statistics for dichotomous traits. For quantitative traits, both the LRT and *F*-distributed statistics have correct type I error rates and good power performance for a sample with a sample size ≥ 1500 (Fan *et al.* 2013).

The proposed method requires individual genotype data and is more powerful than MetaSKAT and MetaSKAT-O when genotype data are available from all studies. However, the proposed method cannot meta-analyze summary statistics. If only summary statistics of GFLMs are available from different studies using Fan *et al.* (2014), it is still an open question as to how to use them for a meta-analysis.

## Acknowledgments

Two anonymous reviewers and the editors, Dr. Chiara Sabatti and Dr. Gary Churchill, provided very good and insightful comments for us to improve the manuscript. We greatly thank the European cohorts groups for letting us analyze the data and use them as examples. Dr. Heather M. Stringham and Dr. Tanya M. Teslovich kindly sent us the data of the European cohorts and patiently answered many questions about the cohorts, and we greatly appreciated them. This study used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health (http://biowulf.nih.gov). No GWAS data were generated in this paper. This study was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (R.F., Y.W., and C.-y.C.), by NIH grants R01-EY024226 and R01-HG007358 (to W.C.), by the University of Pittsburgh (R.F. is an unpaid collaborator on grant R01-EY024226), by NIH grants R01-HG006292 and R01-HG006703 (to Y.L.), and by NIH grants LM-009012 and LM-010098 (to J.H.M).

## Footnotes

*Communicating editor: C. Sabatti*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180869/-/DC1

- Received July 18, 2015.
- Accepted December 9, 2015.

- Copyright © 2016 by the Genetics Society of America