## Abstract

Schizophrenia is a psychiatric disorder with large personal and social costs, and understanding the genetic etiology is important. Such knowledge can be obtained by testing the association between a disease phenotype and individual genetic markers; however, such single-marker methods have limited power to detect genetic markers with small effects. Instead, aggregating genetic markers based on biological information might increase the power to identify sets of genetic markers of etiological significance. Several set test methods have been proposed: Here we propose a new set test derived from genomic best linear unbiased prediction (GBLUP), the covariance association test (CVAT). We compared the performance of CVAT to other commonly used set tests. The comparison was conducted using a simulated study population having the same genetic parameters as for schizophrenia. We found that CVAT was among the top performers. When extending CVAT to utilize a mixture of SNP effects, we found an increase in power to detect the causal sets. Applying the methods to a Danish schizophrenia case–control data set, we found genomic evidence for association of schizophrenia with vitamin A metabolism and immunological responses, which previously have been implicated with schizophrenia based on experimental and observational studies.

SCHIZOPHRENIA, a mental disorder affecting ∼1% of the population worldwide, is characterized by hallucinations, delusions, and cognitive deficits. From twin studies, the heritability of schizophrenia has been estimated to (Sullivan *et al.* 2003). Schizophrenia is a complex disorder caused by both genetic and environmental factors. High personal and social costs, combined with the desire for better diagnosis and treatment, have led to numerous studies trying to identify the predisposing genetic loci.

Genotyping of several thousand affected and nonaffected individuals has formed the basis for numerous genome-wide association studies (GWAS), and several genome-wide significant genetic markers have been identified (International Schizophrenia Consortium 2009; Schizophrenia Psychiatric Genome-Wide Association Study Consortium 2011; Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014). Evidence collected across studies points toward the susceptibility to schizophrenia being highly polygenic, with thousands of independent common variants with small effects likely contributing to the risk. Recently, 108 loci were identified in the largest GWAS to date (Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014). Around 23–32% of the variation on the liability scale to schizophrenia is captured by common single-nucleotide polymorphisms (SNPs) (International Schizophrenia Consortium 2009; Lee *et al.* 2012; Ripke *et al.* 2013). Also, it has been suggested that the genetic risk for schizophrenia is due to a combined effect of common and rare markers (Gratten *et al.* 2014; Kato 2015; Loh *et al.* 2015). It is therefore a challenge to pinpoint the genetic etiology of polygenic diseases, when the applied method mainly detects the fraction of genetic markers with the relatively largest effect (Hirschhorn and Daly 2005; McCarthy *et al.* 2008; Fridley and Biernacka 2011; Wang *et al.* 2011).

Realizing the limitations of traditional single-marker analysis has promoted the development of alternative, or complementary approaches for GWAS. Adapting ideas from gene expression microarrays (Goeman *et al.* 2004; Subramanian *et al.* 2005; Goeman and Bühlmann 2007; Newton *et al.* 2007), several set test methods have been proposed (Wang *et al.* 2007, 2010; Holmans *et al.* 2009; Network and Pathway Analysis Subgroup of the Psychiatric Genomics Consortium 2015). These methods rely on the collective action of multiple SNPs aggregated into defined sets. The aggregation of genetic markers is based on the physical position of the SNPs and their proximity to known genes. Aggregation based on biological networks arises from the knowledge that genes interact in complex networks and cellular pathways (Schadt 2009), and associated markers often are enriched in genes or regulatory regions of genes that are connected in networks and pathways (Allen *et al.* 2010).

Set test methods examine whether a summary statistic for a set of markers deviates from a summary statistic computed for a random set of markers. Summary statistics can be obtained in several ways (Fridley and Biernacka 2011; Mooney *et al.* 2014). Commonly, single-marker test statistics are computed for all markers. These measure the association between a marker and phenotype. Subsequently, for a set of genetic markers a summary statistic is computed and to test for enrichment the observed summary statistic is compared to an empirical distribution of summary statistics. The single-marker test statistics can be obtained in several ways, such as from single-marker associations, *e.g.*, logistic regressions (*i.e.*, or *P*-values), or from multimarker associations, such as the score test approaches (*e.g.*, sequence kernel association test (SKAT)) (Goeman *et al.* 2004, 2006; Wu *et al.* 2011; Wang *et al.* 2013). Alternatively, test statistics could be derived from a “black-box” approach, namely genomic best linear unbiased prediction (GBLUP). This involves solving linear mixed models for fixed and random effects simultaneously and incorporating genomic relationships among individuals, the predicted genetic effects for the individuals are obtained. Here, we attempted to open this black-box modeling approach and propose a new summary statistic for associating sets of SNPs to schizophrenia. We call this new summary statistic the covariance association test (CVAT).

The aim of this study was twofold. First, it aimed to compare the performance of our proposed CVAT to other commonly used set test methods. This was achieved by simulating a population with a disease prevalence corresponding to that of schizophrenia, from which we sampled our study population. Second, it aimed to apply the methods to a previously published Danish schizophrenia case–control study (Børglum *et al.* 2014).

## Methods

To derive CVAT and other summary statistics derived from GBLUP we begin the *Methods* section by describing GBLUP, and then we derive CVAT. Then we describe other related summary statistics and provide a statistical comparison between some of the methods. The *Methods* section ends with describing the schizophrenia data and how we simulated the case–control data.

### GBLUP

The purpose of this section is to obtain individual genetic effects (), which later will be partitioned into feature-specific genetic effects () when used in the CVAT. The genetic effects are obtained by fitting the linear mixed model,(1)where is a vector of disease status; *i.e.*, 0 = control, 1 = case. and are design matrices linking fixed and random effects to respectively. Here, (I is an identity matrix), thus, can be omitted in the following expressions. is a vector of fixed effects, is a vector of random genetic effects, and is a vector of the residual effects. The genetic effects are assumed to be normally distributed and defined as where is the genomic relationship matrix computed based on all genetic markers: Here, is the number of markers and is a centered and scaled genotype matrix. Each column vector of is where is the minor allele frequency (MAF) of the *i*th marker, and is the *i*th column vector of the allele count matrix, containing the genotypes encoded as 0, 1, or 2, counting the number of minor alleles (VanRaden 2008).

Estimates of the fixed () and random effects () are obtained as the best linear unbiased estimator (BLUE) and the best linear unbiased predictor (BLUP), Equations 2 and 3, respectively:(2)(3)In Equations 2 and 3, is the estimated phenotypic covariance, given by Single-marker effects can be computed as(4)and the (co)variance of the single-marker effects as(5)To test the null hypothesis of a *t*-test statistic for was computed as(6)where is the estimate of the variance of the *i*th element of obtained from the *i*th element of the diagonal of the covariance matrix of the single-marker effects. Under the null hypothesis, *i.e.*, follows a *t* distribution with residual degrees of freedom (Cule *et al.* 2011). is computed as which is equivalent to such that is the total number of observations, and represents the degrees of freedom occupied by the penalized fit (*i.e.*, the linear mixed-model fit). is the hat matrix that transforms into and thus (Nelder *et al.* 1965).

Predicting the genetic effects in Equation 3 requires the estimated variance component for the genetic effect Here, the variance components are estimated using the Newton–Raphson algorithm. The heritability is defined as the proportion of total phenotypic variation that is due to additive genetic components (Lynch and Walsh 1998), Estimates of from SNP data will give the proportion of phenotypic variation captured by common SNPs; however, for case–control studies such estimates are biased, because the scale at which the phenotype is measured differs from the scale at which the heritability is expressed (Lee *et al.* 2011). Also, the proportion of cases is larger than the disease prevalence in the population; thus, the heritability will be biased by ascertainment (Lee *et al.* 2011). To transform the observed SNP heritability to the liability scale and to adjust for ascertainment bias we used a transformation proposed by Lee *et al.* (2011),(7)such that is the disease prevalence in the population, is the proportion of cases in the sample, and is the height of the standard normal curve at the truncation point; *i.e.*,

### Tests for association between genomic feature and phenotype

In the previous section we obtained a measure of association between single genetic markers and the phenotype (Equations 4 and 6). However, for highly polygenic traits, *i.e.*, traits under the influence of many causal variants with small effects across the genome, detecting causal variants is extremely challenging. Therefore, we argue that methods that exploit the inherent characteristics of complex traits, *i.e.*, many genetic markers with small effect sizes scattered around the genome in a nonrandom fashion, will have increased power to detect sets of markers enriched for causal variants. These methods are commonly known as set tests, and a variety of methods have been developed (*e.g.*, Fridley and Biernacka 2011; Mooney *et al.* 2014).

Here we describe a new method for associating sets of SNPs to a phenotype, namely CVAT. The CVAT summary statistic measures the covariance between the joint genetic values for markers in the feature set and the genetic values for the remaining markers outside the feature set. A feature set is a collection of markers grouped according to known genomic or biological structures, such as genes, gene ontologies, pathways, etc.

Following the derivation of CVAT we describe three commonly used summary statistics and compare the different methods mathematically. We end this section by defining the null hypothesis and how this is tested.

#### CVAT:

The CVAT summary statistic ( Equation 8) captures the covariance between the total genetic effect from all markers and the genetic effect from the markers within the feature set (8)The subscripts f and r signify whether the genetic effects are computed based on markers within the feature set or based on the remaining markers, respectively. contains the contribution from the genetic values of the markers within feature and the covariance between genetic values for the markers in the feature and outside the feature set. is derived from the explained sum of squares (9)The first term in Equation 9 constitutes the part of that is explained by the genetic markers linked to the genomic feature of interest, the second term is the part explained by the markers not included in the feature set, and the last term is the covariance between the genetic values for the two marker sets. The first two terms will always take positive values, whereas the covariance term can be either negative or positive. Large values of indicate a genomic feature contributing strongly to Because of LD may contribute to It may be helpful to rewrite the total genetic effects (Equation 3). When we get(10)Using and the two terms in (Equation 8) can be written as(11) and are the observed covariance between the adjusted phenotype and the markers within the feature set and the markers not included in the feature set, respectively. is a correlation matrix quantifying the observed linkage disequilibrium (LD) between all markers. The matrix can be partitioned into submatrices accordingly to the genetic variants within or outside the genomic feature:(12)The block diagonals, and harbor the correlation among markers in the genomic feature and among the markers not included in the feature set, respectively. The off-diagonal element, is the correlation between the markers in and outside the feature set.

The predicted genetic effects are assumed to be drawn from the same distribution, namely It is likely, though, that the genetic effects come from a mixture of distributions, *e.g.*, are caused by differences in MAFs (Loh *et al.* 2015). Therefore, we extend CVAT, allowing the genetic effects to have different distributions caused by different underlying partitioning. This is obtained using our genomic feature BLUP (GFBLUP) in which an additional variance component is added to the traditional GBLUP (Edwards *et al.* 2016; Sarup *et al.* 2016):(13)The notation is similar to Equation 1 except and are vectors of genetic values captured by genetic markers linked to different subsets. The random genetic and residual effects were assumed to be independent and distributed as and The genomic relationship matrices and are computed based on the subset of markers, and Here we illustrate this by partitioning using different MAF boundaries, *e.g.*, < MAF 0.05, and > MAF 0.05 (see Figure 4 and Table 1 for the boundaries used for the simulated and observed data). The two genetic effects for that is, and are then based on the genetic marker effects now arising from two distributions:

#### Comparable summary statistics:

In the following we describe three commonly used summary statistics. The first type is based on counting the number of markers in the feature set that showed association with the phenotype:(15)In Equation 15, is the number of markers in the feature set, is a test statistic for the *i*th marker (*e.g.*, a *P*-value of SNP association), is an arbitrary threshold for the marker test statistic, and is an indicator function that takes the value one if the argument is satisfied, and otherwise it is zero. When defining the value of arbitrariness is introduced. Markers whose test statistic was slightly higher than would not be captured in such a model and therefore not be considered. Such tests have high power to detect association if the feature set harbors genetic markers with large effects, but it will not detect sets with many markers with small to moderate effect sizes (Newton *et al.* 2007). Therefore, it might be more powerful to use a summary statistic that considers all markers within the feature set. The second summary statistic was based on the sum of the squared test statistics for all markers belonging to a genomic feature:(16)In Equation 16, is the test statistic for the *i*th marker and is the number of markers in the feature set. We used the squared test statistic, because some types of test statistics can take both positive and negative values.

The third approach is a score test statistic, which is derived from the first derivative of the likelihood as is Rao’s score test (Bera and Bilias 2001). One of the major differences compared to Rao’s score test is that only the quadratic term (in the first derivative) forms the basis of these test statistics (Goeman *et al.* 2004; Wu *et al.* 2011; Wang *et al.* 2013). The argument is that this is the only part that involves the data (Huang and Lin 2013). The score test statistic can therefore be written as(17)In Equation 17 where the subscript f indicates markers within the feature. The score test statistic requires and and these depend on how the null model is specified. The purpose of the null model is to adjust for environmental nongenetic effects and genetic factors not part of the genomic feature. If the null model is specified as the GBLUP (Equation 1), the score statistics can be expressed as(18)where which corresponds to centering the observed values by the fixed effects and scaling by the inverse of the estimated phenotypic covariance. In the case of the simulated data, the null model is while for the observed disease data the null model also includes an adjustment for population structure and sex differences; thus, Therefore, From Equation 18 it can be seen that the score statistic is the covariance between the adjusted phenotypic values and the SNPs within the feature. Thus, close to zero indicates low covariance between SNPs within the feature set and the phenotypic values.

#### Comparison of summary statistics:

The purpose of this paragraph is to show the similarity and dissimilarity between the summary statistics described. First, we show that and are proportional when GBLUP marker effects ( Equation 4) are used:(19)The result in Equation 19 can also be expressed for the markers within the feature set:(20)Combining the results in Equation 20 with the expression for (Equation 18), we get(21)which is proportional to when single-marker effects are used as test statistics:(22)If the parameters are estimated under the GBLUP model, and can be compared directly. Compared to also considers the covariance among markers within the feature set and markers outside the feature set (Equation 8). Even if the contribution of individual markers to the summary statistic is weighted according to the degree of correlation among the markers (Equations 11 and 12).

#### Defining and testing :

The null hypothesis we consider is a competitive test; *i.e.*, the degree of association between a particular set of genetic markers and a phenotype is equal to the degree of association to other complement sets, thus, the number of genetic markers and correlation structure among markers should be equal to the set of interest (Goeman and Bühlmann 2007).

To test whether a set of markers displays the same degree of association as a complement set, the observed summary statistic was compared to an empirical distribution of summary statistics. First, the observed test statistics were ordered according to the markers’ physical positions and next, one such element was randomly selected. All elements were then shifted to a new position, such that the one picked was the first element, and the remaining markers were shifted to new positions, but in the same order as originally, *i.e.*; A new summary statistic was computed based on the original position of the genomic feature. This uncoupled the association between the markers and the genomic feature, but retained the correlation structure among test statistics. The permutation was repeated 10,000 times for each set in the feature class and an empirical *P*-value was obtained as a one-tailed test of the proportion of the randomly sampled summary statistics being larger than the observed summary statistic.

### Schizophrenia case–control data

The method described above was applied to a schizophrenia case–control study previously published (Børglum *et al.* 2014). The data consist of 1770 individuals born in Denmark between 1981 and 2006: 882 controls and 888 schizophrenia cases with a diagnosis according to the ICD-10 Classification of Mental and Behavioral Disorders Diagnostic Criteria for Research (ICD-10-DCR), F20. Cases and controls were matched by age and sex. The genotypes were subject to quality control, as described in Børglum *et al.* (2014). The Danish Data Protection Agency and the ethics committees in Denmark approved the human subjects protocol. Using the *Variant Effect Predictor* from Ensembl (McLaren *et al.* 2010), the marker positions were updated from human genome assembly hg18, NCBI Build 36 to UCSC hg38. In this process we restricted the genotypes to be autosomal, resulting in a total of 520,897 SNPs.

### Genomic feature classes

Numerous genomic feature classes can be constructed. We created sets of genetic markers defined by gene ontologies (GOs) and immunological signatures. Annotated markers were grouped by GOs, using the Bioconductor package “org.Hs.eg.db” (Gentleman *et al.* 2004; Carlson 2014), and immunological signatures were obtained from the Molecular Signature Database (MSigDB) (Subramanian *et al.* 2005; Godec *et al.* 2016).

### Simulating case–control populations

To evaluate the performance of CVAT and to compare it to other set test methods, several simulations were established. To create simulations directly comparable to the observed schizophrenia case–control data, we simulated ∼2000 individuals with the proportion of cases:controls of 0.5. Cases were drawn from a population with a disease prevalence (*K*) corresponding to that of schizophrenia; *i.e.*, Thus, to obtain a total of = 100,000 individuals were simulated and cases were obtained such that which gave ∼1000 cases (varied between simulations). The 1000 controls were sampled from the remaining individuals. We simulated a total of = 80,000 SNPs distributed along 22 autosomes with the same correlation structure as in the human populations (Janss *et al.* 2012). The phenotypes were simulated based on 1000 causal markers. To obtain marker effects with different effect sizes, the causal variants were divided into two groups : and The variants in were sampled from 20 regions of 25 markers (to imitate genes), whereas the variants in were sampled randomly from the remaining markers. To obtain different effect sizes of the variants in several scenarios were established such that the proportion of genetic variance explained by ranged from 0.05, 0.1, 0.2, 0.3, 0.4, to 0.5. The total phenotypic variation explained by the genetic variation and thus the total heritability was set to the proportion of variance in liability attributable to genome-wide markers estimated from our schizophrenia data set ( see *Results*). To obtain the same as in the observed schizophrenia data, the phenotypes were simulated with the heritability on the liability scale Thus, after selecting the 2000 individuals we obtained For each scenario a total of 50 data sets were obtained. To investigate the effect of noncausal markers within the causal sets of variants, we added an increasing number of noncausal markers (100, 200, ... , 4900, 5000), to the causal sets, which we refer to as the dilution effect. To determine the false-positive rate, 50 noncausal marker sets of varying sizes (10 sets each containing 100, 500, 1000, 5000, or 10,000 SNPs) were sampled, none of which were contained in the causal variants.

The simulations consisted of five steps (see *Algorithm* below). First, a vector of MAFs is obtained. To simulate genotypes according to a particular correlation structure between the markers a correlation matrix was simulated (). The *j*th genotype of each *i*th individual was simulated as individual alleles [ and ], and the sum of the two alleles produced the genotype (). Each allele was simulated using a multivariate normal distribution, and to convert these to binary genotypes, truncation of the multivariate normal distribution was performed. However, truncation of the normal distribution would change the correlation structure within the binary data, and thus We applied an algorithm proposed by Leisch *et al.* (1998) such that after the multivariate normal data were simulated based on we obtained the discretized genotypes (0, 1, 2) with the desired Finally, the binary phenotypes () were simulated from a normal distribution as where is a vector of fixed effects, is a vector of genetic effects [ where is the marker effects of the causal variants within and ], and is the residual effect. The binary phenotypes were then obtained using truncation of the inverse of the standard normal distribution on using the disease prevalence.

To illustrate a potential extension of CVAT we performed a second set of simulations. Here, 50% of the causal genetic variants were set to have a MAF to mimic a scenario where the disease is influenced by both common and rare variants (Loh *et al.* 2015).

### Algorithm

Given the vector of MAFs, calculate

For a matrix simulate and

such that

where

### Assessing the power of summary statistics:

The performance of the different set tests was assessed using the *F*_{1} classification score:(24)The *F*_{1} score is the harmonic mean of precision () and recall () (Powers 2011). is the proportion of true positives (TP) that are correctly identified, *i.e.*, the ratio between the number of identified causal sets and the number of sets that should have been identified and thus the sum of TP and false negatives (FN). Contrarily, is the proportion of true positives that truly are positives, *i.e.*, the proportion of true causal sets of all sets identified and thus the sum of TP and false positives (FP). The *F*_{1} score can take values between 0 and 1, with maximum performance at the value of 1.

### Software implementation

The methods described above have been implemented in the R package qgg, which is available at http://psoerensen.github.io/qgg/.

### Data availability statement

To protect the identity of the Danish individuals, the genotypic data generated for the Danish study are not publicly available. Data are available through the Aarhus University Data Access Committee for researchers who meet the criteria for access to confidential data. To request access to the data, contact John Westensee at jwe@au.dk.

## Results

### Quantitative genomic parameters

Quantitative genomic parameters for schizophrenia were obtained by fitting the linear mixed model as described in Equation 1, in which we accounted for sex and population structure by incorporating the first principal component. The proportion of phenotypic variance captured by common genetic markers was estimated as = 0.29, and accounting for ascertainment bias (Equation 7) we obtained = The heritability on the liability scale was used in our simulations, and across all simulation scenarios we obtained = 0.27.

### Lessons from simulations

Single-marker association has been used to study complex traits, but for highly polygenic traits such methods may have limited power to detect causal variants. To illustrate this, we performed a logistic regression on the simulated data. The *F*_{1} classification score was used to describe the power to detect the causal variants (Figure 1). This clearly showed limited power to detect causal variants in *C*_{1} or *C*_{2}. The power to detect the main causal effects () increased as the proportion of genetic variance explained by the markers in () increased (Figure 1); however, the power remained very limited, not exceeding an *F*_{1} score of 0.15. Thus, given the genetic model of our simulations, using a logistic regression with an arbitrary significance level (*P* < 0.001), a large proportion of noncausal genetic markers were identified, and a minor proportion of the causal variants were actually identified, altogether resulting in an *F*_{1} score <0.05 (Figure 1).

The main objective of our study was to compare the performance of CVAT to that of other set test methods. Overall, as the proportion of genetic variance explained by increased, the power to detect the causal sets increased (Figure 2). Across the different genetic architectures, CVAT performed equally to the score method (Figure 2). The sum of marker effects and the sum of *t*-test of marker effects performed at an equal level (Figure 2). Counting the number of markers with a *P*-value below an arbitrary significance level showed the lowest performance, when the proportion of genetic variance explained by the main causal markers was low (*i.e.*, = 0.05 and = 0.10); however, as increased this summary statistic performed better than both and (Figure 2).

Adding noncausal markers to the causal sets, *i.e.*, diluting the causal effects, resulted in the same trend across all methods: *F*_{1} was reduced as the dilution effect got larger (Figure 3). Also, as increased, *F*_{1} increased. At low counting the number of markers below a significance threshold started at a lower *F*_{1} score than the other methods, and and tend to have the lowest performance at the largest dilution effect.

When data were enriched for causal variants with low MAF, extending the CVAT to utilize marker effects with different distributions increased the power considerably (Figure 4). When the proportion of genetic variance explained by the main causal effects was large, the *F*_{1} scores were close to its maximum, and thus the largest gain in performance was retrieved at intermediate to low

### Gene ontologies and immunological signatures associated with schizophrenia

We applied the six genomic BLUP-derived set tests to the Danish schizophrenia case–control data. To obtain genomic feature adjusted *P*-values we applied a false discovery rate on all the empirical *P*-values. As an additional constraint, feature sets were assigned as significant only if four of the six methods resulted in an adjusted *P*-value <0.05.

We utilized two types of prior biological knowledge to create a group of markers: GOs and immunological signatures. A total of 38 different GOs were associated by at least one method, and 11 GOs were associated by at least four of the methods (Table 1 and Supplemental Material, Table S1). A total of 16 different immunological signatures were associated by at least one of the methods, while only 1 was associated by more than four methods (Table 1 and Table S2). Two main categories of associated GOs could be identified. The first group contained four GOs related to developmental processes (GO:0021983, GO:0001568, GO:0060324, and GO:0043403), while the second group contained GOs related to vitamin A metabolism, including retinol metabolism (GO:0006776, GO:0042572, GO:0042573, and GO:0042574). The three remaining GOs were related to gene expression (GO:0010628), phosphorylation activity (GO:0008138), and light detection (GO:0050908). The immunological signature was GSE17721, which represents dendritic responses to PAM_{3}CSK_{4} (an agonist to the membrane protein TLR2).

Extending CVAT to utilize marker effects based on the GFBLUP model to account for a mixture distribution entailed more feature sets having a false discovery rate (FDR) < 0.05 (Table 1). Examining the distribution of marker effects from GBLUP and GFBLUP clearly showed a mixture of two distributions (Figure 5). This pattern could be caused by a general enrichment of low MAFs within the associated genomic features; however, this was not observed (Figure 6).

## Discussion

In the present study, we compared the performance of our newly proposed CVAT to a selection of commonly used set test methods. By simulating genotypes and binary phenotypes we investigated the power to detect causal sets of genetic markers. The simulations provided knowledge of how the different methods perform under different genetic scenarios. We applied the six methods to a schizophrenia case–control data set with the aim of identifying genomic features associated with schizophrenia.

### From single-marker association to set tests

Despite the success of single-marker analyses, identification of genetic markers associated with highly polygenic traits is more restricted. Markers may be truly associated, but with effect sizes too small to be detected by traditional single-marker association analyses, or fail to pass an arbitrary, stringent genome-wide significance threshold (Hirschhorn and Daly 2005; Wang *et al.* 2010, 2011). Using simulations we investigated this particular problem and found indeed, that for highly polygenic traits the power to detect causal variants was very limited (Figure 1). Thus, for large-scale genomic data and polygenic traits, there is a need to get past single-marker analysis and move toward methods that take into account the collective action of multiple markers.

Set tests have been suggested as methods that can increase the power to detect groups of genetic markers containing causal variants. Comparing the performance of CVAT to existing methods showed that in all scenarios CVAT performed equally to the score-based approach and that both CVAT and the score approach outperformed the other methods (Figure 2). Summary statistics based on the sum of single-marker test statistics were expected to have more power to detect sets harboring markers with small to moderate effect size, in contrast to the count of the number of “significantly” associated markers within the set (Newton *et al.* 2007). This was observed at low and as the effect size of the causal markers increased, due to increased the count-based method outperformed the sum methods (Figure 2). This was expected because the count-based method tends to have higher power to detect feature sets harboring markers with large effect sizes (Newton *et al.* 2007). If and are obtained from the same model, they are proportional and therefore should have the same power (Equations 17–20). Here, and were not based on the same model: was based on GBLUP (Equation 1), while was based on a null model that did not include any genetic components, The disparity in performance may arise because the GBLUP only in part captures the true genetic signal into because it relies on separation of the genetic and residual components, while is based on a response variable containing most of the true genetic signal plus random noise, *i.e.*, and hence no separation of the two components.

Recent findings indicate that low-frequency markers may have larger effect sizes (Loh *et al.* 2015), thus resulting in a mixture distribution of marker effects. When the simulated data were enriched for causal variants with low MAFs, a large improvement in the power to detect the causal sets was observed (Figure 4). We used a two-component partitioning (Equation 13), fitting two genetic effects jointly. Multiple genetic components can be fitted (Speed and Balding 2014); however, if the correlation between the multiple correlation matrices is high, uncertainties of the variance components increase (Gusev *et al.* 2014; Edwards *et al.* 2016), which affects the accuracy of the marker effects. Therefore, using marker effects from multigenetic components might decrease the power of CVAT because the marker effects are estimated with bias. However, this should be investigated further. Other extensions of CVAT can easily be implemented, including dominance effects, genotype-by-genotype interactions, and genotype-by-environmental effects. Since CVAT was derived from GBLUP, extending CVAT to a multitrait model is straightforward.

All analyses performed in our study were based on individual genotypes. Access to individual genotypes may be restricted, and thus methods that utilize published marker test statistics do have advantages. Both the score approach and CVAT require individual genotypes, whereas the count- and sum-based methods require only individual marker test statistics. There is a trade-off between the power of the set test and the availability of genotypes: If the premise is a data set containing both response variables and large-scale genotype variant data, then CVAT, with its many possible extensions, is an attractive method.

### Biological insights from set tests

Statistical tests relying on prior information and aggregation of genetic markers based on some criteria other than the data currently being analyzed have the potential to recover more knowledge than single-marker analyses do. The gain in knowledge highly depends on the quality and complexity upon which the aggregation is based. Aggregation based on physical genomic regions, such as chromosomes or single genes, might not increase the information level, but as additional complexity layers are added, such as pathways, the biological interpretation might become easier and more informative. Especially, the bridging between different types of studies is a clear advantage, because the sharing of data could increase knowledge within different fields of science.

A large proportion of the GOs associated across methods were related to metabolism of retinoic acids (including vitamin A, Table 1). Of all associated GOs a total of 9/38 GOs were related to the metabolism of retinol (Table S1). These findings were supported by several independent lines of evidence, such as schizophrenia mouse models with impaired retinoic acid signaling (Lamantia 1999), microarray studies on human brains showing alterations in the whole retinoid cascade (Goodman 2005), and finally that low maternal vitamin A intake was associated with a threefold increase in the risk of schizophrenia in the offspring (Bao *et al.* 2012). Immunological changes have also been associated with the risk of developing schizophrenia (Maino *et al.* 2007), which supported our findings of a significant immunological signature (Table 1). In particular, the response to PAM_{3}CSK_{4}, an agonist to the membrane protein TLR2, has been shown to decrease the volume of gray matter in mice (Du *et al.* 2011): Reduction in gray matter is observed in schizophrenia patients (Pol *et al.* 2002).

Fewer genomic features were associated with the standard CVAT compared to the other methods; however, when extending CVAT to utilize marker effects arising from a mixture distribution, an increase in the number of associated feature sets was observed. Arguably, this could be due to either an increased power of CVAT or an increased false positive rate. Based on our simulations, we argue that the larger proportion of identified feature sets was due to increased power. In the case of the simulated data enriched for causal variants with low MAF, we saw a reduction in power of the standard CVAT (Figure 4). However, when exploiting the mixture distribution, the power increased considerably (Figure 4). Examining the distribution of SNP effects from the schizophrenia data, it is evident that the marker effects are a mixture of two distributions (Figure 5). This pattern could arise if the associated genomic features in general were enriched for genetic markers with low minor allele frequency. We did not observe this (Figure 6), but instead we observed that the associated genomic features were enriched for causal variants contributing to the disease risk, supporting the suggestions that the genetic risk of schizophrenia is a combination of common and rare variants (Gratten *et al.* 2014; Kato 2015; Loh *et al.* 2015).

### Conclusion

The aim of the present study was to (1) compare the performance of our GBLUP-derived CVAT to commonly used set test methods and (2) apply the methods to identify genomic features associated with schizophrenia. Overall we found that CVAT performed equivalently to the score method and outperformed the other approaches. By extending CVAT to utilize a mixture distribution of marker effects, the power of CVAT increased significantly compared to the other set test methods. Several other extensions of CVAT are possible, making CVAT an attractive method for future research. Applying the set tests on a schizophrenia case–control data set, we identified several pathways, such as vitamin A metabolism and an immunological signature, which both previously have been functionally associated with schizophrenia. Despite the relatively small sample size, we were able to identify significant sets of markers having biological relevance.

## Acknowledgments

This research was supported by the Center for Genomic Selection in Animals and Plants funded by the Danish Council for Strategic Research (contract no. 12-132452), the Faculty of Health Sciences and Centre for Integrative Sequencing at Aarhus University, and the Lundbeck Foundation.

## Appendix

The Genomic Medicine for Schizophrenia (GEMS) Group is as follows and affiliations are listed below: Ole Mors (1, 2), David M. Hougaard (3), and Preben Bo Mortensen (2, 4):

Research Department P, Aarhus University Hospital, 8240 Risskov, Denmark.

iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, 8000 Aarhus, Denmark.

Danish Center for Neonatal Screening, Statens Serum Institut, 2300 Copenhagen, Denmark.

National Center for Register-Based Research, Aarhus University, 8000 Aarhus, Denmark.

## Footnotes

*Communicating editor: N. R. Wray*Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.189498/-/DC1.

↵2 The Genomic Medicine for Schizophrenia Group members are listed in the

*Appendix*.

- Received March 19, 2016.
- Accepted June 9, 2016.

- Copyright © 2016 by the Genetics Society of America