## Abstract

Age-at-onset is one of the critical traits in cohort studies of age-related diseases. Large-scale genome-wide association studies (GWAS) of age-at-onset traits can provide more insights into genetic effects on disease progression and transitions between stages. Moreover, proportional hazards (or Cox) regression models can achieve higher statistical power in a cohort study than a case-control trait using logistic regression. Although mixed-effects models are widely used in GWAS to correct for sample dependence, application of Cox mixed-effects models (CMEMs) to large-scale GWAS is so far hindered by intractable computational cost. In this work, we propose COXMEG, an efficient R package for conducting GWAS of age-at-onset traits using CMEMs. COXMEG introduces fast estimation algorithms for general sparse relatedness matrices including, but not limited to, block-diagonal pedigree-based matrices. COXMEG also introduces a fast and powerful score test for dense relatedness matrices, accounting for both population stratification and family structure. In addition, COXMEG generalizes existing algorithms to support positive semidefinite relatedness matrices, which are common in twin and family studies. Our simulation studies suggest that COXMEG, depending on the structure of the relatedness matrix, is orders of magnitude computationally more efficient than coxme and coxph with frailty for GWAS. We found that using sparse approximation of relatedness matrices yielded highly comparable results in controlling false-positive rate and retaining statistical power for an ethnically homogeneous family-based sample. By applying COXMEG to a study of Alzheimer’s disease (AD) with a Late-Onset Alzheimer’s Disease Family Study from the National Institute on Aging sample comprising 3456 non-Hispanic whites and 287 African Americans, we identified the *APOE* ε*4* variant with strong statistical power (*P* = 1e−101), far more significant than that reported in a previous study using a transformed variable and a marginal Cox model. Furthermore, we identified novel SNP rs36051450 (*P* = 2e−9) near *GRAMD1B*, the minor allele of which significantly reduced the hazards of AD in both genders. These results demonstrated that COXMEG greatly facilitates the application of CMEMs in GWAS of age-at-onset traits.

AGE-AT-ONSET and time-to-event are among the most important traits of interest in cohort studies of age-related diseases (He *et al.* 2016c; Kulminski *et al.* 2016) and behavioral genetics (He *et al.* 2016a), as these traits carry information on timing of events. Genome-wide association studies (GWAS) of time-to-event traits are critical to gain insights into the genetics of disease development and progression. Specifically, for many age-related diseases, including Alzheimer’s disease (AD), pathological processes may be triggered decades before the first symptoms. Identifying genetic variants associated with age-at-onset may provide insights into potential etiologies of such diseases. Time-to-event traits are also of inherent interest in specific fields such as clinical trials. In such situations, time-to-event analysis is essential because events occur in a subset of subjects whereas information on the outcome of other subjects is mostly censored. Compared to case-control studies using logistic regression models, evidence shows that cohort studies using proportional hazards (or Cox) regression models generally increase statistical power (Green and Symons 1983; Callas *et al.* 1998; Staley *et al.* 2017). The logistic regression model may also yield inflated estimates of the effect sizes in GWAS (Staley *et al.* 2017).

While time-to-event analysis using a Cox model plays a central role and is often more powerful in many biological and epidemiological studies, its application to GWAS is hampered by its massive computational burden when random effects are needed to account for unexplained heterogeneity (*e.g.*, genetic relationship). Unlike a computationally efficient linear mixed model (LMM), which has been well optimized and broadly applied in genetics (Kang *et al.* 2010; Lippert *et al.* 2011; Svishcheva *et al.* 2012; Zhou and Stephens 2012; Loh *et al.* 2015), much less attention has been paid to Cox mixed-effects models (CMEMs). To facilitate genetic association analysis in family-based studies, Therneau (2015) developed a specific data structure (bdsmatrix) and a fast algorithm implemented in C in the Cox mixed-effects R package (coxme). The coxme package is optimized for block-diagonal kinship matrices constructed based on pedigree information (Therneau 2003; Pankratz *et al.* 2005). The performance of the coxme package is remarkable for block-diagonal kinship matrices, reducing the central processing unit (CPU) time for one single nucleotide polymorphisms (SNP) from ∼15 min for a dense relatedness matrix to < 1 sec when analyzing a cohort comprising ∼3500 individuals [*e.g.*, the Late-Onset Alzheimer’s Disease Family Study from the National Institute on Aging (NIA-LOADFS) (Lee *et al.* 2008)]. Despite this dramatic improvement, applying a CMEM (Ripatti and Palmgren 2000; Therneau 2015) to a large-scale GWAS involving millions of SNPs is still computationally challenging. Even for the NIA-LOADFS sample, it would take coxme ∼3 months using one CPU thread to run a GWAS for 10 million imputed SNPs. Moreover, it is limited to block-diagonal and positive definite relatedness matrices. For a more general genetic relationship matrix (GRM) [*e.g.*, the GRM proposed in Yang *et al.* (2011)] that cannot be grouped into a block-diagonal matrix or is not positive definite due to, *e.g.*, monozygotic (MZ) twins, coxme would take decades or even fail.

Current GWAS often include many subjects (*e.g.*, more than tens of thousands of subjects in biobank data) and multiple studies, especially in consortia. Recent GWAS have also interrogated higher-order relationships such as SNP–environment and SNP–SNP interactions, requiring a large number of tests. Therefore, there is an urgent need to develop fast algorithms for CMEMs that can be applied to large-scale GWAS involving time-to-event traits. In this work, we introduce efficient estimation and testing algorithms for CMEMs through an R package COXMEG (https://cran.r-project.org/web/packages/coxmeg/), in which G stands for GWAS. COXMEG addressed the aforementioned computational challenges, and is well suited for large-scale GWAS. We made five major contributions in COXMEG. First, it substantially improved computational efficiency. For example, for a family-based design using a block-diagonal kinship matrix, COXMEG is often > 100-fold faster than the remarkably optimized coxme package in the same computational environment. Second, we developed a novel inexact Newton method, which is not restricted to a block-diagonal pedigree-based relatedness matrix. An arbitrary sparse positive definite matrix is able to enjoy the speed gain from COXMEG. For example, we showed in our application to NIA-LOADFS that COXMEG attained high computational performance and statistical power using a sparse GRM after thresholding. This new algorithm significantly expands the application of COXMEG in many areas including, *e.g.*, biobank data that use sparse relatedness matrices. Third, when a fully dense relatedness matrix cannot be approximated by a sparse matrix, it is computationally demanding to obtain *P*-values by estimating the hazard ratios (HRs) and then performing a Wald test. We developed a fast score test for such situations, dramatically reducing the computational cost of obtaining the *P*-values by not estimating the effect size of each SNP. Fourth, we introduced a generalized penalized partial likelihood (GPPL) to handle positive semidefinite relatedness matrices, which are very common in GWAS. Fifth, for a large-scale (> 20,000) fully dense relatedness matrix, we developed a fast randomized algorithm based on stochastic Lanczos quadrature (SLQ), considerably improving the performance for estimating the variance component.

Feasibility of COXMEG was demonstrated by performing GWAS of age-at-onset of AD in an NIA-LOADFS sample. AD is a complex age-related disease, spanning multiple stages and probably involving different biological mechanisms [*e.g.*, β-amyloid deposition, pathogenesis, vascular changes, activation of the immune system, and phosphorylation of tau (Finch and Kulminski 2019)] at different stages. Our hypothesis is that genetic architecture of AD is likely implicated in faster pathological progression from the initial stage. We demonstrate high computational performance of COXMEG, and report novel findings using a transethnic sample of NIA-LOADFS.

## Methods

### Model specification in COXMEG

We describe the estimation framework in COXMEG using a penalized partial likelihood (PPL) proposed by Breslow and Clayton (1993), and Ripatti and Palmgren (2000), based on the Laplace approximation. Other estimation strategies such as the penalized likelihood method (McGilchrist and Aisbett 1991; McGilchrist 1993), the hierarchical likelihood (h-likelihood) framework (Ha *et al.* 2001, 2011; Lee *et al.* 2006), and the expectation–maximization (EM) algorithm (Cortiñas Abrahantes and Burzykowski 2005) lead to a similar iterative procedure. Specifically, suppose that we have right-censored time-to-event observations from a cohort of N subjects, and a corresponding variable indicating whether a failure or censoring occurs at for each subject. The semiparametric proportional hazards (or Cox) model (Cox 1972) assumes that the conditional hazard function for subject i has the formwhere is an unknown baseline hazard function, X is a matrix of m predictors corresponding to the fixed effects, and Z is a design matrix for the random effects, respectively. The fixed effects include both the SNP to be tested and other covariates. In a CMEM, we assume that Z is the identity matrix (often denoted as I), and γ follows a multivariate normal distribution, *i.e.*,where is expected to be a symmetric positive definite (SPD) matrix measuring the genetic distance between the subjects (*e.g.*, a kinship matrix or a GRM), and τ is the variance of the genetic component. For simplicity, we consider here one variance component, but generalization to multiple components is straightforward. The situation where Σ is symmetric positive semidefinite (SPSD) is more complicated, and will be discussed in a separate section. The log-likelihood based on the above model is given by(1)where is the cumulative hazard function. The baseline hazard is unknown, but can be estimated from the data. Either substituting in Equation 1 with a partial likelihood (McGilchrist 1993; Ripatti and Palmgren 2000) or using maximum likelihood estimation for based on the h-likelihood (Ha *et al.* 2001) leads to the following PPL or equivalently h-likelihood(2)where is the set of all subjects at risk when an event is observed for subject i.

Because β and γ are mutually canonical [*e.g.*, see Lee *et al.* (2006)], the h-likelihood can be treated as an ordinary likelihood for estimating β and γ simultaneously, which we denote by , when τ is given (Ha *et al.* 2001; Lee *et al.* 2006). However, τ has to be estimated separately using a marginal likelihood or an adjusted profile likelihood. Following Ripatti and Palmgren (2000), an integrated marginal likelihood of τ using the Laplace approximation is(3)where is an information matrix of γ using the PPL with estimated from Equation 2. A standard two-layer algorithm for estimating θ, such as coxme, is to iterate between and . However, in GWAS it is often sensible to practically assume that τ is a constant for estimating the effect of each SNP (Kang *et al.* 2010; Loh *et al.* 2015). We leverage this assumption in COXMEG by estimating the variance component τ under a null model with only relevant covariates, and then use the estimated to analyze all SNPs. This step substantially reduces computational time compared to coxme, which estimates τ for each SNP individually. Some studies show that using a uniform variance component may lead to loss of statistical power to detect very significant SNPs (Zhou and Stephens 2012). Nevertheless, this potential issue can be amended by conducting a sensitivity analysis for those SNPs passing certain significance in the first round of the GWAS. We first focus on the maximization of , where an optimization algorithm finds given a known . The estimation of τ under the null model will be discussed in a later section in detail.

### Estimation algorithm in COXMEG-sparse

We consider in COXMEG-sparse a situation where Σ is a sparse SPD matrix. Under a large sample size, is highly quadratic in adjacent of the optimal θ, the Newton–Raphson (NR) method generally converges in a small number of steps, and thus is an efficient optimization method for this high-dimensional problem. Computing the first and second derivatives of with respect to θ gives the following score function and the Fisher information matrixwherewhere R is an indicator matrix for the risk set , in which we adopt the Breslow’s approximation (Breslow 1974) for ties. The NR method estimates θ by computingiteratively until converges. In terms of computational burden, the cross product takes rather than because W and A are diagonal, and M is an indicator matrix similar, through permutation, to a triangular matrix when no ties are present. Another computational challenge is to compute , which takes if solved directly using, *e.g.*, Cholesky decomposition (Cholesky decomposition is valid because H is always SPSD as shown in Supplemental Material, File S1 Materials A.2, and V is SPD when Σ is SPD). If there are p SNPs and on average iterations are needed in the NR optimization of for each SNP, the dominant term in the total computational cost for estimating θ would be , which becomes immediately intractable when the sample size N becomes thousands. We notice that when Σ is block-diagonal (*e.g.*, a pedigree-based kinship matrix), coxme computes only the block-diagonal entries of V corresponding to those in Σ. Such an approximation can theoretically reduce the computational time to , where F is the largest block (or family) size. In family-based studies, F is generally much smaller than N, and can often be assumed as a constant that does not increase with N. This approximation provides remarkable gain in computational performance. However, the drawback of this method is that this approximation depends on the structure of Σ and is limited to a block-diagonal Σ. Accordingly, more general GRMs with long-range correlations, *e.g.*, long-distance cryptic relatedness, or other sparse correlation matrices are not efficiently handled in this framework. In addition, the block-diagonal approximation is not optimal when there are large sparse blocks in Σ. We next propose an algorithm that is efficient for an arbitrary sparse SPD matrix.

Rewriting using Schur complements leads to(4)whereandNote that the computation of does not involve explicit evaluation of , and requires only multiplication and addition in . Therefore, once is solved, the remaining computation can be done in . The obstacle of inversing is that is fully dense even if Σ is sparse because is a fully dense matrix. To solve , we observe that is diagonal, is low-rank, and Σ (or ) is sparse. Rewriting using the Woodbury identity (Hager 1989) gives(5)where(6)The advantage of using Equation 5 is that it disentangles the sparse and dense parts in , so that the sparse term in Equation 5 can be evaluated efficiently by leveraging sparse algorithms depending on which of Σ and has the better sparsity pattern. For example, when Σ is block-diagonal, both matrices are sparse. When Σ is a matrix with first-order autoregressive [AR(1)] correlation structure [example 4 in Liang and Zeger (1986)], only is sparse. However, in many cases, might not be as sparse as Σ. In the worst case where the preconditioned conjugate gradient (PCG) method is used, the time complexity of solving is linear with respect to the number of nonzero entries in Σ. The major computational bottleneck in Equation 5 is the evaluation of , which is at least where is the number of noncensored subjects. So we simply ignore it in COXMEG-sparse, and propose to use instead the approximation(7)Substituting by in Equation 4 gives an approximated NR method. Similarly, no explicit evaluation of is needed when using Equation 7. Hence, once an sparse Cholesky decomposition (SCD) of S is obtained, the evaluation of the second term in Equation 7 mostly requires only one more forward–backward substitution. In the special case where Σ is block-diagonal, the computational burden for solving is .

We show in File S1 Materials A.1 that is a first-order approximation of . In fact, using Neumann series, can be expressed asfrom which we see that is a zero-order approximation of , equivalent to completely ignoring from the information matrix V. This amounts to adding , an SPSD matrix, to the information matrix. We show in File S1 Materials A.2 that COXMEG-sparse satisfies the conditions for a family of inexact Newton methods (Dembo *et al.* 1982; Eisenstat and Walker 1996), suggesting that it converges locally as long as a forcing term is bounded away from one (Dembo *et al.* 1982; Kelley 1995). Nevertheless, it no longer has the quadratic convergence rate generally (*i.e.*, it converges in more iterations than the NR method). We prove the local convergence in more detail in File S1 Materials A.2, but in short, the convergence rate depends on multiple properties including the variance component τ and the spectrum density of Σ. In terms of computational efficiency, a higher-order approximation speeds up the convergence, which, on the other hand, requires one more forward–backward substitution. Therefore, there is a trade-off between the convergence rate and the number of forward–backward substitutions in selecting an optimal order of approximation. We propose the first-order approximation because our simulation and real studies showed that the first-order approximation is often adequate to achieve nearly optimal performance in many practical applications to family-based GWAS (Figure S4 and Materials A.2 in File S1). In the cases where the algorithm converges slowly with , which often happens for a large τ or condition number of Σ, a higher-order approximation should be used for better computational performance (see Figure 1B and additional results in File S1 Materials A.2). Finally, replacing by in gives the following approximated variance of :We found that the relative error between and was < 0.1% under most settings considered in our simulation, except for (Figure S8 in File S1), suggesting that the first-order approximation is sufficiently accurate for testing β in general applications in GWAS when the variance component τ is not overly large. Otherwise, the second-order approximation should be used. The *P*-values are then calculated based on and using a Wald test.

As aforementioned, the major step in evaluating is to compute . Either term in Equation 6 can be used depending on which matrix, Σ or , is sparser. The performance of SCD algorithms is also highly affected by fill-ins. In our simulations, we compared three sparse algorithms in R to solve a block-diagonal S: (i) the LDLT SCD in RcppEigen, (ii) the sparse PCG algorithm using diagonal elements in RcppEigen, and (iii) the SCD in the Matrix R package. We found in our study that the LDLT SCD in RcppEigen yielded the best performance under most scenarios, especially for very sparse Σ (Figure S5 in File S1). On the other hand, the PCG method showed computational advantages when Σ is very large or less sparse (Figure S5 in File S1). The PCG method is also much faster than SCD for a sparse Σ whose inverse is dense.

### Score test for dense relatedness matrices

When both the matrices Σ and are dense, COXMEG-sparse can no longer leverage the sparse algorithms to speed up the evaluation of . Instead of the Cholesky decomposition used in coxme, we implemented a PCG method (Lidauer *et al.* 1999; Tsuruta *et al.* 2001) (termed as COXMEG-pcg) to compute each step in the NR algorithm. This alteration reduces the time complexity from to . However, for a large-scale GWAS, the PCG method might still be overly time-consuming. Therefore, we further propose COXMEG-score, a fast score test for the situation in which Σ and cannot be approximated by a sparse matrix. In COXMEG-score, we first fit the full CMEM under the null model, in which only covariates are included in X. After the null model is fitted, we obtain the estimates of the score function of τ, H, and the inverse of the information matrix under the null hypothesis, denoted by , , and , respectively, where . Denote by z a vector of the genotypes of a SNP to be tested. The score test statistic proposed in COXMEG-score is given bywhere is the variance of the score in the numerator given byBecause the h-likelihood can be treated as an ordinary likelihood once the variance component is given, approximately follows a distribution with 1 d.f. under the null hypothesis that the SNP has no effect on the HR. Note that the computation of involves only one matrix–vector multiplication and the rest can be performed in . So the computational burden of the score test alone is regardless of the structure of the relatedness matrix. This time complexity is the same as the score tests proposed for generalized linear models (Chen *et al.* 2016, 2019). Our simulation results suggest that COXMEG-score is statistically as powerful as the Wald tests like COXMEG-sparse and coxme (Figures S1 and S2).

### Estimation of variance components

We have described above the estimation and testing algorithms when the variance component τ is given. Here, we describe the estimation of τ. In COXMEG, τ is estimated only once under the null model, and then is used for testing all SNPs. The estimation of τ can be performed by calculating the first and second derivatives of given in Equation 3, and iterating between and using the NR method, as suggested in Ripatti and Palmgren (2000) and Ha *et al.* (2011). We found that this EM-like procedure practically converges rather slowly for estimating τ. Instead, we follow the strategy implemented in coxme. Because τ is often a scalar or in a low-dimensional space, we treat as a marginal likelihood of τ and directly use a derivative-free optimization. We found in our simulation study that this strategy converged faster. Thus, after is optimized, the only additional computation in each iteration for optimizing is the evaluation of the log-determinant of . As is SPD, a direct evaluation would require a Cholesky decomposition taking , where is the number of iterations in optimizing , and is often between 10 and 50 depending on the convergence. We evaluated the performance of the following three derivative-free algorithms: (i) Bobyqa (Powell 2009) implemented in the nloptr R package (Ypma 2014), (iii) Brent’s method, and (iii) the Nelder–Mead method (Nelder and Mead 1965) in the optim R function. Bobyqa was the fastest in most cases, and Brent’s method was slightly slower than Bobyqa in our simulation study.

When N is very large (*e.g.*, > 20,000), the evaluation of the log-determinant becomes a bottleneck in COXMEG-sparse and COXMEG-score, and even intractable. In addition, a second-stage sensitivity analysis using a variant-specific variance component may be performed for those SNPs passing a predefined threshold of significance in the first round. When the number of significant SNPs is large, this second-stage analysis would be computationally intensive. So, we propose fast approximation methods to estimate τ for sparse and dense GRMs, respectively. Specifically, for a sparse GRM, we propose the following marginal likelihood instead of .Compared to , the only difference is that uses a diagonal approximation of . When Σ or is sparse, the determinant can often be efficiently computed using SCD depending on its sparsity pattern. We found in our simulation study that the estimated τ using is very close to that from coxme, which uses a block-diagonal approximation of , with a relative difference < 1% in most cases (Figure S6 File S1).

When the relatedness matrix is dense or its sparsity pattern results in too many fill-ins, the above diagonal approximation does not help. Instead, we propose a method based on SLQ (Bai *et al.* 1996; Golub and Meurant 2009; Ubaru *et al.* 2017) to approximate the log-determinant, which reduces the time complexity from cubic to quadratic (, see File S1 Materials A.4) and achieves accurate estimation (Figure S7 in File S1). The basic idea is to first express the log-determinant using a Monte Carlo estimator of the trace of the matrix logarithm. Each term in the Monte Carlo estimator can be written as a Riemann–Stieltjes integral approximated by the Gaussian quadrature rule. The nodes and weights in the Gaussian quadrature can be elegantly computed by the eigenvalues and first elements in the eigenvectors of the tridiagonal matrix obtained from the Lanczos algorithm. We found that the accuracy of the SLQ method for approximating a log-determinant was generally higher by one order of magnitude than that using Chebyshev orthogonal polynomials (Pace and LeSage 2004; Han *et al.* 2016), both of which were more accurate than Martin’s Taylor expansion (Martin 1992; Barry and Kelley Pace 1999), consistent with previous reports (Han *et al.* 2016; Ubaru *et al.* 2017). More details about the SLQ approximation are described in File S1 Materials A.4.

### Handling a positive semidefinite relatedness matrix

When the relatedness matrix Σ is SPSD, becomes invalid because does not exist. So, we instead propose the following GPPL(8)where is the pseudoinverse defined as , where U is the eigenvectors of Σ and is the pseudoinverse of the diagonal matrix consisting of the eigenvalues of Σ. In addition, N is replaced by , the number of nonzero eigenvalues of Σ. The rationale of Equation 8 is similar to that in Kauermann and Wegener (2011) and He *et al.* (2016b), in which the penalty is imposed on a subspace of the coefficients. We show in File S1 Materials A.3 that when replacing with the GPPL for an SPSD Σ, the NR algorithms in COXMEG-sparse and COXMEG-score still work, except that the sum of elements in each row of Σ is zero (*i.e.*, an SPSD matrix has eigenvector 1 with eigenvalue 0). In such a case, becomes noninvertible. In addition, the last transformation in Equation 6 is valid only for Σ being SPD. So, if Σ but not is sparse, it would be better to make Σ SPD, as discussed in the following section The above GPPL ignores penalties with respect to zero eigenvalues in *Λ*, and thus it can be used when the SPSD matrix originates from biologically meaningless singularities (e.g., singularity originated from calculating a covariance matrix). However, when the singularity is biologically meaningful (e.g., for MZ twins), such zero eigenvalues would strongly penalize the difference between random effects within MZ twins and they should not be ignored. In this case, we propose a modified PPL by replacing all zero eigenvalues in *Λ* with a small positive number (e.g., 10^{−6}) and keeping the other eigenvalues unchanged, which makes Σ positive definite, and preserves the strong penalties.

### Construction of relatedness matrices for COXMEG

Depending on the purpose of using the CMEMs, the relatedness matrix Σ summarizes the dependence structure between individuals. Such a matrix in the context of GWAS is generally a kinship matrix or a GRM. There exist a variety of approaches defining and calculating these matrices depending on the assumption of the genetic architecture underlying the trait. Here, we focus on two frequently used types of relatedness matrices, a pedigree-based kinship matrix and a SNP-based GRM based on the infinitesimal model (Yang *et al.* 2011), and discuss how to construct such matrices for COXMEG.

A pedigree-based kinship matrix provides a measure of the total average genetic correlation based on the pedigree information but no genotype information, and it does not account for a specific phenotype. When the individuals in a cohort are more or less ethnically homogeneous, the CMEM with a pedigree-based kinship matrix is used primarily to account for family structure. A pedigree-based kinship matrix [for example, generated by the kinship2 R package (Sinnwell *et al.* 2014)], is generally block-diagonal SPD in which individuals from different families are regarded as independent. One exception is MZ twins, which lead to an SPSD kinship matrix. COXMEG-sparse is suitable for this situation because such matrices are sparse when the family size is not at the same magnitude of the sample size.

In contrast, a SNP-based GRM is more flexible in terms of genomic regions, more precise in terms of measuring genetic correlation, and can detect long-distance cryptic relatedness. For example, a GCTA GRM introduced in Yang *et al.* (2011) is calculated by(9)where is the frequency of the coding allele of SNP i, and is the dosage of SNP i and individual j. This type of GRM is a covariance matrix, which is generally SPSD even if because 1 d.f. is lost for estimating from the same sample. It always has eigenvector 1 with eigenvalue 0. Note that an estimated GRM may have negative eigenvalues if the SNPs used to construct it have many missing values, which should be avoided.

For an ethnically homogeneous cohort, a GRM can often be approximated by a sparse matrix using thresholding because most elements outside of a family are almost zero. Various methods can be used to approximate the GRM, such as hard thresholding (Bickel and Levina 2008), and soft thresholding using graphical lasso (Friedman *et al.* 2008). Our analysis of the NIA-LOADFS data showed that simple hard thresholding, which has literally little computational cost, using a small cutoff (*e.g.*, 0.02) could induce sparsity and often (but not always) preserve SPD (Bickel and Levina 2008). Sparsity approximation with guaranteed positive definite is also available [*e.g.*, Rothman (2012); implemented in the PDSCE R package] albeit computationally intensive compared to a simple hard thresholding.

### Simulation study

We performed simulation studies to investigate the computational and estimation performance under R 3.5. The simulation studies were conducted on a Linux server with an Intel Xeon E5-2680v2 CPU, and Windows machines with an Intel Core i5-6300HQ or i7-8700 CPU. We considered block-diagonal correlation matrices, banded AR(1) correlation matrices, and sparse GRMs by thresholding a GRM estimated from the genotype data in the NIA-LOADFS sample. All blocks in the block-diagonal correlation matrices shared the same block size and a single correlation coefficient. We evaluated multiple block sizes (5, 20, 100, and 500) and correlation coefficients (0.1, 0.5, and 0.9). The elements in the banded AR(1) correlation matrices were generated by and then set zeros for all , in which we evaluated . For each relatedness matrix Σ, we generated random effects based on , in which we considered to reflect different magnitudes of heritability. Genotype dosages for each SNP were generated from a binomial distribution , in which p, the minor allele frequency (MAF), ranged between 0.05 and 0.5. A disease time variable was generated using the following exponential distribution , in which we assumed a constant baseline hazard with . For each individual, we generated a censoring time variable using the following exponential distribution , in which we considered to reflect scenarios from heavy censoring to almost no censoring . The time-to-event variable was generated by

### Data availability

This manuscript was prepared using limited access data sets obtained through dbGaP [accession numbers: phs000168.v2.p2 (LOADFS)]. Code used to generate the simulated age-at-onset data can be found in the vignettes of the coxmeg R package at https://cran.r-project.org/web/packages/coxmeg/. Supplemental material available at figshare: https://doi.org/10.25386/genetics.11862486.

## Results

### Overview of COXMEG

Depending on whether the relatedness matrix is sparse, we propose in COXMEG two algorithms accounting for the dependence in subjects, COXMEG-sparse for a sparse relatedness matrix, and COXMEG-score for a dense relatedness matrix. Two primary purposes to adjust for dependence structure in the context of GWAS are: (i) correcting for population stratification and (ii) accounting for family structure. We consider these two purposes separately because the rationale, and thus the relatedness matrices, used in the two situations are different. In the first case, the major concern is to avoid identifying false-positive SNPs correlated with race. This is because population stratification results in many SNPs that can almost tag the race information in the sample, and consequently are highly correlated with race. If race or relevant variables, such as principal components (PCs), are not included in the model, the estimated effects of these SNPs should be attributed to the effect of race, leading to a large number of false positives. However, in the second case, the major benefit of explicitly specifying the sample dependence is to control the overall false-positive rate (FPR) and increase statistical power due to family structure. Previous studies have shown that ignoring dependence in the Cox model leads to overestimation of the model parameters (Wei *et al.* 1989). We show in our analysis that COXMEG-score is well suited for the first case, and COXMEG-sparse is dedicated to the second case.

Sparse approximation of the relatedness matrix is attractive, particularly in large-scale biobank data where the full relatedness matrix is too large to be squeezed into computer memory. When a selected study population is almost ethnically homogeneous and a CMEM is adopted primarily to correct for family structure, for example, in family-based GWAS (He *et al.* 2016a), the relatedness matrix is often sparse or can be approximated by a sparse matrix. For a family-based study, a widely used relatedness matrix is a pedigree-based kinship matrix, which is generally block-diagonal and sparse. When a SNP-based GRM computed from genotypes [*e.g.*, proposed in Yang *et al.* (2011)] is used, we notice that it often has a large proportion of near-zero elements. We show in the following analysis that, compared to a fully dense GRM, it often produces comparable results to use a sparse approximation of the GRM by, *e.g.*, thresholding the whole matrix or extracting its entries corresponding to family members. Using a sparse approximation dramatically improves both computational speed and memory usage. Compared to coxme, which is only optimized for a block-diagonal pedigree-based kinship matrix, COXMEG-sparse achieves its speed gain by introducing a novel approximation of the Hessian matrix in the optimization procedure. As we will show in this work, such an approximation cannot only accommodate an arbitrary sparse relatedness matrix, but is also practically much faster than coxme for a block-diagonal relatedness matrix. In contrast, COXMEG-score is dedicated to dense relatedness matrices for which COXMEG-sparse cannot achieve acceleration. The core idea in COXMEG-score is a fast score test for genome-wide association analysis and an efficient stochastic algorithm for estimating the variance component.

### COXMEG-sparse performs faster than coxme and coxph for family-based studies

We first investigated the computational efficiency of COXMEG-sparse for block-diagonal relatedness matrices. We compared it with the coxme R package (Therneau 2015), which is widely used to fit a CMEM and highly optimized for block-diagonal matrices. We also included in the comparison the function “coxph” in the survival R package (Therneau and Lumley 2015) with a shared Gaussian frailty model. A shared frailty model (Vaupel *et al.* 1979) was used as a fast, but generally less accurate, alternative to coxme in some studies (He *et al.* 2016c). It assumes that all members in one family share the same genetic component, which might not be a realistic presumption in many situations. In both coxme and coxph, block-diagonal approximation was used. Figure 1A showed that it took COXMEG-sparse ∼60 sec (using one CPU thread) to analyze 10,000 SNPs for a small sample of 2000 individuals with a family (or block) size of five on a Linux server with a Xeon E5-2680v2 CPU. In contrast, under the same computing environment for the same task, it took coxme ∼200 min, ∼200-fold as much as that of COXMEG-sparse. Moreover, COXMEG-sparse was ∼20-fold faster than coxph under this setting. For block-diagonal relatedness matrices, the computational burden of COXMEG-sparse rose almost linearly with increasing sample size (Figure 1A), consistent with theoretical estimates given in the *Methods* section. In contrast, the computational cost of coxme and coxph increased more than linearly. When the sample size was up to 10,000, COXMEG-sparse was > 1000-fold and > 100-fold faster than coxme and coxph, respectively (Figure 1A). On the other hand, the computational burden of COXMEG-sparse increased more rapidly than coxme with respect to the family size (Figure 1A). These observations are consistent with the theoretical time complexity of COXMEG-sparse, which is linear and quadratic with respect to sample size and family size, respectively, as derived in the *Methods* section. Interestingly, the computational time of coxme climbed modestly with increasing family size. This is probably because some heavier computation in coxme masks the time complexity of the Cholesky decomposition. On the contrary, the computational burden of coxph dropped rapidly with increasing family sizes. This is because coxph depends on the number of families, which decreased when the family size increased and the total sample size was kept unchanged.

To further understand how COXMEG-sparse achieved the speed gain, we evaluated its performance in estimating the variance component for one SNP. We found that on average ∼20 iterations for the marginal likelihood , described in the *Methods* section, are needed for COXMEG-sparse to estimate the variance component. Because COXMEG-sparse does not estimate a variant-specific variance component and thus requires only one iteration for each SNP, this gives COXMEG-sparse an ∼20-fold boost of speed. Next, when the family size was five and the sample size was 5000, COXMEG-sparse was ∼18-fold faster than coxme for estimating the variance component (Figure 1B, left). Multiplying these two factors gives COXMEG-sparse a 360-fold boost under this setting, consistent with the benchmarking in Figure 1A. The computational cost for estimating the variance component of coxme and coxph soared with increasing sample size (Figure 1B), justifying their poor performance under large sample size shown in Figure 1A. We found that the performance also depended on the operating systems. On a Windows 10 machine with a Core i5-6300HQ CPU, COXMEG-sparse was only ∼10-fold faster than coxme for estimating the variance component under the same setting (Figure S1 in File S1). We found that this difference is because RcppEigen, the program used in COXMEG for the Cholesky decomposition of a block-diagonal matrix, was much (∼3×) slower than bdsmatrix used in coxme on Windows, while they had comparable performance on Linux. This suggests that there is much room for COXMEG-sparse to optimize for Windows by using faster Cholesky decomposition programs.

On the other hand, when the family size was 100, the speed advantage of COXMEG-sparse dropped (Figure 1B, right). One cause is that the relatedness matrix with larger block size has a large condition number, and thus the first-order approximation converges slowly, as described in the *Methods* section. Indeed, we found that using a higher-order approximation in COXMEG-sparse reduced computational cost by ∼40% (Figure 1B). Another cause is that the performance of RcppEigen used in COXMEG-sparse dropped rapidly with decreasing sparsity (Figure S5 in File S1). Fortunately, such relatedness matrices are rarely encountered in real data analysis. For example, the average family size in the NIA-LOADFS sample is 2.16.

### COXMEG-sparse accommodates general sparse matrices

Next, we examined the performance of COXMEG-sparse for general sparse relatedness matrices. We assessed a class of sparse GRMs obtained by hard-thresholding small values of a dense GRM so that individuals with long genetic distance were treated as independent. This type of GRM is of interest to us because we observed that most elements in a GRM based on an infinitesimal model are almost zero in a family-based study consisting of an ethnically homogeneous cohort. For instance, when we studied only the non-Hispanic whites in the NIA-LOADFS sample, and set all elements below a prespecified small threshold (*e.g.*, 0.02) to zero, the resulting sparse GRM had on average < 1% nonzero elements per row, and most of them were from the family members. This type of GRM is not block-diagonal because it has sporadic nonzero elements. As shown in Figure 2A, COXMEG-sparse was dramatically more efficient to deal with this sort of approximated GRM than coxme, which was extremely slow for a matrix that is not block-diagonal. By thresholding a GRM including 4376 subjects from the NIA-LOADFS sample using different cutoffs, we examined sparse GRMs having 0.25–5% nonzero elements. Figure 2A showed that it took coxme > 3 years to estimate 10,000 SNPs, while it took COXMEG-sparse only 5 min when 0.25% of the elements were nonzero. The computational time decreased almost linearly with increasing sparsity of the relatedness matrix (Figure 2A), consistent with the theoretical time complexity derived in the *Methods* section.

To evaluate the performance of COXMEG-sparse for other correlation patterns, we conducted a simulation study using banded AR(1) matrices, which cannot be factorized as a block-diagonal matrix. Similarly, we found that COXMEG-sparse was substantially computationally faster than coxme (Figure 2B). The computational time increased almost linearly with respect to the sample size, much slower than coxme (Figure 2B). We did not include coxph in the above analyses because a shared frailty component is not clearly defined for a general sparse covariance matrix.

### Efficient testing using COXMEG-score for dense relatedness matrices

When there is a necessity to correct for both family structure and population stratification in a family-based multiethnic study, the GRM is often dense and might not be directly approximated by a sparse matrix using naive thresholding methods (*e.g.*, hard thresholding). For a fully dense GRM, the estimation algorithm would be extremely slow for coxme, which may even take 10 min to analyze one SNP with a small sample size of 2000. For such dense GRMs, we implemented an estimation method based on PCG, termed COXMEG-pcg, which substantially reduced computational intensity compared to the Cholesky decomposition used in coxme. As shown in Figure 3A, the computational time of COXMEG-pcg dropped drastically compared to coxme in practice. However, the efficiency of COXMEG-pcg might still not be acceptable for a large sample size when speed is a leading concern. Thus, we propose COXMEG-score, a much faster method, by conducting a score test to obtain the *P*-values. As compared in Figure 3A, COXMEG-score completed the analysis for 10,000 SNPs of a sample consisting of 1000 individuals within 15 sec, computationally much more efficient than COXMEG-pcg and coxme, which took 1000 and 10,000 sec, respectively. When the sample size increased up to 10,000, it took COXMEG-score and COXMEG-pcg ∼1.4 hr and ∼1.2 days, respectively, while it would take coxme ∼3 years according to our projection.

For a very large cohort (*e.g.*, > 20,000), the estimation of the variance component in COXMEG-score becomes a computational bottleneck because the estimation requires an evaluation of a log-determinant of a large matrix, the time complexity of which is cubic. Therefore, for large sample size, we introduce a randomized method using SLQ (Bai *et al.* 1996; Golub and Meurant 2009; Ubaru *et al.* 2017) to approximate the log-determinant, reducing the time complexity from cubic to quadratic. As demonstrated in Figure 3B, COXMEG-score with SLQ employing 100 Monte Carlo samples and 10 steps of the Lanczos algorithm markedly lowered the computational burden for a cohort with ≥ 10,000 individuals, and maintained control of the FPR for detecting significant SNPs (File S1 Materials A.4). Our simulation results indicated that the accuracy of the SLQ approximation depended on the structure of the relatedness matrix, and an estimated variance component was particularly accurate for a large relatedness matrix with weak correlations (*i.e.*, more accurate for a larger sample size and a smaller condition number of the relatedness matrix) (File S1 Materials A.4).

### COXMEG controls FPR in real data analysis

In addition to the speed gain, we demonstrated the validity of COXMEG by assessing its statistical power and FPR through a simulation study. We found that COXMEG, including COXMEG-score and COXMEG-sparse, had nearly the same statistical power and FPR as coxme, while coxph with a shared frailty had inflated FPR in certain situations of family-based data sets (Figures S2 and S3). Detailed results about the evaluation of the statistical power and FPR using simulated data can be found in File S1 Materials A.5.

Given the validated performance of COXMEG, we then used it to carry out GWAS of age-at-onset of AD for the NIA-LOADFS data set to detect SNPs associated with the hazards of late-onset AD. We first focused on a homogeneous subpopulation by selecting 3456 non-Hispanic white subjects with available AD status from the cohort. For the subjects with missing information about the age-at-onset of AD, we treated them as censored and set the age-at-onset as the age at recruitment. The genotypes were imputed using the Michigan Imputation Server with a reference panel from HRC (Version r1.1 2016) (Das *et al.* 2016). A total of ∼5.42 million SNPs were interrogated in the analyses after removing SNPs having MAF < 0.05. Sex was included as a covariate in all analyses.

We investigated the performance of COXMEG in the following three aspects. First, we evaluated the performance of COXMEG-sparse, COXMEG-score, coxme, and coxph (with a shared Gaussian frailty corresponding to the families) in terms of statistical power and control of the genomic inflation factor. Second, we investigated the potential influence on the statistical power of using different approximated GRMs in COXMEG-sparse. Third, we investigated the performance of using a uniform variance component estimated from the null model for analyzing all SNPs. More specifically, we assumed in coxph that all members in each family shared the same frailty. In the other three methods, we computed a GRM using all qualified genotypes (filtered by MAF > 0.05 and missing rate < 1e−4) from a 600,000 Illumina Human610_Quadv1_B genotype array. The GRM was estimated based on the GCTA model (Yang *et al.* 2011) (also see Equation 9 in the *Methods* section) and the diagonal elements were further scaled to one using the SNPRelate R package (Zheng *et al.* 2012). For coxme, we used a block-diagonal GRM by first defining diagonal blocks according to the families, and then setting all the elements outside the block-diagonal parts to zero. We found that eight pairs of individuals were in complete correlation, indicating that they were MZ twins. As coxme does not handle such positive semidefinite matrices, we set those off-diagonal elements corresponding to the MZ twins to 0.99 to make the GRM positive definite. For COXMEG-sparse, we tested two GRMs: (i) the same block-diagonal GRM as used in coxme and (ii) a sparse GRM by setting all elements in the GRM that had an absolute value < 0.03 to zero. For COXMEG-score, we tested three GRMs, the original GRM, the block-diagonal GRM, and the sparse GRM. In terms of the computational efficiency, the analyses using COXMEG-sparse and COXMEG-score were completed in ∼1 day using one CPU thread, while it took coxph and coxme > 3 weeks to complete the analysis. Surprisingly, we found that coxph took longer than coxme. Note that the average family size in NIA-LOADFS is only 2.16. Combined with the trend shown in the simulation study (Figure 1), these observations suggested that coxph can be less efficient than coxme when average family size is small.

Our results further showed that COXMEG-sparse, COXMEG-score, coxme using the same block-diagonal GRM, and coxph had almost the same genomic inflation factor (*λ* = 1.0214, 1.0215, 1.024, and 1.0008, respectively) (Figure 4A), suggesting that all four methods were comparable in terms of controlling false positives for this analysis. Using a family-based block-diagonal GRM or assuming a shared frailty did not inflate the significance of *P*-values in this study. We observed that coxph produced slightly less significant *P*-values for top SNPs than the other methods. On the other hand, we observed that the statistical power of COXMEG-score to detect the *APOE* ε*4* variant (*rs429358*) was much higher (more than four orders of magnitude) than that of the other three methods (Figure 4A). We observed little difference in the *P*-values for detecting top SNPs between the original GRM, the approximated sparse GRM, and the block-diagonal GRM (Figure 4B). In addition, they yielded almost the same genomic inflation factor (*λ* = 1.0014, 1.0215, and 1.0173, respectively), suggesting that the sparse approximations were practically viable for ethnically homogeneous family-based cohorts. Finally, we evaluated potential consequences of using a fixed variance component estimated from the null model instead of a variant-specific variance component for top SNPs. We estimated a variant-specific variance component, and recomputed the effect size and its *P*-value for each SNP having a *P*-value < 1e−5. We found that using the fixed variance component did produce slightly less significant *P*-values for those SNPs outside the *APOE* region (Figure 4C), while no evident difference was observed for those SNPs in the *APOE* region (Figure 4D). However, the variance component estimated from the null model including only sex was 0.26, while it rose to 0.38 when the null model included both sex and the *APOE* ε*4* variant.

### COXMEG is statistically powerful and identifies novel SNPs associated with age-at-onset of AD

The results from the non-Hispanic whites show that the *APOE* ε*4* variant was the strongest SNP, associated with increasing hazards of AD with a *P*-value of 1e−101 (Figure 5A), which was more significant than the *P*-value (*P* = 3.3e−96) reported from a previous study of age-at-onset of AD using a much larger sample size (9162 Caucasian participants) (Naj *et al.* 2014), in which the NIA-LOADFS sample is a subset. The previous study used log-transformed age-at-onset as the phenotype and generalized estimating equations for estimation (Naj *et al.* 2014). This might suggest that the CMEM is a more appropriate and powerful model than a marginal model for studying AD. In addition, we found that *rs36051450*, located on 11q24.1 and ∼25-kb upstream from *GRAMD1B*, yielded a *P*-value of 8e−8, narrowly below the genome-wide significance (Figure 5A). *GRAMD1B* is a cholesterol transporter mediating nonvesicular transport of cholesterol from the plasma membrane to the endoplasmic reticulum. In contrast, a logistic mixed-effects model produced a *P*-value of only 2.85e−05 for *rs36051450*. We also examined 23 common SNPs identified by a recent large-scale meta-analysis associated with incidence of AD (Jansen *et al.* 2019). We found that eight SNPs (in *CR1*, *BIN1*, *CLNK*, *HLA-DRB1*, *MS4A6A*, *PICALM*, *ABI3*, and *CASS4*) were nominally significantly (*P* < 0.05) associated with the hazards of AD (Table S1 in File S1), in which *rs4663105* in the *BIN1* region showed the strongest significance (*P* = 2.96e−04).

We further applied COXMEG-score to a sample consisting of 3456 non-Hispanic whites and 287 African Americans in NIA-LOADFS. The genotypes were imputed using the Michigan Imputation Server with the reference panel from 1000 Genomes Phase 3 (Version 5) (Das *et al.* 2016). A total of ∼20 million SNPs were included in the GWAS of age-at-onset of AD after removing SNPs having MAF < 0.05. A GRM was constructed using the same procedure as in the previous analyses. When applying COXMEG-score, we included sex as a covariate. To examine the consequence without adjusting for the dependence between subjects, we performed two additional GWAS using coxph without frailty. In the first analysis, we included only sex as a covariate. In the second analysis, we included sex and the top five PCs computed from the GRM to account for population stratification. We did not include coxme for comparison because it was not able to finish the analysis with a dense GRM in a reasonable time.

The top SNP identified by COXMEG-score was again the *APOE* ε*4* variant *rs429358* with a *P*-value of 1e−91 (Figure 5B), slightly less significant than that obtained from the non-Hispanic white sample alone, which probably indicates that the effect of *APOE* ε*4* is lower in African Americans than non-Hispanic whites. Outside the *APOE* region, we identified the novel SNP *rs36051450* (*P* = 2e−9) near *GRAMD1B*, the *P*-value of which was more significant than that from the non-Hispanic whites. We found that the minor allele of *rs36051450*, with an allele frequency of ∼18% in the sample, had a protective effect on AD, significantly reducing the hazards in both male and female individuals (Figure 5C). The effect in males was less significant than that in females probably because of the smaller sample size of males. Interestingly, *rs735665*, a nearby SNP in high linkage disequilibrium (*r*^{2} = 0.94) with *rs36051450*, is significantly (*P*-value = 4e−9–4e−39) associated with chronic lymphocytic leukemia (CLL) and follicular lymphoma (Di Bernardo *et al.* 2008; Conde *et al.* 2010; Slager *et al.* 2012; Berndt *et al.* 2013; Speedy *et al.* 2014). In contrast, the minor allele of *rs735665* increases the risk of CLL (odds ratio = 1.45–1.81), which suggests a potential antagonistic pleiotropy of these SNPs.

Given that the censoring proportion is 55%, the heritability of the risk of AD estimated from the variance component obtained from the CMEM in the NIA-LOADFS sample was 0.43 based on Equation 10 in the *Discussion* section. The genomic inflation factors were 1.014, 1.063, and 1.065 for COXMEG-score, coxph, and coxph with PCs, respectively (Figure 5D), suggesting that the FPR was inflated when not accounting for the family structure. Using PCs to account for population stratification slightly improved the FPR. We observed that COXMEG-score yielded more significant *P*-values for the top SNPs than coxph with PCs (Figure 5D), suggesting that COXMEG-score achieved higher statistical power to detect SNPs in the *APOE* region.

## Discussion

Age-at-onset is one of the most common and fundamental traits in genetic epidemiology. When the underlying hazard function is unknown, the Cox model (Cox 1972) is a prevalent routine for analyzing age-at-onset traits as it is more flexible than a parametric hazard model (*e.g.*, Weibull). When subjects are correlated, a CMEM is frequently used to model the dependence. Although the CMEMs have been investigated for decades (McGilchrist and Aisbett 1991; McGilchrist 1993; Ripatti and Palmgren 2000; Ha *et al.* 2001; Ripatti *et al.* 2002; Cortiñas Abrahantes and Burzykowski 2005), its estimation algorithm, unfortunately, has no closed form in contrast to a parametric hazard frailty model, in which it is possible to express the marginal likelihood explicitly using the Laplace transform for some frailty families (Yashin *et al.* 1995; Wienke *et al.* 2003). Iterative methods are required for estimation, which are computationally intensive, and consequently prohibit its broad application in GWAS. Therneau (2003, 2015) optimized the estimation algorithm specifically for block-diagonal relatedness matrices and achieved remarkable performance, but its use has still been limited in large-scale GWAS.

This work sets out to fill in the gap in the application of the CMEMs in GWAS. It might be of particular interest to many areas in epidemiology, such as age-related diseases (He *et al.* 2016c; Kulminski *et al.* 2016), behavior genetics (He *et al.* 2016a), and perhaps clinical trials, in which age-at-onset is a central concern. Moreover, the application of COXMEG includes, but is not limited to, genetic data. As demonstrated, COXMEG-sparse is computationally fast for general sparse matrices. A time-to-event analysis involving sample dependence structure, *e.g.*, geographically clustered time-to-event data, can also benefit from COXMEG.

We have proposed in COXMEG novel efficient algorithms to facilitate the application of the CMEMs to genome-wide age-at-onset or time-to-event analysis. We demonstrated in our simulation and real data analyses that COXMEG substantially improves computational efficiency, and is applicable for a broader family of relatedness matrices. Specifically, for a block-diagonal relatedness matrix, COXMEG is often hundreds to thousands of times faster than coxme, while for a dense relatedness matrix, COXMEG using a score test can be 100,000-fold faster. In addition, compared to coxme, COXMEG deals with positive semidefinite relatedness matrices, which are common in the context of GWAS. We demonstrated through a study of AD using the NIA-LOADFS sample that these improvements and generalizations in COXMEG allow fast GWAS of age-at-onset for a cohort of thousands, or tens of thousands, of individuals. For example, COXMEG completed the genome-wide analysis within 1 day for 10 million SNPs in NIA-LOADFS (∼3500 subjects) using one CPU thread.

Our results about the risk of AD show that the CMEMs may identify new SNPs that are not significant using a linear or logistic model. Interestingly, for the identified novel SNP *rs36051450* near *GRAMD1B*, the *P*-value from the logistic model was less significant than that from the CMEM, suggesting that this SNP might contribute more to accelerating the disease progression than the incidence. According to GTEx (GTEx Consortium *et al.* 2017), *GRAMD1B* is abundantly expressed in brain tissues. Cell-sorting experiments of human cortex tissue have indicated that *GRAMD1B* is most strongly expressed in astrocytes and neurons among five major brain cell types (Zhang *et al.* 2016). Further research could be attempted to replicate this finding in other cohorts.

Analogous to heritability defined in a linear model, a generalized heritability for the CMEMs defined on a logarithmic scale of cumulative hazards can be obtained as proposed in Korsgaard *et al.* (1999), Yazdi *et al.* (2002), and Schneider *et al.* (2005) using, *e.g.*, the following formula,(10)where is the proportion of censored subjects and is the estimated variance component in CMEMs.

Despite the efforts made in COXMEG, it is still challenging, in terms of both computational time and memory, to handle a fully dense relatedness matrix for a cohort of > 50,000 individuals in a transethnic study. As discussed in Gianola *et al.* (2016), a fast and viable solution to account for sample dependence is to split the relatedness matrix into two parts based on PC analysis. As we showed in our analysis of the NIA-LOADFS sample, using a sparse approximation of the SNP-based GRM in an ethnically homogeneous study of family-based design achieved highly comparable performance and considerably alleviated computational burden. Thus, an efficient solution to a very large transethnic cohort might be to use top PCs from the GRM to correct for population stratification, and a sparse approximation to correct for family structure.

## Acknowledgments

The authors thank the editors and four reviewers (including Doug Speed, two anonymous reviewers for *GENETICS*, and one anonymous reviewer for *Bioinformatics*) for their careful reading and constructive comments, and Terry Therneau for providing detailed information about the implementation of coxme and constructive discussion. This research was supported by grants from the National Institute on Aging (P01 AG043352, R01 AG047310, and R01 AG061853). The funders had no role in study design, data collection and analysis, decision to publish, or manuscript preparation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors declare that they have no conflict of interest.

Author contributions: L.H. conceived, developed, and implemented the algorithms. L.H. designed the simulation study, and analyzed the simulated and real data. A.M.K. contributed to acquiring the real data, examining the algorithms, and discussion of final results. L.H. and A.M.K. contributed to the writing of the manuscript.

## Footnotes

Supplemental material available at figshare: https://doi.org/10.25386/genetics.11862486.

*Communicating editor: E. Eskin*

- Received November 22, 2019.
- Accepted March 1, 2020.

- Copyright © 2020 by the Genetics Society of America