## Abstract

Mapping and identifying variants that influence quantitative traits is an important problem for genetic studies. Traditional QTL mapping relies on a variance-components (VC) approach with the key assumption that the trait values in a family follow a multivariate normal distribution. Violation of this assumption can lead to inflated type I error, reduced power, and biased parameter estimates. To accommodate nonnormally distributed data, we developed and implemented a modified VC method, which we call the “copula VC method,” that directly models the nonnormal distribution using Gaussian copulas. The copula VC method allows the analysis of continuous, discrete, and censored trait data, and the standard VC method is a special case when the data are distributed as multivariate normal. Through the use of link functions, the copula VC method can easily incorporate covariates. We use computer simulations to show that the proposed method yields unbiased parameter estimates, correct type I error rates, and improved power for testing linkage with a variety of nonnormal traits as compared with the standard VC and the regression-based methods.

VARIANCE-COMPONENTS (VC) linkage analysis (Amos 1994; Almasy and Blangero 1998) plays an important role in mapping quantitative trait loci (QTL) that influence quantitative traits in humans. Unlike the original Haseman–Elston regression (Haseman and Elston 1972), which lacks flexibility in modeling variance–covariance structures and covariates, the VC method can analyze pedigrees essentially of any configuration and provides increased linkage power (Amos *et al*. 1996; Williams and Blangero 1999). In the simplest implementation of the VC method, trait values are assumed to follow a multivariate normal distribution with the variances and covariances depending on identical-by-descent (IBD) sharing between relative pairs and major gene, shared polygenes, and environmental variance components.

A key assumption in the VC method is that the quantitative traits follow a multivariate normal distribution within a family. Violation of this assumption can lead to inflated type I error, reduced linkage power, and biased parameter estimates (Allison *et al*. 1999; Epstein *et al*. 2003). Several solutions have been proposed when the trait distribution is not normal. Most simply, one can transform the data to univariate normal and apply the standard VC method. For example, for continuous traits, this can be achieved by the inverse-normal transformation using the empirical distribution of the trait. This transformation is quite accurate when the sample size is large and ensures that the trait is approximately distributed as univariate normal. Alternatively, a semiparametric model that jointly estimates an empirical transformation and genetic model parameters can be used (Diao and Lin 2005). A weakness of these transformation-based approaches is that they are not appropriate for the analysis of discrete or censored traits.

To analyze the data without transformation, one could use an approach based on generalized estimating equations (GEEs) (Liang and Zeger 1986). Chen *et al*. (2004) described a GEE framework for linkage analysis that includes the Haseman–Elston regression and the standard VC methods as special cases, where the different methods result from different choices of a working covariance matrix. This approach allows for various robust score tests to be defined and can be extended to take higher moments of the trait distribution into account (Chen *et al*. 2005). If sufficient computing resources are available, another option is to use the standard VC method and assess significance through gene-dropping simulations. For settings where computing resources are limited, Blangero *et al*. (2000) proposed a robust estimator of the covariance matrix that controls type I error but may lead to loss of efficiency.

As far as efficiency is concerned, the maximum-likelihood procedure with the proper distribution is the method of choice. For example, Lange *et al*. (1989), Wan *et al*. (1998), and Epstein *et al*. (2003) developed VC models for data from *t-*, log-normal, and censored normal distributions, respectively. In each case, parameters are estimated by maximum likelihood and the methods outperform alternatives for the analysis of data with *t-*, log-normal, and censored distributions, respectively. Given that quantitative traits of interest in genetic studies follow many different distributions, it is desirable to develop a unified likelihood framework that allows the analysis of a broad range of distributions.

In the statistical literature, the most general way to describe dependence between correlated random variables is to use copulas (Sklar 1959). Copulas are multivariate distribution functions whose one-dimensional margins are uniform on the [0, 1] interval (Nelson 1999). Copulas are useful for constructing joint distributions, especially when working with nonnormal random variables (Joe 1997). Here, we use Gaussian copulas, which share many similarities with the multivariate normal distribution. We show three examples in Figure 1, where the bivariate normal, the bivariate censored normal, and the bivariate gamma distributions are modeled by Gaussian copulas. In principle, copulas can be used to model the joint distributions of any continuous or discrete traits and even mixed continuous and discrete traits.

Copulas have been employed in previous genetic mapping studies. Tregouet *et al*. (1999) developed a parametric copula model for the analysis of familial binary data and conducted a combined segregation–linkage analysis of levels of plasma angiotensin. Wang and Huang (2002) proposed a score test for QTL mapping with sibships of arbitrary size using transformed data based on Gaussian copulas. Basrak *et al*. (2004) described a bivariate Gaussian copula approach to relax the normality assumption in QTL mapping. Both Wang and Huang (2002) and Basrak *et al*. (2004) implemented their copulas using a two-stage approach in which the inverse-normal transformation was first used to standardize the data, and the transformed traits were later tested for linkage assuming multivariate normality. However, the main strength of the copula is not transformation but its ability to describe the joint distribution of multivariate random variables and to characterize their dependence structure. Compared to the two-stage approach, we expect that methods based on the joint distributions of the original traits will be more efficient.

In this article, we describe a unified method for mapping genes that influence quantitative traits by use of the Gaussian copulas in the VC framework. We call this the “copula VC method.” Our method allows the analysis of continuous, discrete, and censored traits, and the standard VC method is a special case of our method when the data are distributed as multivariate normal. The copula VC method shares several features of the standard VC method; it is (i) likelihood based, (ii) applicable to pedigrees of any configuration, and (iii) readily incorporates covariates through the use of link functions. We evaluated the performance of the copula VC method by simulating data from multivariate Poisson and censored normal distributions. We compared our method with the standard VC method and a regression-based method (Sham *et al*. 2002), which is equivalent to a robust score test derived under the GEE framework (Chen *et al*. 2004). Our simulation results indicate that the copula VC method yields unbiased parameter estimates, correct type I error, and modest improvement of power for testing linkage.

## MATERIALS AND METHODS

We consider the problem of identifying genetic variants that influence quantitative traits, which may be continuous, discrete, or censored, and have distributions that may not be normal. Here, we develop a unified likelihood framework for the analysis of quantitative traits with a broad range of distributions. We seek to (i) identify major genetic loci that influence the quantitative traits and (ii) estimate the major gene heritability, overall genetic heritability, and regression coefficients of measured environmental factors. In the following sections, we review likelihood calculation in the standard VC method, briefly describe Gaussian copulas, and provide details of our approach.

### Likelihood of the standard VC method:

A critical assumption in the standard VC method is that the trait values within a family are distributed as multivariate normal. For a family with *m* related individuals, denote their trait values by . Let denote a vector of observed covariates for individual *j*, and let the mean of the trait value be . In the standard VC method, the trait value is modeled as the sum of independent effects due to measured covariates, such as age and gender, and unmeasured factors that can be modeled as random effects, such as the effect of the major-gene (mg), polygenes (pg), and individual-specific environmental (e) factors. The covariance matrix for the *m* individuals has elementswhere denotes the proportion of alleles at the major gene shared IBD between individuals *j* and *k*, is the kinship coefficient, is the additive genetic variance of the major gene, is the additive genetic variance due to polygenes, and is the residual environmental variance. The IBD sharing probabilities are typically unobservable, but can be estimated from genetic marker data by use of the Lander–Green algorithm (Lander and Green 1987), as implemented in software packages such as GENEHUNTER (Kruglyak *et al*. 1996), ALLEGRO (Gudbjartsson *et al*. 2000), and MERLIN (Abecasis *et al*. 2002). Under the multivariate normality assumption, the likelihood of the family is(1)

### Gaussian copulas:

The standard VC method uses the Pearson correlation, a measure of linear dependence, to model phenotypic similarity between a pair of individuals. However, when the traits are nonnormally distributed, linear dependence may not be suitable due to the presence of higher-order correlation, especially when the traits are highly skewed or discrete. A more flexible way to describe dependence is to use copulas. Consider *m* possibly dependent uniform random variables on the [0, 1] interval. The dependence relationship can be modeled through copula , where *C* is the joint distribution function of .

A copula of particular interest is the Gaussian copula, defined as , where and are the standard univariate and multivariate normal cumulative distribution functions (CDFs), and Γ is an correlation matrix. By using the Gaussian copulas, handling of a multivariate distribution can be separated into a marginal model for the inverse normal score and a model for the joint distribution of the inverse normal scores with , where is the CDF of . If the trait is continuous, then the CDF is uniformly distributed, and the corresponding inverse-normal score is distributed as standard univariate normal. In genetic linkage studies of univariate traits, we are typically interested in trait values that follow the same marginal distribution. For now, we assume that all marginal distributions are the same and denote .

### Gaussian copula VC models:

We assume that the marginal distribution of each trait value comes from an exponential family with distribution function (McCullagh and Nelder 1989), where *a*, *b*, and *c* are known functions, is the dispersion parameter, and is the canonical parameter. The mean and variance of the trait value are given by and , respectively. Given a set of covariates **x**, the mean is related to **x** through a known link function . The specification of depends on the trait distribution. For example, for a normally distributed trait, for a count-related trait, for a binary trait, for a gamma-distributed trait, one can use either a reciprocal link function or a log link function . For a gamma-distributed trait, the reciprocal link is the canonical link, but it prohibits negative mean values and can lead to unstable estimation of parameters (McCullagh and Nelder 1989), so that the log link is typically preferred.

Given the marginal trait distributions, the Gaussian copula gives rise to the following joint distribution of ,(2)where the correlation matrix Γ has elements 1 on the diagonal and , , on the off-diagonal. The correlation matrix Γ characterizes the pairwise nonlinear dependence of the trait values, , among the components of **y**.

### Joint probability/density functions:

Given the joint distribution function of **y**, the corresponding joint probability/density function can be obtained by taking derivatives with respect to (2) (Song 2000). When the trait is continuous, the joint density function of is(3)where and , is a vector of inverse-normal scores , and *I _{m}* is an

*m*-dimensional identity matrix.

When the trait is discrete, the joint probability function of **y** is obtained by taking the Radon–Nikodym derivative for in (2) with respect to counting measure (Song 2000),(4)where and . Here, is the left-hand limit of at *y _{j}*, which is equal to when takes integer values as for the Poisson and Binomial distributions.

Finally, when the *m* margins include continuous and discrete outcomes, the joint density function can be obtained as follows. Let and . The same partition and notation are applied for vectors **q** and **y**. Letand then the joint density of **y** is given by(5)where is the dispersion parameter for the first continuous outcomes and 's are defined as above.

It is worth noting that when the marginal distribution *G* is continuous, the transformed trait vector is distributed as multivariate normal with mean vector **0** and correlation matrix Γ. However, this is not true for discrete traits. By taking Radon–Nikodym derivatives with respect to counting measure, Equation 4 allows us to calculate explicitly the joint probability mass function of the correlated discrete traits. This makes our method different from those of Wang and Huang (2002) and Basrak *et al*. (2004) who assumed that the inverse-normal transformed traits are distributed as multivariate normal regardless of the trait types. Without deriving the joint probability mass function for the original traits, their methods should be used for the analysis of continuous traits only.

Given the joint density/probability function for one family in Equations 3, 4, or 5, the construction of the full likelihood for the trait data is simply the product of the joint density/probability functions over all families:(6)The joint probability/density functions allow us to analyze traits with a variety of distributions. For example, for traits that have skewed distributions, one might assume that the trait values within a family follow a multivariate gamma distribution and model their joint density by Equation 3; for count data, one might assume that the trait values within a family follow a multivariate Poisson or negative binomial distribution and model their joint probability by Equation 4.

It is interesting to note that our likelihood calculation also allows the analysis of censored data that may arise in genetic studies, for example, due to assay limitations or when some subjects are taking medication (Epstein *et al*. 2003). Here, we assume that the latent distribution of the censored data is continuous. To illustrate how one could obtain joint densities of censored data, consider a sib pair with trait values that have a bivariate censored normal distribution. For convenience, assume that censoring results in all trait values less than a threshold value *y** are equal to *y**. A censored trait value can be regarded as being generated from a Bernoulli distribution with the probability parameter being the proportion of censoring. If both sibs are censored, their joint probability function is given by the discrete-type Equation 4; if neither observation is censored, their joint density function is given by the continuous-type Equation 3; and if one sib is censored and the other sib is not, their joint density function is given by the mixed-type Equation 5. The joint density for bivariate censored normal data is given in appendix a. Similar derivations apply to censored data in higher dimensions and with other latent distributions.

### Test of linkage:

Testing linkage is the central task in QTL mapping studies. Invoking the above copula joint models, we can establish a test for linkage at the major gene *vs*. within the framework of likelihood-ratio tests, where the likelihood-ratio statistic is , with and being the likelihood (6) maximized under the alternative and null hypotheses, respectively. Since the value of is on the boundary of the parameter space under the null hypothesis, the asymptotic null distribution of the likelihood-ratio statistic will be approximated by a 50:50 mixture of and a point mass at 0 (Self and Liang 1987). The LOD score at the locus being tested is , which is equivalent to units of the likelihood-ratio statistic. We maximized these likelihoods via a Gauss–Newton type algorithm (Ruppert 2005) (appendix b), which requires the first derivatives of the log-likelihood.

### Simulations:

We conducted several simulations to examine and compare the performance of the copula VC, the standard VC, and the regression-based method as implemented in MERLIN-REGRESS for the analysis of nonnormal data. We specified the true population mean, variance, and heritability of the traits in MERLIN-REGRESS, therefore allowing this method to achieve optimal performance. The method implemented in MERLIN-REGRESS has been shown to be equivalent to a robust score test using the GEE framework (Chen *et al*. 2005). For illustration purposes, we examined Poisson- and censored normal-distributed traits. We first simulated data sets of 400 sib trios according to the copula model (4) to generate trait values with a Poisson distribution. We simulated a map of 10 markers each with four equally frequent alleles evenly spaced at 11.16-cM intervals, corresponding to recombination fraction 0.10 under Haldane's (1919) no interference map function. A QTL with two equally frequent alleles was placed in the middle of the map. We generated data with different values of major-gene heritability and overall genetic heritability . We removed the QTL genotypes prior to data analysis.

To determine whether the tested methods have correct type I error under the null hypothesis of no linkage, we simulated 10,000 replicate data sets. We also conducted simulations to compare the power of the three methods using 5000 replicate data sets. We consider trait models with combinative parameter values of , where is the mean parameter of Poisson distribution. To make fair power comparisons between the methods, we used empirical significance thresholds obtained from the null distribution simulations with the same total heritability, , but assuming that the major gene effect was . To determine the impact of discreteness on the estimation of covariate effects, we conducted additional simulations including a covariate that was generated from the standard normal distribution. We set the regression coefficient or 1 and simulated 2000 additional replicate data sets in each setting. The trait values were connected with the covariates using a log link function. In this setting, we analyzed the simulated data using the copula VC method only since the standard VC method assumes an identity link and thus it is not appropriate to compare covariate estimates for these two methods.

We repeated the simulation procedure for censored normal traits. For ease of computation, we considered sib pairs only. We simulated 800 sib pairs in each data set. To obtain censored trait values, we first simulated latent bivariate normal traits in accordance with the copula model (3) and then censored those values below a threshold of the latent trait distribution. We determined the threshold by the proportion of censoring, denoted by *c*. We considered trait models with combinative parameter values of . Without loss of generality, for all the trait models we considered, the total variance of the latent trait values was set to be 1.0. To determine the impact of censoring on the estimation of covariate effects, we conducted additional simulations including a covariate that was generated from the standard normal distribution. We set the regression coefficient or 1 and simulated 2000 additional replicate data sets in each setting. The trait values were connected with the covariates by an identity link function.

## RESULTS

### Poisson-distributed traits:

#### Empirical type I error and power for detecting linkage:

Figure 2 shows the empirical type I error for Poisson (count)-distributed traits when significance was evaluated using the asymptotic null distribution. The standard VC method gives inflated type I error for testing linkage, especially when the mean parameter of the Poisson distribution, , is small, corresponding to greater departure from normality. For example, when and , the type I error is 2.4% at the 1% significance level. We also found that the type I error for the standard VC method increases as the overall genetic heritability increases. In contrast, the copula VC method gives type I error close to the nominal levels for all four trait models that we considered. The type I error for the regression-based method is slightly higher than that for the copula VC method, but is nearly under control.

The empirical power of the copula and standard VC methods and the regression-based method for testing linkage is shown in Figure 2. Since the standard VC method has inflated type I error when using the asymptotic null distribution, we determined the critical values for testing linkage using the empirical null distributions generated with the same parameters except that was assumed for all three methods. As expected, the power to detect linkage of all three methods increases as the major gene heritability and overall genetic heritability increase. Our results also show that, for all the eight trait models considered, the copula VC method has modest improvement of power to detect linkage compared to the standard VC method and the regression-based method.

#### Trait model parameter and regression coefficient estimates:

The mean parameter estimates and the square root of mean square errors (MSEs) of the major gene heritability and the overall genetic heritability for both VC methods are shown in Figure 3. Our results indicate that the standard VC method underestimates heritability, especially when the mean count of the Poisson distribution is small. Compared to the overall genetic heritability, the major gene heritability appears to be less influenced by discreteness of the Poisson distribution. Mean estimates averaged 79–91% of the true values for the major gene heritability and 75–87% of the true values for the overall genetic heritability. In contrast, the copula VC method yields unbiased parameter estimates for all trait models that we considered. Further, the accuracy of the parameter estimates, as measured by the square root of the MSEs, improves as the major gene heritability and the overall genetic heritability increase.

We also investigated the effects of the discreteness of the Poisson distribution on regression coefficient estimates for the copula VC method (Figure 4). We are not able to evaluate the regression coefficient estimates using the standard VC method, since the covariate is linked with the Poisson-distributed trait values through a log link function, which is different from the identity link assumed by the standard VC method. Hence, the two VC methods are not directly comparable in terms of regression coefficient estimates. Figure 4 indicates that the copula VC method yields unbiased estimates of the regression coefficient for the eight trait models that we considered. As expected, the accuracy of the parameter estimates improves as the major gene and overall trait heritabilities increase.

### Censored normal distributed traits:

#### Empirical type I error and power for detecting linkage:

In Figure 5, we show the empirical type I error for the test of linkage with censored normal data. As noted by Epstein *et al*. (2003), the standard VC method yields inflated type I error, especially when the proportion of censored observations is large. Further, the type I error for the standard VC method increases as the overall genetic heritability increases. For example, the type I error ranges between 1.2% (when and ) and 1.7% (when and ) when significance was assessed at the 1% level. In contrast, the copula VC method yields type I error that is close to the nominal level.

The empirical power of the standard and the copula VC methods for testing linkage is shown in Figure 5. Again, we determined the empirical power by simulating data under the null model to estimate critical values. As expected, the power to detect linkage of all three methods increases as the major gene and overall genetic heritabilities increase. The power to detect linkage of all three methods diminishes as the percentage of censored observations increases, corresponding to less information about the underlying trait distribution. Further, the copula VC method provides a modest increase in power to detect linkage over the standard VC method, consistent with the results of Epstein *et al*. (2003), who developed the Tobit VC model to handle censored normal data. The copula VC method is also more powerful than the regression-based method.

#### Trait model parameter and regression coefficient estimates:

In Figure 6, we show the mean parameter estimates and the square root of the MSEs of the major gene heritability and the overall genetic heritability for both VC methods. As previously noted by Epstein *et al*. (2003), we found that the standard VC method underestimates the true values of the heritability parameters. On average, the estimates of the major gene heritability are 94–99% of the true values and the estimates of the overall genetic heritability are 92–98% of the true values for the eight trait models that we considered. Compared to the overall genetic heritability, the major gene heritability appears to be less influenced by censoring. As expected, the copula VC method yields unbiased estimates for both heritability parameters and the accuracy of parameter estimation improves as the percentage of censored observations decreases.

We also examined the effects of censoring on estimation of regression coefficients for both VC methods (Figure 7). We found that the regression coefficient estimates are notably attenuated toward zero using the standard VC method. The estimated values are ∼75% of the true values for all the eight trait models that we considered. In contrast, the copula VC method not only gives unbiased parameter estimates but also shows less variability.

## DISCUSSION

Many traits of scientific interest have nonnormal distributions. Using Gaussian copulas, we developed a unified likelihood framework that allows the analysis of traits with a wide range of distributions. Unlike the standard VC method, our copula VC method does not require the traits to be normally distributed. In particular, the standard VC method is a special case of the copula VC method when the traits are distributed as multivariate normal. Our method allows the analysis of continuous, discrete, and censored traits, including binary, polychotomous, count, and continuous skewed data. Through the use of link functions, the method can easily incorporate covariates to study the influence of environmental effects. For simplicity, we simulated sibship data in this article, but our method can be employed for the analysis of large pedigrees, although this becomes computationally more challenging.

The copula VC method yields unbiased parameter estimates and the correct type I error for testing linkage with a variety of nonnormal traits. In contrast, the standard VC method gives biased parameter estimates and inflated type I error and therefore requires intensive simulations to evaluate significance levels appropriately. The type I error of the regression-based method (Sham *et al*. 2002) is generally under control; however, that method is less powerful than the copula VC method. We found that for Poisson and censored normal traits, the major gene and overall genetic heritabilities are underestimated using the standard VC method. Further, for censored normal traits, the estimated regression coefficients for covariate effects are attenuated toward zero.

Although our article is focused on linkage analysis, our method can be employed in association studies by including the genetic markers as covariates. It is unclear that Wang and Huang's (2002) and Basrak *et al*.'s (2004) methods can be extended to test for association since the covariates have been regressed out in the data standardization procedure. Our results suggest that the copula VC method could be used to improve power of the association tests in the quantitative trait transmission/disequilibrium test (Abecasis *et al*. 2000), which incorporates the tested markers as covariates in a standard VC model that assumes multivariate normality.

For continuous traits, the copula VC method implicitly assumes that the inverse-normal transformed traits are multivariate normal, an assumption that is also made by the two-stage approach (Wang and Huang 2002; Basrak *et al*. 2004) and the semiparametric approach (Diao and Lin 2005). In contrast to these transformation-based methods, by taking Radon–Nikodym derivatives, we derived the joint probability/density function of the original trait values, including discrete and mixed outcomes. Moreover, by using generalized linear models, the copula VC method can easily accommodate any traits with marginal distributions belonging to the exponential family (McCullagh and Nelder 1989). Compared to other VC-based methods for nonnormal traits (Lange *et al*. 1989; Wan *et al*. 1998; Epstein *et al*. 2003), the method proposed here is more general and flexible.

In this article, we compared the copula VC method with the regression-based method for Poisson and censored normal traits and found that although both methods controlled type I error rates, our method appeared to be slightly more powerful. Chen *et al*. (2004, 2005) used GEEs to develop two robust score tests based on higher moments of the trait distributions and showed that the regression-based method (Sham *et al*. 2002) was equivalent to a robust score test derived from the GEE framework. Using the regression-based method as a benchmark, we expect that the copula VC method might be more powerful than the GEE-based method if the joint distributions are nearly correctly specified. However, if the joint distributions of the traits are hard to specify, then the GEE-based method may be preferred. We also expect that methods that incorporate higher moments of the trait distribution into a robust score test framework may improve the performance of the basic GEE approach so that it performs nearly as well as our maximum-likelihood approach.

The copula VC method assumes that the marginal distributions of the traits are known and, for practical data analysis, this assumption should be confirmed through model diagnostic procedures. One approach is to use the Q–Q plot to validate the parametric distribution assumption. If the parametric assumption is in question, to make our method practical, one could replace with the empirical CDF . The strong law of large numbers ensures that converges to almost surely, which means that the empirical CDF can be used to estimate the marginal distribution of any trait of interest. This approach is not applicable to discrete traits.

In this article, we employed Gaussian copulas to model the joint distributions of the traits. Gaussian copulas are a powerful tool for modeling multiple correlated variables and enjoy the flexibility of allowing arbitrary correlation structures and modeling high-dimensional data, whereas many other copulas are restricted to two dimensions. Moreover, Gaussian copulas are conceptually simple and have intuitive connections with the familiar multivariate normal distribution. We recognize that the Gaussian copulas are just one way to model nonnormally distributed traits. Similar approaches might be employed for other families of copulas, such as the *t*- or Archimedian copulas (Nelson 1999). It might be worth comparing the performance of the Gaussian copulas with these copulas, and such a study might offer even more flexibility in analyzing quantitative traits with different distributions.

In summary, we have developed a unified copula VC approach that allows the analysis of traits with a variety of distributions. Our method relaxes the multivariate normality assumption as employed by the standard VC method. We illustrated the utility of the copula VC method by simulating Poisson and censored normal traits. We simulated sibship data for simplicity, but our method can be employed for the analysis of larger pedigrees. We believe that the copula VC method provides a useful tool for the mapping of genes that influence nonnormally distributed quantitative traits.

## APPENDIX A

We briefly describe the derivation of the joint density function for bivariate censored normal variables. Let follow a bivariate normal distribution with mean and variance–covariance matrixAssume that are censored below a threshold *y** such that if and otherwise, and is similarly defined. Then are censored bivariate normal variables. The density of takes the following form:

If both and are censored at , then both of them can be regarded as coming from a Bernoulli distribution with the probability parameter being the proportion of censoring , . By Equation 4, the joint density function of iswhere is the standard bivariate normal cumulative distribution function with correlation parameter .

If is censored at and is not, then can be regarded as mixed type outcomes with being Bernoulli and being normal. By Equation 5, the joint density function of is

If is censored at and is not, then can be regarded as mixed type outcomes with being Bernoulli and being normal. By Equation 5, the joint density function of is

If both and are not censored, then the joint density of them iswhere is the density function for standard bivariate normal with correlation parameter .

## APPENDIX B

We describe a Gauss–Newton type algorithm (Ruppert 2005) to maximize the likelihood *L* in (6). In the (*l* + 1)th iteration, the parameters are updated bywhere δ is the step-halving term that starts at 1 and halves until at iteration *l*. This algorithm guarantees that the likelihood increases progressively over iterations. The algorithm stops when the increase in the likelihood is no longer possible or the difference between two consecutive updates is smaller than a prespecified precision level.

## Acknowledgments

We thank Wei-Min Chen for helpful discussions. We also thank Karl Broman and two anonymous reviewers for their valuable comments. This research was supported by National Institutes of Health grants HG00376 (to M.B.), HG02651, and EY12562 (to G.R.A.) and by the Natural Sciences and Engineering Research Council of Canada (to P.X-K.S.). M.L. was previously supported by a University of Michigan Rackham predoctoral fellowship.

## Footnotes

Communicating editor: K. W. Broman

- Received December 12, 2005.
- Accepted May 23, 2006.

- Copyright © 2006 by the Genetics Society of America