## Abstract

Almost all drugs that produce a favorable response (efficacy) may also produce adverse effects (toxicity). The relative strengths of drug efficacy and toxicity that vary in human populations are controlled by the combined influences of multiple genes and environmental influences. Genetic mapping has proven to be a powerful tool for detecting and identifying specific DNA sequence variants on the basis of the haplotype map (HapMap) constructed from single-nucleotide polymorphisms (SNPs). In this article, we present a novel statistical model for sequence mapping of two different but related drug responses. This model is incorporated by mathematical functions of drug response to varying doses or concentrations and the statistical device used to model the correlated structure of the residual (co)variance matrix. We implement a closed-form solution for the EM algorithm to estimate the population genetic parameters of SNPs and the simplex algorithm to estimate the curve parameters describing the pharmacodynamic changes of different genetic variants and matrix-structuring parameters. Extensive simulations are performed to investigate the statistical properties of our model. The implications of our model in pharmacogenetic and pharmacogenomic research are discussed.

THE administration of a specific drug to patients produces two different responses, desired therapeutic effects (efficacy) and adverse effects (toxicity). Increasing evidence has been observed for influences of genetic differences on these two responses (reviewed in Evans and Johnson 2001; Johnson and Evans 2002; Evans and McLeod 2003; Weinshilboum 2003). However, the genetic control of both efficacy and toxicity is typically complex, in which multiple genes interact with various biochemical, developmental, and environmental factors in coordinated ways to determine the overall phenotypes (Johnson 2003; Watters and McLeod 2003). With the advent of recent genomic technologies, interindividual differences in drug response can now be explained by DNA sequence variants in genes that encode the metabolism and disposition of drugs and the targets of drug therapy (such as receptors) (Evans and Relling 1999; McLeod and Evans 2001). To understand more comprehensively the genetic bases of efficacy and toxicity and, ultimately, design individualized medications with maximum favorable effects and minimum unfavorable effects, approaches are needed in which new specific genes for each response can be identified.

Two approaches have been developed to detect genes for a complex trait. The first approach is the indirect inference of causal genetic loci or quantitative trait loci (QTL) based on their cosegregating markers (Lander and Botstein 1989). The QTL detected from this approach is hypothetical, whose DNA structure and organization are unknown. High-throughput technologies of single-nucleotide polymorphisms (SNPs) have provided a powerful means for sequencing candidate genes that have been known to affect complex diseases or drug response. The recent development of the haplotype map (HapMap) constructed by anonymous SNPs (International HapMap Consortium 2003) has made it possible to scan genome-wide for the existence and distribution of functional SNPs on the basis of association analysis and narrow down the genomic regions that harbor causal SNPs. Motivated by these developments, Liu* et al*. (2004) proposed a second approach that can directly associate DNA sequence variants with the phenotypic variation on the basis of haplotype analyses. This approach has power to detect the DNA sequence where individuals differ at a single DNA base. In an independent study, Schaid (2004) provided an excellent review of the motivation of using haplotypes to understand the genetic basis of human traits.

Unlike the usual complex traits, drug response has some dynamic characteristic in which individuals respond to varying drug dosages or concentrations. Drug response can be therefore regarded as function-valued or longitudinal traits. The genetic architecture of function-valued traits can be studied using the marker-based *functional-mapping* model, developed by R. Wu and colleagues (Ma* et al*. 2002; Wu* et al.* 2002, 2003, 2004a,b). Functional mapping maps dynamic QTL responsible for a biological process that need be measured at a finite number of time points. In modeling functional mapping, fundamental principles behind biological or biochemical networks described by mathematical functions are incorporated into a QTL mapping framework. Functional mapping estimates parameters that determine shapes and functions of a particular biological network, rather than directly estimating gene effects at all possible points within the network. Because of the connection of these points through mathematical functions, functional mapping strikingly reduces the number of parameters to be estimated and, hence, displays increased statistical power.

Taking advantage of sequence-based association studies and functional mapping, we attempt to propose a new model that can characterize the DNA sequence structure of drug response and compare the genetic differences between efficacy and toxicity at the single-DNA base level. In the next sections, we first introduce basic principles for sequence mapping and functional mapping and then formulate a unifying likelihood function for the detection of specific DNA sequence variants that determine drug efficacy and toxicity. These are followed by the investigation of the statistical properties of this model through extensive simulation studies.

## BASIC PRINCIPLE FOR SEQUENCE MAPPING

Traditional QTL mapping is to study the relationship between putative QTL genotypes and phenotypes, whereas sequence mapping associates configurations of SNP genotypes (*i.e*., diplotypes) with phenotypes. Sequence mapping relies upon the the characterization of SNPs from the entire human genome. Recent studies have shown that the human genome has a haplotype block structure (Patil* et al.* 2001; Dawson* et al*. 2002; Gabriel* et al.* 2002), such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of SNPs, referred to as “tag SNPs,” can be used to distinguish a large fraction of the haplotypes.

For simplicity, we first consider two SNPs within a haplotype block that are cosegregating with linkage disequilibrium *D* in a human population at Hardy-Weinberg equilibrium. Each SNP has two alleles 1 and 2 with relative proportions *p*_{1}^{} and *p*_{2}^{} as well as *p*_{1}^{} and *p*_{2}^{}, respectively, where the superscript stands for the identification of the SNP and These two SNPs form four possible haplotypes, 11, 12, 21, and 22, whose frequencies are expressed as where *r*_{1}, *r*_{2} = 1, 2 denote the alleles of the two SNPs, respectively, and (Lynch and Walsh 1998). If the haplotype frequencies are known, then the allelic frequencies and linkage disequilibrium, arrayed by the population genetic parameter vector , can be solved with the above equation.

The random combination of maternal and paternal haplotypes generates 10 distinct diplotypes expressed as [11][11], · · · , [22][22], which are sorted into 9 genotypes, 11/11, · · · , 22/22 (Table 1). The double heterozygotic genotype 12/12 contains two possible diplotypes, [11][22] and [12][21]. We use to denote the diplotype and genotype frequencies, respectively, and *n*_{r1r′1/r2r′2} to denote the observations of various genotypes (* 𝒢*), where

*m*and

*p*describe the maternal and paternal origins of haplotypes and 1 ≤

*r*

_{1}≤

*r*

^{′}

_{1}≤ 2, 1 ≤

*r*

_{2}≤

*r*

^{′}

_{2}≤ 2. The frequencies and observations of all genotypes, except for genotype 12/12, are equivalent to those of the corresponding diplotypes.

We intend to associate diplotypes with interpatient variation in a quantitative trait on the basis of observed phenotypic values (* 𝒴*) assumed to be normally distributed and SNP genotypes (

*) assumed to be multinomially distributed. Without loss of generality, we assume that haplotype 11 is different from the rest of the haplotypes, cumulatively expressed as 1̅1̅, in triggering an effect on the phenotype. We call such a distinct haplotype 11 the*

**𝒢***reference*haplotype. The reference and nonreference haplotypes generate three combinations called the

*composite genotypes*. The genotypic means of the composite genotypes, μ

*(*

_{j}*j*= 2 for [11][11], 1 for , and 0 for ), and common residual variance within the composite genotype, σ

^{2}, that belong to quantitative genetic parameters are arrayed by

**Ω**

_{q}= {μ

*, σ*

_{j}^{2}}.

We constructed two log-likelihood functions, one in a multinomial form and the other in a mixture model form, to estimate the population and quantitative genetic parameters that are, respectively, expressed as 1and 23and

We derived a closed-form solution for estimating the unknown parameters with the EM algorithm. The estimates of haplotype frequencies are based on the log-likelihood function of Equation 1, whereas the estimates of composite genotypic means and residual variance are based on the log-likelihood function of Equation 2. These two types of parameters can be estimated using an integrative EM algorithm.

In the E step, the probability of a double-heterozygous genotype to carry diplotype [11][22] is estimated using Equation 3, whereas the posterior probability (Π* _{i}*) with which subject

*i*carrying the double heterozygous genotype is diplotype [11][22] is calculated by 4

In the M step, the probabilities calculated in the previous iteration are used to estimate the haplotype frequencies using 5678The quantitative genetic parameters are estimated using 9101112where *ṅ* = *n*_{11/12} + *n*_{12/11} and *n̈* = *n*_{11/22} + *n*_{12/22} + *n*_{22/11} + *n*_{22/12} + *n*_{22/22}. Iterations including the E and M steps are repeated among Equations 2–12 until the estimates of the parameters converge to stable values.

## AN INTEGRATIVE SEQUENCE AND FUNCTIONAL MAPPING FRAMEWORK FOR DRUG EFFICACY AND TOXICITY

### The multivariate normal distribution:

Efficacy and toxicity describe how patients respond to different doses or concentrations of drugs. Statistically, these represent a longitudinal problem whose underlying genetic determinants can be mapped using the functional mapping strategy. Here, we integrate the ideas for sequence mapping and functional mapping to directly characterize DNA sequence variants that are responsible for efficacy and toxicity processes.

Consider the same sample as described above in which population genetic parameters for SNPs have been defined. For each patient, drug effects are measured at *C* hallmark dose or concentration levels. While a drug is expected to display favorable effects, it may also be toxic. In this study, some typical physiological parameters reflecting both efficacy and toxicity are longitudinally measured. Let **x*** _{i}* = [

*x*(1), · · · ,

_{i}*x*(

_{i}*C*)] be the efficacy effect vector and

**z**

*= [*

_{i}*z*(1), · · · ,

_{i}*z*(

_{i}*C*)] be the toxicity effect vector for a drug administrated to patient

*i*. We use

**y**

*= (*

_{i}**x**

*,*

_{i}**z**

*) to denote the joint vector combining these two types of drug response. Because the phenotypic measurements of drug efficacy and toxicity are continuously variable, it is reasonable to assume that*

_{i}**y**follows a multivariate normal distribution, as used in general quantitative genetic studies (Lynch and Walsh 1998).

To jointly map the efficacy and toxicity, two log-likelihood functions described in Equations 1 and 2 should be constructed for observed SNP marker and longitudinal phenotypic data. The only difference from single-trait mapping is that the log-likelihood function (2) needs to incorporate a multivariate normal distribution for patient *i* who carries composite genotype *j*, expressed as 13where **y*** _{i}* = [

*x*(1), · · · ,

_{i}*x*(

_{i}*C*),

*z*(1), · · · ,

_{i}*z*(

_{i}*C*)] is a 2

*C*-dimensional vector of observations for efficacy and toxicity measured at

*C*dosages and

**m**

*= [μ*

_{j}*(1), · · · , μ*

_{jx}*(*

_{jx}*C*), μ

*(1), · · · , μ*

_{jz}*(*

_{jz}*C*)] is a vector of expected values for composite genotype

*j*at different doses. At a particular dose

*c*, the relationship between the observation and expected mean can be described by a linear regression model, 1415where ξ

*and ζ*

_{ij}*are the indicator variables denoted as 1 if a particular composite genotype*

_{ij}*j*is considered for individual

*i*and 0 otherwise,

*J*is the number of composite genotypes, and

*e*(

_{ix}*c*) and

*e*(

_{iz}*c*) are the residual errors that are iid normal with the mean of zero and the variances of σ

^{2}

_{x}and σ

^{2}

_{z}, respectively. The errors at two different doses,

*c*

_{1}and

*c*

_{2}, are correlated with the covariances of σ

*(*

_{x}*c*

_{1},

*c*

_{2}) for efficacy, σ

*(*

_{z}*c*

_{1},

*c*

_{2}) for toxicity, and σ

*(*

_{xz}*c*) and σ

*(*

_{xz}*c*

_{1},

*c*

_{2}) between efficacy and toxicity. All these variances and covariances compose the structure of the (co)variance matrix

**Σ**in Equation 13, expressed as 16where

**Σ**

*and*

_{x}**Σ**

*are composed of σ*

_{z}_{x}

^{2}and σ

*(*

_{x}*c*

_{1},

*c*

_{2}), and σ

_{z}

^{2}and σ

*(*

_{z}*c*

_{1},

*c*

_{2}) (1 ≤

*c*

_{1},

*c*

_{2}≤

*C*), respectively; and

**Σ**

*and*

_{xz}**Σ**

*are composed of σ*

_{zx}*(*

_{xz}*c*) and σ

*(*

_{xz}*c*

_{1}

*,*

_{x}*c*

_{2}

*) (1 ≤*

_{z}*c*

_{1}

*≠*

_{x}*c*

_{2}

*≤*

_{z}*C*), and σ

*(*

_{zx}*c*) and σ

*(*

_{xz}*c*

_{1}

*,*

_{z}*c*

_{2}

*) (1 ≤*

_{x}*c*

_{1}

*≠*

_{z}*c*

_{2}

*≤*

_{x}*C*), respectively.

To solve the likelihood functions implemented with response data measured at multiple doses, one can extend the traditional interval mapping approach to accommodate the multivariate nature of dose-dependent traits. However, this extension is limited in two aspects: (1) Individual expected means of different composite genotypes at all doses and all elements in the matrix **Σ** need to be estimated, resulting in substantial computational difficulties when the vector and matrix dimension is large; and (2) the result from this approach may not be clinically meaningful because the underlying medical principle for drug response is not incorporated. Thus, some clinically interesting questions cannot be asked and answered.

### The mapping framework:

Wu* et al*. (2002)(2004a,b) and Ma* et al*. (2002) have constructed a novel statistical framework for functional mapping of QTL that affect growth curves. Here, this framework is extended to jointly model two different longitudinal traits, aimed at simultaneously characterizing the genetic variants responsible for efficacy and toxicity. Two tasks are taken within this framework by modeling the mean vector and the covariance matrix.

#### Modeling the mean vector:

The dose-dependent expected values of composite genotype *j* can be modeled for drug efficacy by the sigmoid *E*_{max} model (Giraldo 2003) and for drug toxicity by the power function (McClish and Roberts 2003). The *E*_{max} model postulates the relationship between drug concentration (*c*) and drug effect (μ), 17where *E*_{0} is the constant or baseline value for the drug response parameter, *E*_{max} is the asymptotic (limiting) effect, EC_{50} is the drug concentration that results in 50% of the maximal effect, and *H* is the slope parameter that determines the slope of the concentration-response curve. Drug toxicity is described by the power function of dose expressed as 18where β determines the shape of the dose-response relationship and α adjusts the dose-related gain to which that shape conforms.

It is possible that efficacy and toxicity have different reference haplotypes, so their composite genotypes should be treated differently. As a result, Equations 17 and 18 together contain six curve parameters for a defined composite genotype *j*_{1} for efficacy and *j*_{2} for toxicity, which are arrayed by . If different composite genotypes have different combinations of these parameters, this implies that this sequence plays a role in governing the differentiation of efficacy and toxicity. Thus, by testing for the difference of **Ω**_{mj1j2} among different genotypes, we can determine whether a specific sequence variant exists that confers an effect on these two drug responses.

#### Modeling the structure of the covariance matrix:

Many statistical approaches have been proposed to model the structure of the covariance matrix for longitudinal traits measured at multiple time points (Diggle* et al.* 2002). Here, our matrix-structuring models are derived on the basis of a commonly used approach, the structured antedependence (SAD)(Zimmerman and Núñez-Antón 2001).

According to the SAD model, an observation at a particular dosage *c* depends on the previous ones, with the degree of dependence decaying with time lag. For drug efficacy and toxicity, the dose-dependent residual errors for subject *i* described by Equations 14 and 15 can be expressed, in terms of the first-order SAD [SAD(1)] model, as 19where φ* _{x}* (or φ

*) and φ*

_{z}*(or φ*

_{x}*) are the unrestricted antedependence parameters induced by trait*

_{z}*x*(or

*z*) itself and by the other trait

*z*(or

*x*), respectively, and ε

*(*

_{xi}*c*) and ε

*(*

_{xi}*c*) are the “innovation” errors for the two traits, respectively, normally distributed as

*N*and

*N*(Núñez-Antón and Zimmerman 2000; Zimmerman and Núñez-Antón 2001). The bivariate SAD(1) model for subject

*i*described by Equation 19 can be expressed in matrix notation as 20where

**e**

*= {*

_{i}*e*

_{1}(1), · · · ,

*e*(

_{i}*C*)}

^{T}, ε

*= {ε*

_{i}*(1), · · ·, ε*

_{i}*(*

_{i}*C*)}

^{T}, and

The covariance matrix of **e*** _{i}* under the bivariate SAD(1) model (20) can be obtained as 21where

and

In our modeling here, the innovative variance is assumed to be constant over different dosages. As shown by Jaffrézic* et al*. (2003), the SAD(1) model with this assumption can still allow for both the variance and the correlation to change with the dose level. In addition, we assume that the correlation (ρ) in innovative errors between the two traits *x* and *z* is stable over dosage. With these assumptions, we need to estimate an array of parameters contained in that model the structure of the covariance matrix under the SAD model.

As can be proven, in the context of the bivariate SAD model, the cross-correlation functions can be asymmetrical, *i.e*., Corr* _{xz}*(

*c*

_{1}

*,*

_{x}*c*

_{2}

*) ≠ Corr*

_{z}*(*

_{zx}*c*

_{1}

*,*

_{z}*c*

_{2}

*). This favorable feature of the bivariate SAD model makes it useful for understanding the genetic correlation between different traits. In practice, innovative variance and correlation can be modeled by a polynomial and exponential function, respectively (Núñez-Antón and Zimmerman 2000).*

_{x}### Computational algorithm:

We implemented the EM algorithm, originally proposed by Dempster* et al.* (1977), to obtain the maximum-likelihood estimates (MLEs) of three groups of unknown parameters in the integrated sequence and function mapping model, that is, the marker population parameters (**Ω**_{p}), the curve parameters that model the mean vector, and the parameters (**Ω**_{v}) that model the structure of the covariance matrix. These unknowns are denoted by . A detailed description of the EM algorithm is given in Wu* et al*. (2002)(2004b) and Ma* et al*. (2002).

As described for single-trait mapping, we implement the E step to calculate the probability of the double heterozygote 12/12 to carry diplotype [11][22] and the posterior probability (Π* _{i}*) of double heterozygotic patient

*i*who carries diplotype [11][22]. In the M step, we use the calculated ϖ and Π

*values to estimate the haplotype frequencies using Equations 5–8. But in this step we encounter a considerable difficulty in deriving the log-likelihood equations for*

_{i}**Ω**

_{mj1j2}and

**Ω**

_{v}because they are contained in complex nonlinear equations. Zhao

*et al.*(2004) implemented the simplex method as advocated by Nelder and Mead (1965) to the estimation process of functional mapping, which can strikingly increase computational efficiency. In this article, the simplex algorithm is embedded in the EM algorithm above to provide simultaneous estimation of haplotype frequencies and curve parameters and matrix-structuring parameters.

### Model for an arbitrary number of SNPs:

The idea for sequencing drug response has been described for a two-SNP model. It is possible that the two-SNP model is too simple to characterize genetic variants for variation in drug response. We have extended this model to include an arbitrary number of SNPs whose sequences are associated with drug response variation. A key issue for the multi-SNP sequencing model is how to distinguish among 2^{r}^{−1} different diplotypes for the same genotype heterozygous at *r* loci. The relative frequencies of these diplotypes can be expressed in terms of haplotype frequencies. The integrative EM algorithm can be employed to estimate the MLEs of haplotype frequencies. Bennett (1954) provided a general formula for expressing haplotype frequencies in terms of allele frequencies and linkage disequilibria of different orders. The MLEs of the latter can be obtained by solving a system of equations.

## HYPOTHESIS TESTS

Different from traditional mapping approaches, our functional mapping for function-valued traits allows for the tests of a number of biologically or clinically meaningful hypotheses. These hypothesis tests can be a test for the existence of significant DNA sequence variants or a test for the genetic effect on the maximal (asymptotic) effect (*E*_{max}), the drug concentration that results in 50% of the maximal effect, and the slopes that determines the steepness of the concentration-response curve.

The existence of specific genetic variants affecting drug efficacy and toxicity should be tested by formulating two alternative hypotheses, 22where H_{0} corresponds to the reduced model, in which the data can be fit by a single drug response curve, and H_{1} corresponds to the full model, in which different dynamic curves exist to fit the data. The test statistic for testing the hypotheses in Equation 23 is calculated as the log-likelihood ratio (LR) of the reduced to the full model, 23where **Ω̃** and **Ω̂** denote the MLEs of the unknown parameters under H_{0} and H_{1}, respectively. The LR is asymptotically χ^{2}-distributed with 15 d.f. An empirical approach for determining the critical threshold is based on permutation tests, as advocated by Churchill and Doerge (1994). By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of the maximum log-likelihood ratios are calculated, from the distribution of which the critical threshold is determined.

We can also test for the significance of the genetic effect on drug efficacy or toxicity at a particular concentration level (*c**) of interest, expressed as 24which is equivalent to testing the difference of the full model with no restriction and the reduced model with a restriction in H_{0}. Similar restrictions can be taken to test the genetic effect on individual curve parameters, such as *E*_{max}, EC_{50}, *H*, α, and β. The tests of these parameters are important for the design of personalized drugs to control particular diseases.

## RESULTS

We perform Monte Carlo simulation experiments to examine the statistical properties of the model proposed for genetic mapping of drug efficacy and toxicity. Our simulation is based on the bivariate SAD(1) model (Equation 21). For computational simplicity, we consider only the antedependence parameters induced by the trait itself, *i.e*., setting φ* _{x}* = φ

*= 0. We randomly choose 200 individuals from a human population at Hardy-Weinberg equilibrium with respect to haplotypes. Let us consider two SNPs with allele frequencies and linkage disequilibrium given in Table 1. The diplotypes derived from four haplotypes at these two SNPs affect drug efficacy and toxicity. Here we assume that these two types of drug response have different reference haplotypes,*

_{z}*i.e*., 11 for efficacy and 12 for toxicity. Thus, each drug response corresponds to a different set of three composite genotypes.

The efficacy (*E*_{0}, *E*_{max}, EC_{50}, and *H*) and toxicity curve parameters (α and β) for the respective three composite genotypes, given in Table 2, are determined in the ranges of empirical estimates of these parameters from pharmacological studies (Sowinski* et al.* 1995; McClish and Roberts 2003). We use a general clinical design, with drug effects measured at nine different concentrations of drug, 0, 5, 10, 15, 20, 25, 30, 35, and 40; corresponding to concentration levels 1, 2, · · · , 9 (Figure 1); for this simulation study. Using the genetic variance due to the composite genotypic difference in the area under the curve in Figure 1, we calculate the residual variances under different heritability levels (*H*^{2} = 0.1 and 0.4). These residual variances, plus given residual correlations, form a structured residual covariance matrix **Σ** (Equation 22). The phenotypic values of drug effect for 200 random patients are simulated by the summations of genotypic values predicted by the curves and residual errors following multivariate normal distributions, with MVN(0, **Σ**).

The population genetic parameters of the SNPs can be estimated with reasonably high precision using our closed-form solution approach (Table 2). The estimates of these parameters are dependent only on sample size and are not related to the size of heritability. Figure 1 illustrates different forms of the drug efficacy and toxicity curves from the three composite genotypes with a comparison between the hypothesized and estimated curves. The estimated curves are consistent with the hypothesized curves, especially when the heritability is larger (0.4), suggesting that our model can provide reasonable estimates of drug response curves. The parameters for the efficacy and toxicity models of each composite genotype can be estimated accurately and precisely (Table 2). As expected, the estimation precision (assessed by sampling errors) increases remarkably when the heritability increases from 0.1 to 0.4. The estimates of the SAD parameters that model the structure of the covariance matrix **Σ** also display reasonably high precision (Table 2). But it seems that their estimation precision is independent of the size of heritability.

In each of 100 simulations, we calculate the LRs for the hypothesis test of the presence of a genetic variant affecting both drug responses. The LR values in each simulation under both heritability levels are strikingly higher than the critical threshold estimated from 100 replicates of simulations under the null hypothesis that there is no drug response-associated genetic variant. This suggests that our model has 100% power to detect the genetic variant under given SNP, curve, and matrix-structuring parameters for our simulation. With the same set of simulated data, the power to detect significant genetic variants is reduced by ∼10% when drug efficacy or toxicity is modeled separately.

We carry out an additional simulation study to examine the statistical behavior of our model when three different SNPs are used. This three-SNP model uses the same parameters as those assumed under the two-SNP model, except for the inclusion of more SNPs. The results from the three-SNP model are broadly concordant with those from the two-SNP model (results not given), although more population genetic parameters need to be estimated in the former than in the latter.

## DISCUSSION

Drug response is complex in nature. First, no drug is universally effective; efficacy rates for drug therapy of most diseases ranges from 25 to 80% (Spear* et al*. 2001). Second, all drugs that produce an efficacious response may also produce adverse effects (Johnson 2003). Third, both drug efficacy and toxicity that vary in human populations are determined by multiple genetic and environmental factors that interact with one another in complicated ways (Watters and McLeod 2003). It is possible that the genetic polymorphisms associated with a favorable response to therapy may or may not be associated with toxicity risk (Nebert 1997; Wilson* et al.* 2001).

The complexity of drug response is enforced by its dynamic characteristic with high dimension and correlated structure. The genetic analysis of drug response, therefore, needs more sophisticated statistical tools that have the capacity to extract useful information from longitudinal data. In this article, we have proposed a novel statistical model for characterizing the genetic variants that govern drug response in a human population. Our model is constructed with a finite mixture model framework founded on the tenets of our sequence (Lin* et al.* 2005) and functional mapping (Wu* et al.* 2002, 2004a,b; Ma* et al*. 2002). This model has particular power to discern the discrepancy between the genetic mechanisms underlying drug efficacy and toxicity.

Compared to traditional genetic mapping approaches, our model is both genetically and statistically merited. It offers a direct tool to detect DNA sequences that code drug response on the basis of haplotype analyses (reviewed in Schaid 2004). The HapMap constructed from SNPs (International HapMap Consortium 2003) makes the application of our model practically possible. In particular, we adopt clinically meaningful mathematical functions (Giraldo 2003; McClish and Roberts 2003) when modeling dose-dependent drug response. This has two advantages. First, it facilitates statistical analysis and strengthens power to detect significant genetic variants because fewer parameters are needed to be estimated, as opposed to traditional multivariate analysis. Second, the results it produces are close to the biological realm given that the mathematical functions used are founded on a firm understanding of pharmacology (reviewed in Derendorf and Meibohm 1999). The statistical power of our model is further increased by the attempt to structure the covariance matrix with structured antedependence models (Diggle* et al*. 2002).

We have performed different simulation studies to investigate the statistical behavior of our model. The results suggest that it can provide accurate and precise estimates of the response curves for both efficacy and toxicity even when the heritability is modest. Different drug response curves, as simulated for different composite genotypes (Figure 1), can offer scientific guidance for determining the optimal dosage that balances favorable efficacy and unfavorable toxicity on the basis of an individual's genetic background. It should be noted that our model was derived on the basis of a simple clinical design in which a single cohort is assumed, with equal concentration intervals for every patient. This model can be extended to a case-control study by allowing haplotype frequencies and haplotype effects to be different between the case and control groups. A series of hypothesis tests regarding between-group differences can be formulated to detect specific haplotypes that are responsible for drug response. Our model can also be extended to consider various measurement schedules, in particular with uneven concentration lags varying from patient to patient. For these irregular measurement schedules, formulation of individualized likelihood functions is needed when the mean vector and covariance matrix are modeled (see Núñez-Antón and Zimmerman 1994).

With continuing advances in molecular biological technology and the availability of a complete reference sequence of the entire human genome (Patil* et al*. 2001; Dawson* et al*. 2002; Gabriel* et al.* 2002), our model will assist in the discovery and characterization of the genes that influence drug response and lead to a better understanding of how genes function and the subsequent development of new approaches to the diagnosis and treatment of common diseases by medications. Once the network of genes that govern drug responses in humans is defined, it will then be possible to more accurately optimize drug therapy on the basis of each patient's ability to metabolize, transport, and respond to medications.

## Acknowledgments

This work is supported by an Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (7222061-12) to R.W. The publication of this manuscript was approved as journal series no. R-10576 by the Florida Agricultural Experiment Station.

## Footnotes

Communicating editor: M. K. Uyenoyama

- Received December 15, 2004.
- Accepted February 22, 2005.

- Genetics Society of America