## Abstract

Using multiple diallelic markers, variance component models are proposed for high-resolution combined linkage and association mapping of quantitative trait loci (QTL) based on nuclear families. The objective is to build a model that may fully use marker information for fine association mapping of QTL in the presence of prior linkage. The measures of linkage disequilibrium and the genetic effects are incorporated in the mean coefficients and are decomposed into orthogonal additive and dominance effects. The linkage information is modeled in variance-covariance matrices. Hence, the proposed methods model both association and linkage in a unified model. On the basis of marker information, a multipoint interval mapping method is provided to estimate the proportion of allele sharing identical by descent (IBD) and the probability of sharing two alleles IBD at a putative QTL for a sib-pair. To test the association between the trait locus and the markers, both likelihood-ratio tests and *F*-tests can be constructed on the basis of the proposed models. In addition, analytical formulas of noncentrality parameter approximations of the *F*-test statistics are provided. Type I error rates of the proposed test statistics are calculated to show their robustness. After comparing with the association between-family and association within-family (AbAw) approach by Abecasis and Fulker *et al*., it is found that the method proposed in this article is more powerful and advantageous based on simulation study and power calculation. By power and sample size comparison, it is shown that models that use more markers may have higher power than models that use fewer markers. The multiple-marker analysis can be more advantageous and has higher power in fine mapping QTL. As an application, the Genetic Analysis Workshop 12 German asthma data are analyzed using the proposed methods.

IN linkage disequilibrium (LD) mapping or association study, one may use one marker a time. However, the resolution of the single-marker analysis strategy can be low. In addition, utilizing different markers may lead to different results, since the power to detect allelic association depends on specific properties of the markers. This complicates the interpretation of an analysis. It is interesting and important to build models that use multiple markers simultaneously for high-resolution mapping of genetic traits. A unified analysis using multiple markers gives a unique result and may lead to greater resolution. Moreover, large numbers of single-nucleotide polymorphisms (SNPs) are available, and high-throughout genotyping approaches are emerging (International SNP Map Working Group 2001). This encouraging development facilitates high-resolution fine mapping of genetic traits. It is natural and necessary to develop high-resolution multiple-marker-based methods to dissect genetic traits.

In our previous work, variance component models using two markers are proposed for high-resolution linkage and association mapping of quantitative trait loci (QTL) based on population and pedigree data (Zhao* et al.* 2001; Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan* et al.* 2005). The genetic effects are orthogonally decomposed into summation of additive and dominance effects. In Abecasis* et al.* (2000a)(b, 2001), Cardon (2000), Fulker* et al.* (1999), and Sham* et al.* (2000), an association between-family and association within-family (AbAw) approach is proposed to decompose the genetic association into effects of between pairs and within pairs. The models of our previous work differ from the AbAw approach in the following senses: (1) The AbAw approach uses only one marker in analysis, but we use two diallelic markers, and (2) the way of modeling mean coefficients is different. Fan and Jung (2003) compare our method with the AbAw approach and find that our method is advantageous for sib-pair data. In addition, Fan* et al.* (2005) confirm that our approach is more powerful than the AbAw approach for large pedigrees. One may note that it is not clear how to extend the AbAw approach to use more than one marker in analysis (R. Fan and G. R. Abecasis, personal communication).

This article extends our previous work and investigates variance component models in high-resolution linkage and association mapping of QTL using multiple diallelic markers. The models jointly take linkage and linkage disequilibrium information into account. The linkage information is modeled in the variance-covariance matrix, and the linkage disequilibrium information is modeled in mean coefficients of trait values such as the AbAw approach. By modeling the linkage information in the variance-covariance matrix, we may take advantage of much research on variance component models (Haseman and Elston 1972; Amos* et al.* 1989; Goldgar and Oniki 1992; Amos 1994; Fulker* et al.* 1995; Almasy and Blangero 1998; George* et al.* 1999; Pratt* et al.* 2000). In the mean time, the linkage disequilibrium information is incorporated into the mean coefficients via indicator variables of marker genotypes, whose validity can be justified intuitively (Fan and Xiong 2002, pp. 608–609).

Using the models developed in this article, test statistics can be developed for high-resolution association mapping of QTL. The procedure is to perform appropriate linkage analysis on the basis of a sparse genetic map for prior linkage evidence. Then association study can be carried out on the basis of a dense genetic map for high-resolution mapping of QTL in the presence of prior linkage information. Likelihood-ratio tests (LRT) can be carried out in high-resolution association studies. For large-sample data, likelihood-ratio criteria are accurate. On the basis of general theory of linear models, *F*-test statistics can be built to test the association between trait locus and markers in the presence of prior linkage evidence (Graybill 1976). The analytical formulas for the noncentrality parameter approximations are derived for the *F*-test statistics. The merits of the proposed method are investigated by power and sample size comparison. Using the simulation program LDSIMUL kindly provided by G. R. Abecasis, simulation study is performed to explore the power and type I error rates of the proposed test statistics. The proposed methods are compared with the AbAw approach (Abecasis* et al.* 2000a). Moreover, the method is applied to analyze the Genetic Analysis Workshop (GAW) 12 German asthma data (Wjst* et al.* 1999; Meyers* et al.* 2001).

## MODEL

Assume that *k* diallelic markers *M _{j}*,

*j*= 1, · · · ,

*k*, are typed in a region of one chromosome. Suppose a quantitative trait locus

*Q*is located in the region, which has two alleles

*Q*

_{1}and

*Q*

_{2}with frequencies

*q*

_{1}and

*q*

_{2}, respectively. For marker

*M*, there are two alleles

_{j}*M*with frequency

_{j}*P*

_{Mj}and

*m*with frequency

_{j}*P*

_{mj}, respectively. For a nuclear family of

*l*children and two parents, let

**y**= (

*y*

_{f},

*y*

_{m},

*y*

_{1}, · · · ,

*y*)

_{l}^{τ}be their quantitative trait column vector and

*G*= (

_{j}*G*

_{f}

*,*

_{j}*G*

_{m}

*,*

_{j}*G*

_{1}

*, · · · ,*

_{j}*G*)

_{lj}^{τ}be their genotype column vector at the

*j*th marker locus

*M*. Here

_{j}*y*

_{f}is the trait value of the father, and

*G*

_{f}

*is the genotype of the father at the*

_{j}*j*th marker. Likewise, the other notations of the mother and the

*i*th child are defined accordingly with subscripts

*m*and

*i*, respectively. The superscript τ denotes the transpose of a vector or a matrix. Under the assumption of multivariate normality, we consider the mixed-effect model 1(Searle

*et al.*1992; Pinheiro and Bates 2000), where β is the overall mean of fixed effect,

*w*is a row vector of covariates such as sex and age, γ is a column vector of fixed-effect regression coefficients of

_{i}*w*,

_{i}*B*is the familial effect of random effects, and

_{i}*e*is the error term. Assume that

_{i}*e*is normal

_{i}*N*(0, σ

^{2}

_{e}), and

*B*is normal

_{i}*N*, where σ

^{2}

_{e}is error variance, σ

^{2}

_{s}is the variance of shared environment effect, and σ

^{2}

_{Ga}is the variance of additive polygenic effect. Moreover,

*B*and

_{i}*e*are independent. For

_{i}*j*= 1, · · · ,

*k*, α

*and δ*

_{j}*are fixed-effect regression coefficients of the dummy variables*

_{j}*x*and

_{ij}*z*, respectively. Here

_{ij}*x*and

_{ij}*z*are indicator variables and are defined as follows: 2Following the traditional quantitative genetics, the variance-covariance matrix of model (1) is a (

_{ij}*l*+ 2) × (

*l*+ 2) square matrix and is given by where . Here σ

^{2}

_{g}is variance explained by the putative QTL

*Q*. The genetic variance is decomposed into additive and dominance components. is the correlation between the parents. Let be the variance of familial effects that include shared environment variance σ

^{2}

_{s}and half of the additive polygenic variance. is correlation between parents and children; is the correlation between the

*i*th child and the

*j*th child, where π

*is the proportion of alleles shared identical by descent (IBD) at putative QTL*

_{ijQ}*Q*by the

*i*th child and the

*j*th child, and Δ

*is the probability that both alleles at the putative QTL*

_{ijQ}*Q*shared by the

*i*th child and the

*j*th child are IBD (Cotterman 1940; Pratt

*et al.*2000; Zhu and Elston 2000; Lange 2002). On the basis of the above discussion, the log-likelihood function of the mixed-effect model (1) is given by 3where η = (β, γ

^{τ}, α

_{1}, · · · , α

*, δ*

_{k}_{1}, · · · , δ

*)*

_{k}^{τ}is a vector of regression coefficients, and

*X*is the model matrix, accordingly.

One may wonder why we use model (1) to describe the phenotypes. Here we provide an intuitive rationale. Suppose that QTL *Q* coincides with one marker, *e.g.*, marker *M*_{1}, and trait allele *Q*_{1} coincides with marker allele *M*_{1} and allele *Q*_{2} coincides with allele *m*_{1}. Let μ* _{ij}* be the effect of genotype

*Q*,

_{i}Q_{j}*i*,

*j*= 1, 2. Denote genotypic value

*a*= μ

_{11}− (μ

_{11}+ μ

_{22})/2 and

*d*= μ

_{12}− (μ

_{11}+ μ

_{22})/2. The average effect of gene substitution is α

*=*

_{Q}*a*+ (

*q*

_{2}−

*q*

_{1})

*d*,

*i.e.*, the difference between the average effects of the trait locus alleles, and dominance deviation is δ

*= 2*

_{Q}*d*in view of traditional quantitative genetics (Falconer and Mackay 1996). Fan and Xiong (2002) show that

*y*can be expressed as

_{i}*y*= μ

_{i}_{0}+

*x*

_{i}_{1}α

*+*

_{Q}*z*

_{i}_{1}δ

*+*

_{Q}*B*+

_{i}*e*, where μ

_{i}_{0}is overall population mean of the quantitative trait. Hence, marker

*M*

_{1}may fully describe the trait values if it coincides with the QTL

*Q*. In practice, the information of QTL

*Q*is unknown. Instead, model (1) is proposed to describe trait value

*y*using marker information. Two marker models were used in previous work (Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan

_{i}*et al.*2005). Model (1) uses multiple markers and is a natural generalization of model of our previous work. The objective is to use marker information fully for fine high-resolution mapping of QTL. In the following, we show that model (1) and log-likelihood (3) can be used in joint linkage and association mapping of QTL.

## PROPERTY OF REGRESSION COEFFICIENTS AND ASSOCIATION TESTS

Denote the measure of LD between trait locus *Q* and marker *M _{i}* by ,

*i*= 1, · · · ,

*k*and the measure of LD between marker

*M*and marker

_{i}*M*by ,

_{j}*i*<

*j*,

*i*,

*j*= 1, · · · ,

*k*. Let the additive and dominance variance-covariance matrices of the indicator variables defined in (2) be (appendix A) 4In appendix A, the coefficients of model (1) are derived as 5Equations (5) show that the parameters of LD (

*i.e.*,

*D*

_{MiQ}and

*D*

_{MiMj}) and gene effect (

*i.e*., α

*and δ*

_{Q}*) are contained in the mean coefficients. Model (1) simultaneously takes care of the LD and the effects of the putative trait locus*

_{Q}*Q*. The gene substitution effect α

*is contained only in α*

_{Q}*; and the dominance effect δ*

_{i}*is contained only in δ*

_{Q}*,*

_{i}*i*= 1, · · · ,

*k*. Therefore, model (1) orthogonally decomposes genetic effect into summation of additive and dominance effects.

Assume that all markers *M _{i}* and

*M*are in linkage equilibrium (

_{j}*i.e.*, ,

*i*,

*j*= 1, · · · ,

*k*,

*i*≠

*j*). The coefficients of additive and dominance effects are given by , · · ·, , and , · · · , . That means markers

*M*

_{1}, · · · ,

*M*independently contribute to the analysis of the trait values. Usually, the markers

_{k}*M*can be in LD, especially when they are located in a narrow chromosome region. Equations (5) correctly use the LD information of markers

_{i}*M*in the analysis.

_{i}Linkage analysis can be performed by considering a reduced variance component model, 6by using the traditional method of variance component models (Amos* et al.* 1989; Amos 1994; Almasy and Blangero 1998). This initial study can identify prior linkage evidence of the trait values to a specific chromosome region on the basis of a sparse genetic map. Suppose that prior linkage evidence is provided by an initial linkage study. On the basis of a dense genetic map, high-resolution association mapping of the QTL can be carried out by fitting the full model (1). First, assume that linkage is confirmed in a chromosome region by the significant presence of both the gene substitution and dominance effects, *i.e*., α* _{Q}* ≠ 0 and δ

*≠ 0. On the basis of Equations 5, the existence of LD between markers*

_{Q}*M*(

_{i}*i*= 1, · · · ,

*k*) and trait locus

*Q*can be tested by H

_{ad}: α

_{1}= · · · = α

*= δ*

_{k}_{1}= · · · = δ

*= 0. Second, assume that linkage is supported by the significant presence of the gene substitution effect, but not the dominance effect,*

_{k}*i.e.*, α

*≠ 0 and δ*

_{Q}*= 0. The existence of LD can be tested by H*

_{Q}_{a}: α

_{1}= · · · = α

*= 0. Third, assume that linkage is supported by the significant presence of the dominance effect, but not the gene substitution effect,*

_{k}*i.e.*, α

*= 0 and δ*

_{Q}*≠ 0. The existence of LD can be tested by H*

_{Q}_{d}: δ

_{1}= · · · = δ

*= 0.*

_{k}Evidence of association can be evaluated by the LRT procedure. For instance, let *L*_{ad} be the log-likelihood under the alternative hypothesis of H_{ad} and *L*_{0} be the log-likelihood under the null hypothesis H_{ad}. Then, the likelihood-ratio test statistic 2[*L*_{ad} − *L*_{0}] is asymptotically distributed as χ^{2}. The degrees of freedom for this test are determined as follows. Under the null hypothesis H_{ad}, there are only *k* measures of LD, *D*_{M1Q}, · · · , *D*_{MkQ}, of which only *k* − 1 are independent since . Thus, the number of coefficients α* _{i}*, δ

*,*

_{i}*i*= 1, · · · ,

*k*, which is significantly different from 0, should be ≤

*k*− 1 in a data analysis. This number is the degrees of freedom of the likelihood-ratio test statistic 2[

*L*

_{ad}−

*L*

_{0}]. The likelihood-ratio test is accurate and robust for large sample data based on the statistical theory.

Theoretically, it is not easy to evaluate the power of the likelihood-ratio test statistics. The reason is that it is very hard to calculate the approximations of noncentrality parameters of the likelihood-ratio test statistics. Sham* et al.* (2000) performed power analysis of the AbAw approach by deriving the approximations of the noncentrality parameters of the likelihood-ratio test statistics, which is rather complicated in our opinion. In addition to the likelihood-ratio test statistics, we develop an *F*-test procedure based on linear model theory in this article (Graybill 1976). Utilizing the formulas of noncentrality parameters in chapter 6 of Graybill (1976), the approximations of the noncentrality parameters of the *F*-tests are calculated readily. Moreover, we show that the type I error rates and power of the *F*-test are very close to those of the likelihood-ratio test statistics (Tables 2 and 3), which are actually guaranteed by the construction of the *F*-test for large samples (Graybill 1976, pp. 187–188). Therefore, both the likelihood-ratio test procedure and the *F*-test procedure are useful. Before introducing the *F*-test procedure, we discuss the parameter estimations first.

## PARAMETER ESTIMATIONS

### IBD estimations:

Denote the recombination fraction between the trait locus *Q* and marker *M _{i}* by θ

_{MiQ},

*i*= 1, · · · ,

*k*. Likewise, the recombination fraction between markers

*M*and

_{i}*M*is defined by θ

_{j}_{MiMj}. Following Fulker

*et al.*(1995) and Almasy and Blangero (1998), we propose a multipoint interval mapping method to estimate the proportion π

*of allele sharing IBD at a putative QTL*

_{ijQ}*Q*for a sib-pair

*i*and

*j*by 7where π

_{ijMl}is the proportion of alleles shared IBD at the marker

*M*for

_{l}*l*= 1, · · · ,

*k*. The coefficients α

_{π}, β

_{πM1}, · · · , β

_{πMk}are derived in appendix B as follows: And α

_{π}is estimated as . If marker

*M*coincides with QTL

_{l}*Q*, it can be shown that and α

_{π}= 0, . Hence . To estimate Δ

*of the probability of sharing two alleles IBD for a sib-pair, consider 8where Δ*

_{ijQ}_{ijMl}is the probability of sharing two alleles IBD at marker

*M*for

_{l}*l*= 1, · · · ,

*k*. The coefficients

^{τ}are derived in appendix C as follows: The remaining coefficients are given in appendix C by The α in Equation 8 is . Again, if marker

*M*coincides with QTL

_{l}*Q*, it can be shown that .

### Estimations of model coefficients and variance-covariance matrix:

As an example, assume that the data are composed of three subsamples: *n* individuals of a population; *m* trio families, each having both parents and a single child; and *s* nuclear families, each having both parents and two offspring. Furthermore, we assume that *n*, *m*, and *s* are sufficiently large, so that large sample theory applies. We may include data of nuclear families with both parents and more than two offspring. The principle of the following paragraphs can be extended to such families if the number of families is large enough to apply the large sample theory. To estimate the parameters, one may take the method of interval mapping proposed by Fulker* et al.* (1995) and Almasy and Blangero (1998). That is to say, for each location of the QTL on the chromosome with fixed recombination fractions, the IBD estimations are performed first. Then one may estimate parameters of Σ and η as follows.

Consider the overall log-likelihood , *I* = *n* + *m* + *s*, where *L _{i}* is the log-likelihood of trait vector or value

**y**

*of the*

_{i}*i*th family or individual. Let Σ

*be the variance-covariance matrix of trait vector or value*

_{i}**y**

*and*

_{i}*X*be its model matrix. Denote the total trait values by

_{i}**y**= (

**y**

^{τ}

_{1}, · · · ,

**y**

^{τ}

_{I})

^{τ}, the total variance-covariance matrix by Σ = diag(Σ

_{1}, · · · , Σ

*), and the model matrix by*

_{I}*X*= (

*X*

^{τ}

_{1}, · · · ,

*X*

^{τ}

_{I})

^{τ}. Let

*N*=

*n*+ 3

*m*+ 4

*s*be the total number of individuals. On the basis of the log-likelihood , parameters of Σ and η can be estimated by Newton-Raphson or Fisher scoring algorithms (Jennrich and Schluchter 1986). Let Σ̂ = diag(Σ̂

_{1}, · · · , Σ̂

*) be the maximum-likelihood estimates of Σ. Then the estimate of η is For each location of the QTL on the chromosome, the likelihood-ratio test or*

_{I}*F*-test statistics can then be calculated using the estimates Σ̂ and η̂. The location that gives the best result can be treated as the location of the QTL. In practice, some of the parameters (

*e.g.*, the variance parameter σ

^{2}

_{gd}) may not be estimable and identifiable due to the redundancy. For specific types of data, one needs to specify the model carefully.

*F*-TESTS AND NONCENTRALITY PARAMETER APPROXIMATIONS

On the basis of linear regression model theory, one may construct *F*-test statistics of genetic effects and LD coefficients (Graybill 1976). Moreover, the noncentrality parameters of the *F*-test statistics can be calculated readily. To evaluate the power of the *F*-test statistics, it is necessary to calculate the approximations of the noncentrality parameters. The procedure is as follows. First, one may construct an *F*-test statistic for each of three hypotheses:

H

_{ad}: α_{1}= · · · = α= δ_{k}_{1}= · · · = δ= 0;_{k}H

_{a}: α_{1}= · · · = α= 0;_{k}H

_{d}: δ_{1}= · · · = δ= 0._{k}

The noncentrality parameter of each *F*-test statistic can be calculated using the theory in Graybill (1976)(Chap. 6). Assume that there are no covariates. Then the coefficients of model (1) can be written as η = (β, α_{1}, · · · , α* _{k}*, δ

_{1}, · · · , δ

*)*

_{k}^{τ}. For each hypothesis, there is a

*q*× (2

*k*+ 1) matrix

*H*, such that the hypothesis can be written as

*H*η = 0, where

*q*is the rank of

*H*. On the basis of Graybill (1976), the

*F*-test statistic for hypothesis

*H*η = 0 is with a noncentral

*F*(

*q*,

*N*− (2

*k*+ 1), λ) distribution under the alternative hypothesis, where λ is the noncentrality parameter given by λ = (

*H*η)

^{τ}[

*H*(

*X*

^{τ}Σ

^{−1}

*X*)

^{−1}

*H*

^{τ}]

^{−1}(

*H*η).

### Combined analysis of population and family data:

Again, assume that the data are composed of three subsamples: *n* individuals of a population; *m* trio families, each having both parents and a single child; and *s* nuclear families, each having both parents and two offspring. To calculate the approximations of the noncentrality parameters, assume that the sample sizes *n*, *m*, and *s* are large enough that the large-sample theory applies. We show in appendix d the approximation 9where *a*_{1}, *a*_{2}, and *a*_{3} are constants given by Equations (D7) in appendix D.

The additive variance and the dominance variance are expressed in terms of the average effect of gene substitution α* _{Q}* and the dominance deviation δ

*. Let*

_{Q}*I*and

_{k}*I*

_{2}

*be*

_{k}*k*and 2

*k*dimension identity matrices. Moreover, let

*O*

_{k}_{×}

*be a*

_{l}*k*×

*l*zero matrix. To test hypothesis H

_{a}: α

_{1}= · · · = α

*= 0, the test matrix*

_{k}*H*= (

*O*

_{k}_{×1},

*I*,

_{k}*O*

_{k}_{×}

*). Let us denote the test statistic as*

_{k}*F*

_{k}_{,a}. The noncentrality parameter is approximated by To test hypothesis H

_{d}: δ

_{1}= · · · = δ

*= 0, the test matrix*

_{k}*H*= (

*O*

_{k}_{×1},

*O*

_{k}_{×}

*,*

_{k}*I*). Let us denote the test statistic as

_{k}*F*

_{k}_{,d}. The noncentrality parameter is approximated by To test hypothesis H

_{ad}: α

_{1}= · · · = α

*= δ*

_{k}_{1}= · · · = δ

*= 0, the test matrix*

_{k}*H*= (

*O*

_{2}

_{k}_{×1},

*I*

_{2}

*). Let us denote the test statistic as*

_{k}*F*

_{k}_{,ad}. The noncentrality parameter is λ

_{k}_{,ad}≈ λ

_{a}+ λ

_{d};

*i.e.*, λ

_{k}_{,ad}is decomposed into the summation of additive and dominant noncentrality parameters.

### Nuclear family data:

To make a comparison with the results of Abecasis* et al.* (2000a)(Table 4), we consider *I* families, each having both parents and *l* offspring. Let *N* = *I*(*l* + 2) be the total number of individuals. The other notations are defined in a similar way as above. Suppose that variance-covariance matrices of the *I* families are the same, *i.e.*, Σ_{1} = · · · = Σ* _{I}*. Denote . If the sample size

*N*is large enough, we show in appendix E that 10where

*b*

_{1}and

*b*

_{2}are constants given by Equations (E1) in appendix E. The approximation of the noncentrality parameter of statistic

*F*

_{k}_{,a}is

## TYPE I ERROR RATES

To evaluate the type I error rates of the proposed method, simulation program LDSIMUL kindly provided by G. R. Abecasis is used to generate data sets. Nuclear families are generated in simulation. Five test cases are considered in type I error rate calculation, which are taken from Abecasis* et al.* (2000a)(Table 2). Table 1 presents parameters of four test cases. Trait values are constructed by a normal distribution with mean 0 and total variance σ^{2} = 100 except for test case of *Admixture*. Here is the summation of the additive major gene effect σ^{2}_{ga}, the variance of polygenic effect σ^{2}_{Ga}, and the error variance σ^{2}_{e}. In each model except the *Admixture*, a diallelic marker *M*_{1} is simulated with allele frequency . In the test cases of *Null*, *Familiality*, and *Admixture*, no major gene effect is assumed, *i.e.*, . In the test cases of *Linkage* and *Composite*, major gene effect is assumed, and marker *M*_{1} coincides with the QTL *Q*, *i.e*., recombination fraction ; in the meantime, linkage equilibrium is assumed between QTL *Q* and the marker *M*_{1}, *i.e.*, . In the test case of *Admixture*, population admixture is generated by mixing families equally drawn from one of two subpopulations A and B. In both subpopulations A and B, no major gene effect or familial effect is assumed, *i.e.*, . However, the trait mean of subpopulation A is fixed at 10 and the variance is fixed at 100, and the marker allele frequency *P*_{M1} is taken as 0.7 in subpopulation A. The trait mean of subpopulation B is fixed at 0 and the variance is fixed at 100, and the marker allele frequency *P*_{M1} is taken as 0.3 in subpopulation B. Therefore, the total variance in the mixing population is σ^{2} = 125. The admixture contributed to (10 − 0)^{2}/[4 × 125] = 0.20 of the total variance.

To calculate the type I error rates, 1000 data sets are simulated for each test case. Each data set contains a certain number of related pedigrees. For instance, 120 trio families are generated for test case *Null* if the total number of offspring is 120 and the number of offspring in each family is 1; but only 15 families are generated if the number of offspring in each family is 8 and the total number of offspring is 120. Using the data sets, we fit the model where *B _{i}* is normal

*N*(0, σ

^{2}

_{Ga}),

*y*is normal

_{i}*N*(β +

*x*

_{i}_{1}α

_{1}, σ

^{2}), and . The null hypothesis is H

_{1,a}: α

_{1}= 0. Since the QTL

*Q*is in linkage equilibrium with marker

*M*

_{1}, an empirical test statistic that is larger than the cutting point at a 0.05 significance level is treated as a false positive. On the basis of either the likelihood-ratio test or the

*F*-test, type I error rates are calculated as the proportions of the 1000 simulation data sets that give a significant result at the 0.05 significance level based on

*F*

_{1,a}and the likelihood-ratio test statistic, respectively. Table 2 presents type I error rates of likelihood-ratio tests and

*F*-test statistics. The results show that the type I error rates are around the 0.05 nominal significance level in almost all cases. Hence, the proposed model is robust. In addition, the type I error rates of

*F*-tests are similar to those of the likelihood-ratio tests. In an association study, false positives due to population stratifications are usually a big issue. From the results of Table 2, the type I error rates in the

*Admixture*case are reasonable.

Table 2, bottom, shows a notable variability in the range of type I errors when the number of offspring is 8 and the sample sizes are small. For example, the type I error rates of the *F*-test *F̂*_{1,a} are 6.7% for test case of *Composite* when the total number of offspring is 120. This is most likely due to the small sample size and multivariate normality. When the total number of offspring is 120, there are only 15 pedigrees, each consisting of two parents and 8 offspring; and the variance-covariance matrix Σ is a big 10 × 10 square matrix. Hence, the parameter estimations are hardly accurate, which makes the deviation from the nominal level greater. When the sample size increases (*i.e*., the total number of offspring is 240 or 480), the type I error rates are close to the nominal level of 0.05. The results of Table 2 are based on 1000 simulated data sets, which may not be always reliable. To further investigate the issue, we perform a calculation in the next section based on 20,000 simulated data sets for another *Composite* test case in Table 3. The results of Table 3 confirm that the type I error rates are close to the nominal level for large-sample data.

## POWER CALCULATION AND COMPARISON

### Comparison with the AbAw approach:

Denote the heritability by *h*^{2}, which is defined as (Falconer and Mackay 1996). To compare the method proposed in this article with the AbAw approach of Abecasis* et al.* (2000a), we present a power comparison in Table 3. The parameters are the same as those of Abecasis* et al.* (2000a)(Table 4): , *h*^{2} = 0.1, σ^{2} = 100, , , . In addition, and . In the AbAw columns in Table 3, the results are taken from Abecasis* et al.* (2000a)(Table 4). In the (*F*_{1,a}, *F̂*_{1,a}, LRT)^{τ} columns, the power (%) of *F*_{1,a} is calculated on the basis of approximation of noncentrality parameter λ_{1,a} of test statistic *F*_{1,a} at a 0.001 significance level; the power (%) of *F̂*_{1,a} and the LRT are calculated as the proportions of 1000 or 20,000 simulation data sets that give a significant result at the 0.001 significance level based on *F*_{1,a} and the likelihood-ratio test statistic, respectively. For each simulated data set, a certain number nuclear families are simulated via LDSIMUL. For instance, for one sib per family, 480 trio families are simulated in each simulated data set.

The results of Table 3 clearly show that the proposed *F*-tests *F*_{1,a} and likelihood-ratio tests are much more powerful than the AbAw approach. When 25%, it is possible to achieve considerable power. When , the statistic *F*_{1,a} is powerful for a sample with a total number of 480 sibs. In addition, the results of Table 3 show that the empirical power of *F̂*_{1,a} is similar to that of the likelihood-ratio test. This implies that in a large sample the two tests provide similar power (Graybill 1976). The AbAw approach presented in Abecasis* et al.* (2000a) utilized only the trait values of sibships in the model and discarded the trait values of parents. This is, obviously, not an efficient way. The proposed methods, on the other hand, incorporate both parental and sibship phenotypes into the models. This considerably increases the power as shown in Table 3.

In Table 3, the first row of results corresponds to the case when *D*′ is zero, *i.e.*, a situation when the null hypothesis of no association is true. Hence, the power results for all these tests are simply the type I error rates. It can be seen that the type I error rates are close to the nominal level 0.001 = 0.1% when the number of simulated data sets is 20,000. This is consistent with the conclusion of Table 2; *i.e.*, the proposed model is robust. To make a comparison with the results of Abecasis* et al.* (2000a)(Table 4), the results of *F̂*_{1,a} and the LRT of 1000 simulated data sets are also presented. In most cases, the entries are equal to the nominal level 0.001 = 0.1%; *i.e.*, one of the 1000 data sets leads to a significant result, but some entries are 0 since none of the 1000 data sets leads to a significant result.

In Table 3, there is a trend that the power of (*F*_{1,a}, *F̂*_{1,a}, LRT)^{τ} to detect association decreases with the increasing sibship sizes. This is partly because the sample size *N* decreases although the total number of offspring is the same, 480: For 480 trio families of one sib per family, the total number of individuals is *N* = 1440; for 60 families of eight sibs per family, the total number of individuals is *N* = 600. For the AbAw approach presented in Abecasis* et al.* (2000a), the total number of offspring that are used in the model is the same, 480. Since our models use phenotypes of both parents and offspring, the sample sizes *N* are different. On the other hand, for the same total number of typed individuals *N*, families of large sibship sizes contain less LD information than families of small sibship sizes. The readers may note that this result is consistent with findings in Fan and Xiong (2003). In Fan and Xiong (2003)(p. 131, Figure 3), the population-based method is shown to be more powerful than the family-based method for the same number of individuals.

### Comparisons of sample size and power of LD mapping:

Power and sample size calculations are performed to investigate the merits of the proposed method. Figure 1 shows the power curves of the test statistics *F*_{4,a}, *F*_{3,a}, *F*_{2,a}, *F*_{4,d}, *F*_{3,d}, and *F*_{2,d} against the linkage disequilibrium coefficient *D*_{M1Q} at a 0.01 significance level for a dominant mode of inheritance (*a* = *d* = 1.0) and a recessive mode of inheritance (*a* = 1.0, *d* = −0.5). The related parameters are given in the Figure 1 legend. Generally, the power of *F*_{4,a} using four markers in the model is higher than that of *F*_{3,a} using three markers, which in turn is higher than that of *F*_{2,a} using two markers. Hence, multiple-marker analysis is advantageous. The power of *F _{k}*

_{,d}is usually minimal unless the LD between locus

*Q*and marker

*M*

_{1}is very strong for the dominant mode of inheritance. Note the power curves of Figure 1 are not symmetric with respect to

*D*

_{M1Q}. This is due to ,

*i*= 2, 3, 4, ,

*i*≠

*j*, and so the power curves do not have to reach a minimum value when

*D*

_{M1Q}is zero. Instead, they are shifted to the right, so that the minimum is at a point when

*D*

_{M1Q}> 0. Figure 2 provides the power of the test statistics

*F*

_{4,a},

*F*

_{3,a},

*F*

_{2,a},

*F*

_{4,d},

*F*

_{3,d}, and

*F*

_{2,d}against heritability

*h*

^{2}at a 0.01 significance level for a dominant mode of inheritance (

*a*=

*d*= 1.0) and a recessive mode of inheritance (

*a*= 1.0,

*d*= −0.5), respectively. In addition to the merits shown in Figure 1, the power of the test statistics

*F*

_{4,a},

*F*

_{3,a},

*F*

_{2,a}is high when heritability

*h*

^{2}is >0.10 for both modes of inheritance.

Figure 3 shows the power of test statistics *F*_{4,a}, *F*_{3,a}, *F*_{2,a}, and *F*_{1,a} against the trait allele frequency *q*_{1} (Figure 3A) or marker allele frequency *P*_{M1} (Figure 3B) at a 0.01 significance level for an additive mode of inheritance *a* = 1.0, *d* = 0.0, respectively. The other parameters are given in the Figure 3 legend. From Figure 3A, it can be seen that the power of *F _{k}*

_{,a}increases as the trait allele frequency

*q*

_{1}increases. Figure 3B shows that the power of

*F*

_{4,a}and

*F*

_{3,a}is almost constant; in addition, the power of

*F*

_{2,a}increases slowly, and the power of

*F*

_{1,a}increases as the marker allele frequency

*P*

_{M1}increases. In general, the power of

*F*

_{4,a}and

*F*

_{3,a}depends heavily on the trait allele frequency

*q*

_{1}, but not on the marker allele frequency

*P*

_{M1}. At first glance, it is strange that the power of

*F*

_{4,a}and

*F*

_{3,a}does not depend very much on the marker allele frequency

*P*

_{M1}. The mystery is that the LD measures ,

*i*= 2, 3, 4 are already high. That is why the contribution of marker

*M*

_{1}matters not very much to the power of

*F*

_{2,a},

*F*

_{3,a}, and

*F*

_{4,a}. This adds one more piece of information to the advantage of multiple-marker analysis. That is, as long as some markers are in strong linkage disequilibrium with the trait locus, the power to detect the association is high.

Assume that the LD is due to historical mutations *T* generations ago at QTL *Q*. At the initial generation when the mutation occurred, the LD coefficient is , where *P*(*M _{i}Q*)(0) is the frequency of haplotype

*M*. The LD coefficient is reduced by a factor 1 − θ

_{i}Q_{MiQ}in each subsequent generation. The LD between marker

*M*and

_{i}*Q*is at the current generation. Assume that the marker

*M*

_{1}locates at position 0 cM, marker

*M*

_{2}locates at position 1 cM, marker

*M*

_{3}locates at position 2 cM, and marker

*M*

_{4}locates at position 3 cM. Under the assumption of no interference, we may calculate the recombination fraction by Haldane's map function, where Ω

_{MiMj}is the map distance between marker

*M*and marker

_{i}*M*. Similarly, the recombination fraction θ

_{j}_{MiQ}can be calculated by the distance Ω

_{MiQ}between QTL

*Q*and marker

*M*,

_{i}*i*= 1, · · · , 4. Suppose that the QTL

*Q*is located along the horizontal axis;

*i.e.*, it moves from 0 to 3 cM. Figure 4 shows the power curves of the test statistics

*F*

_{4,a},

*F*

_{4,ad},

*F*

_{3,a},

*F*

_{3,ad},

*F*

_{2,a}, and

*F*

_{2,ad}against the location of QTL

*Q*for a dominant mode of inheritance (

*a*=

*d*= 1) and a recessive mode of inheritance (

*a*= 1.0,

*d*= −0.5), respectively. The powers of

*F*

_{4,a}and

*F*

_{4,ad}with four markers in the model are generally high across the location of QTL

*Q*, since at least one marker is close to the QTL

*Q*. The power of

*F*

_{3,a}and

*F*

_{3,ad}using three markers in the model is similar to that of four markers, except that QTL

*Q*locates far above marker

*M*

_{3},

*i.e.*, λ

_{M1Q}≥ 2.3cM. The power of

*F*

_{2,a}and

*F*

_{2,ad}using two markers in the model is high when the QTL is close to markers

*M*

_{1}and

*M*

_{2}. However, once the QTL is far above marker

*M*

_{2}(

*i.e.*, λ

_{M1Q}≥ 1.3cM ), the power of

*F*

_{2,a}and

*F*

_{2,ad}using two markers in the model decreases very quickly. Figure 4 implies that multiple-marker LD analysis has high power in fine mapping of QTL. Moreover, the power of test statistic

*F*

_{k}_{,a}, which tests only the additive effect, is higher than that of

*F*

_{k}_{,ad}, which tests both the additive and dominance effects through the proposed model. The reason is that the degrees of freedom of test statistics increases if the dominance effect is added to the test statistics. Figure 5 shows the power curves of test statistic

*F*

_{4,ad}against the position of markers

*M*

_{1}, · · · ,

*M*

_{4}for different mutation age at a 0.01 significance level. The trait locus

*Q*locates at position 10 cM. The four markers flank the trait locus

*Q*; two markers are on each side of the QTL with equal distance to each other as follows:

*M*

_{2}= 5 +

*M*

_{1}/2,

*M*

_{3}= 15 −

*M*

_{1}/2,

*M*

_{4}= 20 −

*M*

_{1}. Here

*M*also denotes the location in centimorgans of marker

_{i}*M*. As the mutation ages, the power decreases and the power can be high only when the markers are close to the trait locus.

_{i}Figure 6 shows the required number of trio families or families with both parents and two offspring for the test statistics *F*_{4,a}, *F*_{3,a}, *F*_{2,a}, and *F*_{1,a} against heritability *h*^{2} at a significance level 0.01 and power 0.8. For a favorable case (Figure 6, A and C), the parameters are given by , and for *i*, *j* = 1, · · · , 4, *i* ≠ *j*. For a less favorable case (Figure 6, B and D), the parameters are given by *q*_{1} = 0.2, , , and for *i*, *j* = 1, · · · , 4, *i* ≠ *j*. For the favorable case, the required number of families of test statistics *F*_{4,a} and *F*_{3,a} is <200 and that of *F*_{2,a} is <600 if heritability *h*^{2} is >0.1. For the less favorable case, the required number of families of test statistics *F*_{4,a} and *F*_{3,a} is <500 and that of *F*_{2,a} is <700 if heritability *h*^{2} is >0.1. The required number of families of test statistics *F*_{1,a} is very large for both favorable and less favorable cases.

## AN EXAMPLE

The proposed method is applied to the Genetic Analysis Workshop 12 German asthma data (Meyers* et al.* 2001). The data consist of 97 nuclear families, including 415 persons. Seventy-four families have two children, 19 have three children, and 4 have four children. Wjst* et al.* (1999) perform linkage analysis for total serum IgE by a nonparametric statistic of MAPMAKER/SIBS 2.1. Three markers on chromosome 1 are shown to be linked with immunoglobulin E (IGE) level, *i.e.*, marker D1S207 at position 118.1 cM, marker D1S221 at position 146.7 cM, and marker D1S502 at position 151.2 cM. In Fan and Jung (2003), we analyze the data using sibships and confirm the result of Wjst* et al.* (1999). By the method proposed in this article, we analyze the data again. The dominance variance of log(IGE) is significantly >0 at position 149.85 cM (*P*-value, 0.00075; compared with the *P*-value of 0.01 in Fan and Jung 2003). On this basis, we collapse alleles 6, 8, and 10 as allele *M*_{1} at marker D1S207 and others as allele *m*_{1}. At marker D1S221, alleles 5, 6, and 7 are collapsed as allele *M*_{2} and other alleles as allele *m*_{2}. At marker D1S502, we collapse alleles 7, 8, and 12 as allele *M*_{3} and others as allele *m*_{3}. Then, we find that coefficient δ_{2} is significantly different from 0 at position 149.85 cM, with a *P*-value of 0.034 by likelihood-ratio test (compared with the *P*-value of 0.0475 in Fan and Jung 2003) and a *P*-value 0.034 by *F*-test (compared with the *P*-value 0.0484 in Fan and Jung 2003). The estimation is δ̂_{2} = 0.76. Hence, we are able to confirm the result of Wjst* et al.* (1999) and find that marker D1S221 is associated with log(IGE).

Compared with the results of Fan and Jung (2003), the evidence in the above paragraph is stronger since the *P*-values are smaller. There are two reasons for this. In this article, all family members are used in the analysis (compared with only sibships used in Fan and Jung 2003). This article used three markers in the analysis (compared with only two markers used in Fan and Jung 2003). Hence, the proposed model improves the performance of the previous method.

## DISCUSSION

On the basis of multiple diallelic markers, this article proposes variance component models for high-resolution joint linkage and association mapping of QTL. The models extend our previous work using two diallelic markers in analysis and incorporate genetic-marker information into the models (Fan and Xiong 2002, 2003; Fan and Jung 2003; Fan* et al.* 2005). By analytical analysis, it is shown that linkage disequilibrium measures and genetic effects are incorporated in the mean coefficients. On the basis of marker information, a multipoint interval mapping method is provided to estimate the proportion of allele-sharing IBD and probability of sharing two alleles IBD at a putative QTL for a sib-pair. It is shown that recombination fractions, *i.e*., linkage information, are contained in variance-covariance matrices. Therefore, the proposed methods model both association and linkage in a unified model.

In the literature, there is plenty of research for linkage mapping of QTL (Amos 1994; Fulker* et al.* 1995; Almasy and Blangero 1998). The linkage evidence can be detected by fitting model (6) as the first step on the basis of a sparse genetic map. In this article, we put more effort into high-resolution linkage disequilibrium mapping of QTL in the presence of prior linkage evidence. To test the association between the trait locus and the markers, both likelihood-ratio tests and *F*-tests can be constructed on the basis of the proposed models. In addition, analytical formulas of noncentrality parameter approximations of the *F*-test statistics are provided. After comparing it with the AbAw approach, it is found that the method proposed in this article is more powerful and advantageous on the basis of simulation study and power calculation. By power and sample size comparison, it is shown that models that use more markers may have higher power than models that use less markers. Multiple-marker analysis can be more advantageous and has higher power in fine mapping QTL.

In an association study, population stratification can have a huge impact on a study, which leads to high false positives (Ewens and Spielman 1995). Zhao and Xiong (2002) proposed unbiased quantitative population association tests to investigate the issue. In this article, we perform type I error calculations. We allow for the very extreme form of population admixture, in which each family is drawn from a different stratum (Abecasis* et al*. 2000a). Type I error rates of the proposed test statistics are calculated to investigate the behaviors of the test statistics under the null distribution. Five test cases including population admixture are considered to investigate the type I error rates. The results show the proposed models and methods have correct type I error rates for most cases and are robust.

In a QTL mapping study, a strategy may be taken as follows. First, linkage analysis can be carried out using a sparse genetic map. Then, an association study can be performed using a dense genetic map for high-resolution mapping of the trait. The basic idea is to take advantage of linkage analysis for prior linkage information. In the meantime, one can take advantage of the high-resolution association study for fine mapping a genetic trait. It is well known that linkage analysis is robust; *i.e.*, the false-positive rates are not high. However, the resolution of linkage analysis can be low. On the other hand, the resolution of the association study is high. But the association study is prone to false positives caused by population stratifications. Using the method proposed in this article, it is more likely to avoid high false-positive rates by performing an association study in the presence of prior linkage. The low resolution of a prior linkage analysis can be remedied by the follow-up high-resolution association study.

In recent years, there has been great interest in linkage disequilibrium mapping of QTL (Allison 1997; Rabinowitz 1997; Zhang and Zhao 2001). Various methods of joint analysis of linkage and association are proposed by researchers (Almasy* et al.* 1999; George* et al.* 1999; Martin* et al.* 2000). On the basis of variance component models, a combined linkage and association AbAw approach has been developed to decompose association effects into within- and between-family components (Fulker* et al.* 1999; Abecasis* et al.* 2000a,b, 2001; Cardon 2000; Sham* et al.* 2000). However, most research is limited to using one diallelic marker a time to model the association of QTL. This article proposes use of multiple markers to model the association and linkage. The genetic effects are orthogonally decomposed into additive and dominance effects. The method has the advantage of high-resolution dissection of genetic traits in an era in which dense marker maps are available (International SNP Map Working Group 2001; Kong* et al.* 2002). It is hoped that the current research may stimulate more interest in building models for joint linkage disequilibrium and linkage mapping of QTL.

In a genetics study, the first-hand data are usually genotyping information. The methods developed in this article can be directly used in analyzing quantitative and genotyping data of nuclear families by combining linkage and association information together. In the meantime, one may argue the use of haplotype data in an analysis that can be constructed on the basis of genotyping data. The question is an important issue as the haplotype map project will soon be completed and haplotype data will be readily available (International HapMap Consortium 2003; HapMap project, http://www.hapmap.org). The proposed method deals with diallelic markers. When the markers are not diallelic as is the case in the analyzed data, we collapse alleles into two groups to form two allele types. The hidden question is whether this collapsing has any consequence in type I error because the collapsing is not unique, which leads to the selection issue. It is important to develop appropriate models and handy algorithms in linkage and association mapping of complex diseases using haplotype/multiallelic marker data. It would be interesting to see a comparison of the two approaches. In Jung* et al.* (2004), a population-based regression approach is explored for association mapping of QTL using haplotype data. It is important to extend the research to utilize both population and pedigree data based on multiallelic markers/haplotypes.

One potential problem of using multiple markers in analysis is that the degrees of freedom of test statistics can be large, which may lead to low power. Moreover, the number of LD measures can be large. Thus, selection of appropriate markers for analysis is one issue that needs careful consideration. The optimal number of markers needed depends on a specific trait in a study. Also, it depends on the LD measures among the QTL and the markers. In data analysis, the markers that show significance in the model can be included in the final analysis. On the one hand, it would not be a good strategy to use many diallelic markers in the model. More markers will lead to higher degrees of freedom in test statistics. The number of markers that show significance is unlikely to be too large. Usually, using three or four relevant markers in an analysis would be worthwhile, since it may have higher power than a two- or one-marker analysis. In the meantime, the degrees of freedom of test statistics and number of LD measures would not be too big using three or four markers in an analysis. The second problem is the existence of a dominance trait effect. If the dominance effect is present, one may lose power by excluding it from analysis (Fan and Xiong 2002). However, one may get low power by testing hypothesis α_{1} = · · · = α* _{k}* = δ

_{1}= · · · = δ

*= 0, if the dominance effect is not significantly present to influence the trait values, due to the increase of degrees of freedom of test statistics.*

_{k}So far, only one trait locus *Q* is assumed to be located in the chromosome region. Suppose that there are multiple QTL in the region. The mixed-effect model (1) can still be used in QTL mapping. In addition, suppose that the trait value is influenced by unlinked trait loci in different regions. Then model (3) needs to be generalized to use markers from different regions in analysis (Hoh and Ott 2003). If multiple-trait loci are present, other issues such as epistasis need more in-depth investigation. For IBD estimation, we follow the method proposed by Fulker* et al.* (1995) and Almasy and Blangero (1998). If there is LD between the trait and markers, LD among markers would also be expected and needs to be incorporated in estimating IBD. However, it is not clear how to achieve this. This is a very interesting and important research area for future study. Better IBD estimates would lead to a fitted variance-covariance structure that is a better approximation of the true variance-covariance structure. This would improve the performance of the proposed models.

## APPENDIX A

Taking variance-covariance among *x _{ij}*,

*z*,

_{ij}*y*of the mixed-effect model (1) leads to the following variance-covariance equations: A1In a similar way to that in Fan and Xiong (2002)(Appendix A), the following expectations, variance, and covariances can be derived accordingly:

_{i}*Ex*= 0,

_{ij}*Ez*= 0, for

_{ij}*j*,

*l*= 1, · · · ,

*k*,

*j*≠

*l*. Plugging the above quantities into (A1) gives Therefore, the coefficients of (5) are derived.

## APPENDIX B

To simplify notations, we omit subscripts *ij* from π* _{ijQ}*, π

_{ijM1}, · · · , π

_{ijMk}, Δ

_{ijM1}, · · · , Δ

_{ijMk}in appendixes b and c. Taking variance-covariance among π

*, π*

_{Q}_{Mj},

*y*of Equation 7 leads to B1

_{i}From Elston and Keats (1985) and Almasy and Blangero (1998), we have Plugging the above quantities into Equation B1 gives which leads to

## APPENDIX C

Taking variance-covariance among Δ_{Q}, π_{Mj}, Δ_{Ml} of Equation 8 leads to C1As in appendix B, the following covariances are from Elston and Keats (1985), Almasy and Blangero (1998), and Fan and Jung (2003), where . Plugging the above results into the equation (C1), we have a submatrix block equation, where

Therefore, we have from Harville (1997) that The equation leads to Moreover, we have

## APPENDIX D

To derive *a*_{1}, *a*_{2}, *a*_{3} in approximation (9), we assume three subsamples of a population: *n* individuals; *m* trio families, each having both parents and a single child; and *s* nuclear families, each having both parents and two offspring.

For each

*y*of the_{i}*n*individuals, Σ= σ_{i}^{2}and*X*= (1,_{i}*x*_{i}_{1}, · · · ,*x*,_{ik}*z*_{i}_{1}, · · · ,*z*),_{ik}*i*= 1, · · · ,*n*. When the sample size*n*of individuals is large, the large number law leads to Therefore, we have the approximation D1where*V*_{A}and*V*_{D}are additive and dominance variance-covariance matrices defined by (4).For the

*i*th trio family, let (*y*,_{fi}*y*,_{mi}*y*_{i}_{1})^{τ}be the trait values and*X*= (_{i}*X*,_{fi}*X*,_{mi}*X*_{i}_{1})^{τ}be the related model matrix,*i*=*n*+ 1, · · · ,*n*+*m*. In the same way as that of Fan and Xiong (2003)(Appendix A), the covariance matrix between parents and their offspring can be shown to be D2

where *O _{k}* is a zero

*k*×

*k*matrix. For each of the

*m*trio families, the variance-covariance matrix The inverse matrix of Σ

*is By the above formulas, we can show the following: D3*

_{i}

For the

*i*th family that is composed of both parents and two offspring, let (*y*,_{fi}*y*,_{mi}*y*_{i}_{1},*y*_{i}_{2})^{τ}be the trait values and*X*= (_{i}*X*,_{fi}*X*,_{mi}*X*_{i}_{1},*X*_{i}_{2})^{τ}be the related model matrix,*i*=*n*+*m*+ 1, · · · ,*n*+*m*+*s*. In the same way as that of Fan and Xiong (2003)(Appendix C), it can be shown that D4For each of the*s*families, the inverse variance-covariance matrix D5where . Using (D2), (D4), and (D5), we can show D6where the constants are given by . Combining the*n*individuals,*m*trio families, and*s*families with two offspring, the equations (D1), (D3), and (D6) lead to , where D7

## APPENDIX E

Using (D2) and (D4), we can show approximation (10). The constants *b*_{1} and *b*_{2} are given by E1

## Acknowledgments

We thank G. Gibson for kindness and patience in handling this article; and we thank two anonymous reviewers for very detailed and thoughtful critiques, which improved the article greatly. We are grateful to G. R. Abecasis for kindly providing the simulation program LDSIMUL to generate simulated data sets. R. Fan was supported partially by a research fellowship from the Alexander von Humboldt Foundation, Germany, by an international research travel assistance grant, Texas A&M University, and by the National Science Foundation Grant DMS-0505025.

## Footnotes

Communicating editor: G. Gibson

- Received August 18, 2004.
- Accepted February 10, 2005.

- Genetics Society of America