Originally published as Genetics Published Articles Ahead of Print on September 19, 2005.
Genetics, Vol. 172, 647-661, January 2006, Copyright © 2006
doi:10.1534/genetics.105.045781
Improvement of Mapping Accuracy by Unifying Linkage and Association Analysis
Xiang-Yang Lou*, Jennie Z. Ma, Mark C. K. Yang, Jun Zhu, Peng-Yuan Liu**, Hong-Wen Deng**, Robert C. Elston and Ming D. Li*,1
* Department of Psychiatric Medicine, University of Virginia, Charlottesville, Virginia 22911,
Department of Psychiatry, University of Texas Health Science Center, San Antonio, Texas 78229,
Department of Statistics, University of Florida, Gainesville, Florida 32611,
Department of Agronomy, Zhejiang University, Hangzhou, Zhejiang 310029, People's Republic of China,
** Osteoporosis Research Center, Creighton University Medical Center, Omaha, Nebraska 68131 and
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44109
1 Corresponding author: 1670 Discovery Dr., Ste. 110, Charlottesville, VA 22911.
E-mail: ml2km{at}virginia.edu
ABSTRACT
It is well known that pedigree/family data record information on the coexistence in founder haplotypes of alleles at nearby loci and the cotransmission from parent to offspring that reveal different, but complementary, profiles of the genetic architecture. Both conventional linkage analysis, which assumes linkage equilibrium, and family-based association tests (FBATs) capture only partial information, leading to inefficiency. For example, FBATs will fail to detect even very tight linkage in the case where no allelic association exists, while a violation of the assumption of linkage equilibrium will result in biased estimation and reduced efficiency in linkage mapping. In this article, by using a data augmentation technique and the EM algorithm, we propose a likelihood-based approach that embeds both linkage and association analyses into a unified framework for general pedigree data. Relative to either linkage or association analysis, the proposed approach is expected to have greater estimation accuracy and power. Monte Carlo simulations support our theoretical expectations and demonstrate that our new methodology: (1) is more powerful than either FBATs or classic linkage analysis; (2) can unbiasedly estimate genetic parameters regardless of whether association exists, thus remedying the bias and lower precision of traditional linkage analysis in the presence of association; and (3) is capable of identifying tight linkage alone. The new approach also holds the theoretical advantage that it can extract statistical information to the maximum extent and thereby improve mapping accuracy and power because it integrates multilocus population-based association study and pedigree-based linkage analysis into a coherent framework. Furthermore, our method is numerically stable and computationally efficient, as compared to existing parametric methods that use the simplex algorithm or Newton-type methods to maximize high-order multidimensional likelihood functions, and also offers the computation of Fisher's information matrix. Finally, we apply our methodology to a genetic study on bone mineral density (BMD) for the vitamin D receptor (VDR) gene and find that VDR is significantly linked to BMD at the one-third region of the wrist.
TWO approaches are commonly used in pedigree- or family-based gene mapping, i.e., linkage analysis (e.g., ELSTON and STEWART 1971; HASEMAN and ELSTON 1972; OTT 1974; LANDER and GREEN 1987; RISCH 1990; WARD 1993; AMOS 1994; KRUGLYAK and LANDER 1995; O'CONNELL and WEEKS 1995; KRUGLYAK et al. 1996; GUDBJARTSSON et al. 2000; ABECASIS et al. 2002) and family-based association tests (FBATs) (e.g., FALK and RUBINSTEIN 1987; SPIELMAN et al. 1993; LAZZERONI and LANGE 1998; LAIRD et al. 2000; RABINOWITZ and LAIRD 2000). Linkage analysis focuses on gene cosegregation that can be characterized by inheritance vectors or gene concordance between related individuals (identical-by-descent, IBD, or identical-in-state, IIS) at each locus, while association tests (which, when due to linkage, are tests of gametic association, also called linkage disequilibrium, LD) directly utilize allele status and linkage phase that record historic events. Pedigree data contain both these components of information that give rise to complementary profiles of the genetic architecture. Either linkage or association analysis alone, however, can capitalize only on the genetic information from one of these components and fails to grasp the whole picture, thereby leading to a loss in mapping accuracy and statistical power.
To illustrate the limitations of applying either a linkage or association approach alone, let us consider the affected sib pair design used in RISCH (1990) and RISCH and MERIKANGAS (1996). First, traditional linkage analysis will give a biased result in the presence of population association. To simplify our exposition, assume a diallelic disease locus with alleles Q and q and a codominant marker locus with alleles A and a. Alleles Q and A have the same frequency and are in perfect association, and let pQ = pA = pAQ = p. Table 1 lists the assumed probabilities (under no association), the true probabilities, and RISCH's (1990) LOD scores of all six possible sib configurations in the case where the marker is unlinked to a recessively inherited disease gene. Using RISCH's (1990) EM iterative Equation 4, we can obtain the maximum-likelihood estimates (MLEs) of the posterior probabilities that the affected sib pairs share i marker alleles IBD (i = 0, 1, 2). To illustrate the result, we take p to be a specific value, say p = 0.5; we then obtain the three estimates (for i = 0, 1, and 2, respectively) and the expected LOD (ELOD) = 0.384. These estimates deviate substantially from the true IBD sharing scores of 0.25, 0.5, and 0.25, respectively, and exhibit a spuriously excessive allele sharing. This suggests that a false-positive result can occur in allele-sharing analysis. We further demonstrate that, generally, the assumed likelihood is a monotonically decreasing function of the recombination fraction θ for θ ∈ [0, 0.5] (see the APPENDIX). This means that, even if the true recombination fraction θ0 ≠ 0, we may still obtain an estimate of zero.
Second, neglecting to take account of information on association may cause loss of statistical power. As pointed out by RISCH and MERIKANGAS (1996), the allele-sharing method is much less powerful than the transmission/disequilibrium test (TDT) method in the cases they considered, i.e., when there is no recombination and the alleles at the two loci are perfectly associated. This arises because the linkage statistic, the mean allele sharing, fails to consider the allele-specific IBD sharing. Actually, allele A (increasing disease risk) contributes more allele sharing to the statistic, whereas allele a contributes less, so that the overall mean allele sharing is diluted. Our simulations of model-based linkage-only analysis support this theoretical argument, i.e., the plausible bias and the reduced power (see SIMULATION STUDIES).
Because they fail to incorporate information on linkage, FBATs are inherently conservative, and so they cannot detect linkage even when two or more siblings are available, unless there is also population association. The conclusion by RISCH and MERIKANGAS (1996) was drawn from the ideal circumstance where the marker is the disease gene itself. In such a situation, FBATs reach their maximum potential power. In practice, however, it may not be true that a marker happens to have the same variant frequencies as, and be perfectly associated with, the disease gene of interest, even for fine mapping, as there are always many polymorphic SNPs within a gene whereas only a few may be responsible for the change of its function. Both theoretical and empirical studies (e.g., KRUGLYAK 1999; HINDS et al. 2005) have shown that the founder LD within a small region has usually been largely disrupted by various population forces, such as recombination, gene conversion, and/or mutation accumulated over time, so that high-LD regions with little genetic shuffling, termed haplotype blocks, span only a very short distance, implying that strong LD is not inevitable with tightly linked loci. HapMap studies also indicate that the frequencies of variants change from one SNP to another largely within a block (INTERNATIONAL HAPMAP CONSORTIUM 2003). In practical application, FBATs can therefore lose their theoretical power even with closely linked loci, owing to the violation of such an ideal assumption. Furthermore, association may extend over a great distance, even to nonsyntenic loci because of factors other than linkage, such as population subdivision and admixture, population bottlenecks, mutation, gene conversion, meiotic drive, sampling or ascertainment bias, nonrandom mating, and coancestry. 
Caution is also required in that a positive result from an FBAT does not necessarily imply the presence of tight linkage; i.e., an FBAT alone cannot distinguish strong association and loose linkage from weak association and tight linkage (ELSTON 1998; WHITTAKER et al. 2000).
Therefore, it is of great interest to remedy the above limitations. A judicious way is to take both these pieces of information into consideration in gene mapping. Such an idea was conceived in earlier literature (e.g., MACLEAN et al. 1984) and adopted in some computer software such as LINKAGE (LATHROP and LALOUEL 1984; LATHROP et al. 1985). Unfortunately, the bonus from joint mapping was not recognized, so this remarkable idea has been buried for several years (XIONG and JIN 2000). Recently, ZHAO et al. (1998) proposed a semiparametric method for a combined linkage and linkage disequilibrium analysis. XIONG and JIN (2000) advocated a likelihood-based parametric method for joint analysis with nuclear family data. CANTOR et al. (2005) further extended XIONG and JIN's (2000) method for general pedigrees. LI et al. (2005) suggested an approach that identifies associated and potentially causal SNPs through joint modeling of linkage and association. Parallel to parametric ones, variance components (e.g., ALLISON et al. 1999; FULKER et al. 1999; ABECASIS et al. 2000) and nonparametric (HUANG and JIANG 1999; WICKS 2000; WICKS and WILSON 2000; LAZZERONI 2002) methods have also been developed. However, those methods work mostly for specific data structures and types such as affected sib pairs, nuclear families, and categorical traits and/or can provide a solution only for specific problems such as single-point analysis. The bonus of combined mapping has also not been thoroughly explored. By invoking a data augmentation technique and the EM algorithm, we have evolved a general likelihood-based statistical framework for integrating linkage and association analyses (LOU et al. 2005). In the present article, we further extend this model-based approach for general pedigrees. 
This approach allows us to simultaneously perform segregation, linkage, and association analyses, i.e., to estimate penetrance functions, genetic distances, and association parameters, as well as to carry out the corresponding hypothesis tests within a unified framework. More appealingly, it adds several unique strengths to existing parametric methods (e.g., XIONG and JIN 2000; CANTOR et al. 2005; LI et al. 2005). First, this framework is conceptually straightforward, flexible, easy to generalize, and also comprehensive, so that it covers a wide range of cases with multiple loci and/or multiple alleles. Multilocus mapping and epistatic QTL mapping can be implemented as well under the same concept. Second, our new approach is computationally efficient and powerful. We formulated the closed-form solutions for MLEs implemented with EM iteration and thus avoid the computational difficulty of high-order multidimensional searches, leading to less computational time per iteration and quick convergence. Third, due to the advantage of the EM algorithm over the simplex algorithm and Newton-type methods in the context of a mapping study, as pointed out by some authors (e.g., LANDER and GREEN 1987), our new approach is numerically stable, as compared with existing methods. In our experience, a wide range of initial values appears to give good convergence. Finally, we offer the computation of Fisher's information matrix and hence can provide the estimation precision of MLEs. Although this article emphasizes a demonstration of the improvement in mapping accuracy using a two-locus model, i.e., one marker and one trait gene, we use an interval mapping model to describe our new approach in the MODEL AND METHOD section for readers to have a clearer picture about it. After presenting the theory, we use simulation studies to compare the power of an FBAT, of the pure linkage method, and of our new approach and the estimation precision of the latter two. 
An application to the genetic study of bone mineral density (BMD) is used to demonstrate this new methodology. Finally, we discuss some relevant issues to provide further insights into this approach.
MODEL AND METHOD
Consider a quantitative trait locus (QTL), Q, bracketed by a pair of flanking markers, A and B, respectively. Let A, a, Q, q, B, and b be the alleles at the three loci, respectively. All the alleles together form eight haplotypes, AQB, AQb, AqB, Aqb, aQB, aQb, aqB, and aqb. These haplotypes unite to generate a total of 36 diplotypes, AQB/AQB, AQB/AQb, ... , and aqb/aqb, where the "/" denotes the separation of the maternally and paternally derived gametes. The 36 diplotypes are collapsed into 27 zygote genotypes, each with an identical allelic combination at all the loci, and further, into 9 marker genotypes and 3 QTL genotypes. Because genotypes are conflated data that ignore the linkage phases of diplotypes, some of the genotypes consist of more than one diplotype. For example, all 4 diplotypes AQB/aqb, AQb/aqB, AqB/aQb, and Aqb/aQB exhibit the same genotype, AaQqBb. To express the relationship between diplotypes and genotypes, we use three many-one mapping operators that take a diplotype to its genotype at all loci, at the marker loci, and at the QTL, respectively. For instance, the diplotype AQB/aqb maps to AaQqBb, AaBb, and Qq, respectively.
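We use pAQB, pAQb, ... , paqb and PAQB/AQB, PAQB/AQb, ... , Paqb/aqb to denote the frequencies of the haplotypes AQB, AQb, ... , aqb and diplotypes AQB/AQB, AQB/AQb, ... , aqb/aqb, respectively, in the population studied. If the population is at Hardy-Weinberg equilibrium, we have

$$P_{AQB/AQB} = p_{AQB}^2,\ \ldots,\ P_{aqb/aqb} = p_{aqb}^2$$

for the homozygous diplotypes and

$$P_{AQB/AQb} = 2\,p_{AQB}\,p_{AQb},\ \ldots,\ P_{aqB/aqb} = 2\,p_{aqB}\,p_{aqb}$$

for the diplotypes formed by two different haplotypes.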
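The bookkeeping of haplotypes, diplotypes, and genotypes described above can be checked with a short Python sketch. The function names are illustrative only, and the factor of 2 for diplotypes formed by two different haplotypes assumes that the 36 diplotypes are counted as unordered pairs under Hardy-Weinberg equilibrium.

```python
from itertools import combinations_with_replacement, product

# The eight three-locus haplotypes, written in locus order A-Q-B.
HAPLOTYPES = ["".join(h) for h in product("Aa", "Qq", "Bb")]

# Unordered pairs of haplotypes: 8 * 9 / 2 = 36 diplotypes.
DIPLOTYPES = list(combinations_with_replacement(HAPLOTYPES, 2))


def genotype(diplotype, loci=(0, 1, 2)):
    """Collapse a diplotype to its unphased genotype at the chosen loci."""
    h1, h2 = diplotype
    return "".join("".join(sorted(h1[k] + h2[k])) for k in loci)


def hwe_diplotype_freqs(hap_freqs):
    """Diplotype frequencies under Hardy-Weinberg equilibrium (assumed)."""
    return {
        (h1, h2): hap_freqs[h1] * hap_freqs[h2] * (1 if h1 == h2 else 2)
        for h1, h2 in DIPLOTYPES
    }


if __name__ == "__main__":
    print(len(DIPLOTYPES))                                  # diplotypes
    print(len({genotype(d) for d in DIPLOTYPES}))           # zygote genotypes
    print(len({genotype(d, (0, 2)) for d in DIPLOTYPES}))   # marker genotypes
    print(len({genotype(d, (1,)) for d in DIPLOTYPES}))     # QTL genotypes
```

Passing `loci=(0, 2)` or `loci=(1,)` plays the role of the marker-only and QTL-only mapping operators.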
Crossing over between a pair of contiguous loci may take place during meiosis. Either recombination (R) or nonrecombination (N) between each of the pairs of adjacent loci (i.e., A and Q, Q and B) will give rise to four recombination configurations described by NN, NR, RN, or RR. The frequency of a new haplotype is a function of the recombination fraction(s) associated with its recombination configuration(s). For simplicity, we here ignore crossover interference during gametogenesis. Let θAQ and θBQ be the recombination fractions between loci A and Q and between Q and B, respectively. The frequencies of these four configurations can be expressed in terms of θAQ and θBQ, i.e., (1 - θAQ)(1 - θBQ), (1 - θAQ)θBQ, θAQ(1 - θBQ), or θAQθBQ corresponding to NN, NR, RN, or RR, respectively. Furthermore, the conditional probability of a zygote randomly formed by the haplotypes generated from a pair of parents is a product of the frequencies of the paternally and maternally derived haplotypes.
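As a small illustration of the four configuration frequencies, the following Python sketch evaluates them for given recombination fractions. The function name and argument names are illustrative, and the absence of crossover interference is assumed, as in the text.

```python
def configuration_probs(theta_aq, theta_bq):
    """Probabilities of NN, NR, RN, and RR, assuming no crossover interference.

    The first letter refers to the A-Q interval, the second to the Q-B interval.
    """
    return {
        "NN": (1 - theta_aq) * (1 - theta_bq),
        "NR": (1 - theta_aq) * theta_bq,
        "RN": theta_aq * (1 - theta_bq),
        "RR": theta_aq * theta_bq,
    }


if __name__ == "__main__":
    for config, prob in configuration_probs(0.1, 0.2).items():
        print(config, prob)
```

The four values always sum to 1, which is a quick sanity check on any pair of recombination fractions.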
For any complex trait, either continuous or discrete, there is no one-one correspondence between genotype and phenotype. The conditional probability of observing a phenotype given a specified genotype, termed the penetrance function, is thus used to characterize the relationship between genotype and phenotype. Because the phenotype is genetically determined by the genotypes at locus Q, the penetrance function, given diplotype D, can be expressed as

$$f(y \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y-\mu_G)^2}{2\sigma^2}\right]$$

for a normally distributed quantitative trait, where µG is the genotypic mean of the QTL genotype G of diplotype D and σ² is the residual variance. For a categorical trait the penetrance f(y|D) is defined as the probability that individuals with genotype G manifest phenotype y. We may specify different penetrance functions to mothers, fathers, and children on the basis of the inheritance pattern of the trait under investigation. To make this presentation terser, here we assume the same penetrance for the parental and offspring generations. However, it is not difficult to recast the methodology to be applicable to the case with different penetrance functions. Mendelian trait(s) and marker(s) can be viewed as specific examples with full penetrance. Then the methodology developed hereinafter is also applicable to their analysis. In a gene-mapping study aimed at estimating parameters of penetrance, association, and position (usually measured by the recombination fractions), a major challenge is the existence of latent data, also referred to as missing data, that cannot be directly observed, such as disease genotype, diplotype, and recombination configuration. We regard the observed data, i.e., marker genotypes and phenotypes, together with the latent data, i.e., diplotypes and recombination configurations, as complete data, also termed augmented data. Correspondingly, the observed data alone are called incomplete data. The observed data can be viewed as mixtures of complete data and then we can use a mixture model to tackle the issue of parameter estimation.
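A minimal Python sketch of a penetrance function for a quantitative trait is given below. It assumes a normal density with a genotype-specific mean and a common residual variance; the function name, the dictionary of means, and the use of `None` for a missing phenotype are illustrative choices, not the authors' implementation.

```python
import math


def normal_penetrance(y, qtl_genotype, means, sigma2):
    """f(y | D) for a quantitative trait, assuming a normal density.

    The value depends on the diplotype only through its QTL genotype
    (QQ, Qq, or qq). A missing phenotype (None) contributes a factor of 1.
    """
    if y is None:
        return 1.0
    mu = means[qtl_genotype]
    return math.exp(-((y - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


if __name__ == "__main__":
    means = {"QQ": 1.0, "Qq": 0.0, "qq": -1.0}  # hypothetical genotypic means
    print(normal_penetrance(0.3, "Qq", means, sigma2=1.0))
```

For a categorical trait, the same interface could return a tabulated probability for each genotype and phenotype class instead of a density.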
The complete data likelihood:
Denote marker, diplotype, and phenotype data by M, D, and y, respectively, with corresponding symbols for haplotype and recombination configuration data. Observed marker and phenotypic data are in boldface type while the missing data for parent and child diplotypes and child recombination configurations are in script type. We first use nuclear family data, in which there is no phenotypic covariance between parents and children, to demonstrate parameter estimation within a unified framework of interval mapping and LD mapping, and then extend the method to general pedigree data.
With N unrelated nuclear families randomly drawn from a general population, the overall likelihood is the product of individual family likelihoods, denoted L1, L2, ... , LN. Let us present an example to demonstrate how to build the likelihood function. In the example, family i consists of a mother with diplotype AQB/AQB (Dim) and phenotype yim, a father with AQB/Aqb (Dif) and yif, and two children with diplotypes and recombination configurations AQB/AQB and NN/RN (child 1) and AQB/AQb and NN/NR (child 2), respectively, and phenotypes yi1o and yi2o, respectively. The likelihood can be expressed by a three-level hierarchical model,

[equation]

where the vector of unknown parameters contains three subsets: population genetic parameters (haplotype frequencies), penetrance parameters (e.g., genotypic values and the residual variance), and position parameters (recombination fractions), related to the parental diplotype distribution, the phenotype density functions, and the transmission probabilities of the children, respectively. The transmission probability is the conditional probability of child j of family i having a given diplotype and recombination configuration given the parental diplotypes Dim and Dif. The overall likelihood can be represented as

[Equation 1]

where the diplotype vectors and the recombination configurations for the children are the missing parts of the complete data; Ni is the number of children within family i; nAQB, nAQb, ... , naqb are the numbers of haplotypes AQB, AQb, ... , aqb appearing in parental diplotypes, respectively; and the remaining counts are the numbers of recombinants and nonrecombinants between loci A and Q and between loci Q and B existing in the recombination configurations, respectively. In many cases, information is partial because of experimental errors, financial limitations, or other practical constraints, as often occurs in studies of late-onset diseases such as Alzheimer's disease where parents are unavailable. Since missing phenotypic observations can be treated by simply setting the corresponding f(y|D)'s equal to 1 wherever they occur in the above likelihood, Equation 1 automatically covers the likelihoods of family data with missing phenotypes like TDT-type data. For data with missing diplotypes such as sibship data, instead of Equation 1 we can use a form of mixture model summing over all plausible diplotypes and/or recombination configurations compatible with the available data to represent such likelihoods and so address the statistical analysis within the EM framework described in The incomplete data likelihood section.
Equation 1 can be generalized to the case of N pedigrees,

[Equation 1']

where the diplotype probability term represents the probability of either child j given the parental diplotypes or founder j within pedigree i; the parental diplotypes are those of nonfounder j within pedigree i; yF and yN are the founder and nonfounder phenotypic vectors; the founder and nonfounder diplotype vectors and the recombination configurations for nonfounders are defined correspondingly; NiF and NiN are the numbers of founder(s) and nonfounder(s) within pedigree i; nAQB, nAQb, ... , naqb are the numbers of haplotypes AQB, AQb, ... , aqb appearing in founder diplotypes, respectively; and the remaining counts are the numbers of recombinants and nonrecombinants between loci A and Q and between loci Q and B across all N pedigrees, respectively.
The maximum-likelihood estimator can be derived through differentiating the log-likelihood with respect to each unknown parameter and then setting each derivative equal to 0 and solving the set of simultaneous equations. Define the identity indicators

[equation]

The MLEs are then

[Equation 2]

where G ∈ {QQ, Qq, qq}; and indicators I(y = yim), I(y = yif), and I(y = yijo) are 1 when y = yim, y = yif, and y = yijo, respectively, and 0 otherwise. The MLEs of the recombination parameters for likelihood (1') are the same as those for likelihood (1), and the other MLEs have similar forms,

[Equation 2']
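To give a feel for what closed-form complete-data estimates look like, here is a Python sketch under stated assumptions. It is not a transcription of Equation 2: it assumes that the complete-data likelihood factorizes into a multinomial part for founder haplotypes, a binomial part for one marker-QTL interval, and a normal part for the phenotypes, in which case the estimates are simple proportions and means. All names are hypothetical.

```python
def complete_data_mles(hap_counts, n_rec, n_nonrec, phenotypes_by_genotype):
    """Closed-form estimates from fully observed (complete) data.

    hap_counts: haplotype -> count among founder (parental) haplotypes
    n_rec, n_nonrec: recombinant / nonrecombinant counts for one interval
    phenotypes_by_genotype: QTL genotype -> list of observed trait values
    """
    total = sum(hap_counts.values())
    hap_freqs = {h: n / total for h, n in hap_counts.items()}

    # Recombination fraction: proportion of recombinant meioses.
    theta = n_rec / (n_rec + n_nonrec)

    # Genotypic means: average phenotype within each QTL genotype class.
    means = {g: sum(ys) / len(ys) for g, ys in phenotypes_by_genotype.items() if ys}

    # Residual variance: pooled squared deviations from the class means.
    n_obs = sum(len(ys) for ys in phenotypes_by_genotype.values())
    rss = sum((y - means[g]) ** 2
              for g, ys in phenotypes_by_genotype.items() for y in ys)
    sigma2 = rss / n_obs
    return hap_freqs, theta, means, sigma2


if __name__ == "__main__":
    print(complete_data_mles({"AQB": 6, "aqb": 2}, 1, 9,
                             {"QQ": [1.0, 3.0], "qq": [0.0, 0.0]}))
```

With incomplete data, the same formulas would be applied to expected counts and posterior weights rather than observed ones.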
Unlike the traditional approach, for flexibility we make here no assumption such as that the recombination fraction between the two markers can be known a priori. If the recombination fraction between the two markers (θAB) is available, however, the corresponding terms with respect to one of the recombination fractions, θAQ and θBQ, will disappear from the above estimation procedure since either one of the two is a function of the other and of θAB. A grid search procedure can also be used for estimating QTL position on the basis of the preceding methodology.
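The dependence among the three recombination fractions can be made concrete with a small Python sketch. It assumes no crossover interference, so that a recombinant between the two markers arises when exactly one of the two intervals recombines; the function names and the grid layout are illustrative and assume a marker recombination fraction below 0.5.

```python
def theta_ab(theta_aq, theta_bq):
    """Marker-marker recombination fraction, assuming no interference."""
    return theta_aq * (1 - theta_bq) + (1 - theta_aq) * theta_bq


def theta_bq_given(theta_ab_known, theta_aq):
    """Solve the relation above for theta_bq when theta_ab is known."""
    return (theta_ab_known - theta_aq) / (1 - 2 * theta_aq)


def position_grid(theta_ab_known, steps=10):
    """Candidate QTL positions as (theta_aq, theta_bq) pairs between the markers."""
    grid = []
    for k in range(steps + 1):
        t_aq = theta_ab_known * k / steps
        grid.append((t_aq, theta_bq_given(theta_ab_known, t_aq)))
    return grid


if __name__ == "__main__":
    for t_aq, t_bq in position_grid(0.26, steps=4):
        print(round(t_aq, 4), round(t_bq, 4))
```

In a grid search, the likelihood would be maximized over the remaining parameters at each candidate pair and the position with the largest value retained.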
The incomplete data likelihood:
In practice, only marker genotype and phenotype data are observed, whereas the data on diplotypes, recombination events, and QTL genotypes are hidden. The observed data are mixtures of component complete data, and the statistical analysis becomes a typical mixture problem. Let us go back to the above example again and assume that only marker genotypes AABB (Mim), AABb (Mif), AABB (Mi1o), and AABb (Mi2o) and phenotypes yim, yif, yi1o, and yi2o are available for the mother, father, and two children of family i, respectively. Now Mim is a mixture of diplotypes AQB/AQB, AQB/AqB, and AqB/AqB; and Mif is composed of diplotypes AQB/AQb, AQB/Aqb, AQb/AqB, and AqB/Aqb; and both Mi1o and Mi2o also consist of unidentified diplotype(s) together with recombination configuration(s) nested within the paired parental diplotypes. The likelihood can be formulated as

[equation]

where the inner sum denotes summation over all recombination configuration(s) by taking "*" as either recombination or nonrecombination that is compatible with parent and child diplotypes; Li(AQB/AQB, AQB/AQb), Li(AQB/AQB, AQB/Aqb), ... , are probabilities of the mother and father of family i with diplotypes AQB/AQB and AQB/AQb, AQB/AQB and AQB/Aqb, ... , respectively; and the outer sum denotes summation over all pairs of parental diplotypes compatible with the observed marker phenotypes in family i. The partial derivative of the log-likelihood of family i is

[equation]

where the weights are the posterior probabilities that the mother and father of family i have a given pair of diplotypes and that child j from family i has a given diplotype and reduced recombination configuration produced by those mother and father diplotypes, respectively; e.g.,

[equations]

[Equation 3]

Differentiating the log-likelihood of Equation 3 leads to

[Equation 4]

where the starred recombination counts are the expected numbers of recombinants and nonrecombinants between loci A and Q and between loci Q and B, respectively; and sums are taken over all diplotypes and recombination configurations consistent with the marker genotypes.
Similarly, the pedigree-based likelihood is

[Equation 3']

where the sum denotes summation over all diplotype(s) and/or recombination configuration(s) compatible with the observed data. The partial derivative is

[Equation 4']
We implement the EM algorithm (DEMPSTER et al. 1977) to estimate the parameters of the likelihood function, i.e., haplotype frequencies, QTL genotypic effects and residual variance or penetrances, and recombination fractions. In the E-step, we update the posterior probabilities and expected numbers conditional on the initial values or the estimates of the current iteration. In the M-step, substituting the expected numbers nAQB*, nAQb*, ... , naqb* and the expected numbers of recombinants and nonrecombinants, together with the posterior probabilities, for the corresponding counts and indicators in Equations 2 for likelihood (3), G ∈ {QQ, Qq, qq}, we compute the next cycle of MLEs of the unknown parameters. Likewise, we perform a similar M-step in (2') for the pedigree-based likelihood (3'). These two steps are repeated until convergence is attained. Allele frequencies and linkage disequilibria, QTL additive and dominance effects, and relative locations on the chromosome can be calculated from the haplotype frequencies, QTL genotypic effects, and recombination fractions, respectively.
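The E-step/M-step alternation can be illustrated on a deliberately simplified problem. The Python sketch below is not the family-based model of this article: it assumes unrelated individuals, two diallelic loci, Hardy-Weinberg equilibrium, and no phenotypes, and estimates haplotype frequencies from unphased genotypes by gene counting. It shows the same pattern of weighting every compatible diplotype by its posterior probability and then re-estimating frequencies from expected counts. All names are hypothetical.

```python
HAPS = ["AB", "Ab", "aB", "ab"]


def compatible_diplotypes(genotype):
    """Unordered haplotype pairs matching genotype = (copies of A, copies of B)."""
    pairs = []
    for i, h1 in enumerate(HAPS):
        for h2 in HAPS[i:]:
            n_a = (h1[0] == "A") + (h2[0] == "A")
            n_b = (h1[1] == "B") + (h2[1] == "B")
            if (n_a, n_b) == genotype:
                pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Gene-counting EM for two-locus haplotype frequencies."""
    p = {h: 0.25 for h in HAPS}  # uniform starting values
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g in genotypes:
            pairs = compatible_diplotypes(g)
            # E-step: posterior weight of each compatible diplotype under HWE.
            weights = [p[h1] * p[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of haplotypes.
        new_p = {h: counts[h] / (2 * len(genotypes)) for h in HAPS}
        converged = max(abs(new_p[h] - p[h]) for h in HAPS) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    data = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 5
    print(em_haplotype_freqs(data))
```

Only the double heterozygote is ambiguous here; in the family-based setting the E-step additionally sums over recombination configurations and QTL alleles.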
The asymptotic variance-covariance matrix of the MLEs:
LOUIS' (1982) procedure or the supplemented EM (SEM) (MENG and RUBIN 1991) that embeds the computation of the observed information within the EM iteration can be adopted to obtain the asymptotic variance-covariance matrix for MLEs of haplotype frequencies, genotypic effects, residual variance, and recombination fraction(s). In our computer program we use the improved equations of LOUIS' (1982) method, LOU et al.'s (2005) (C3) and (C5), to compute the observed information matrix of the parameters (i.e., the haplotype frequencies, penetrance or genotypic effects plus residual variance, and recombination fractions). The information on other parameters can be calculated from that for these basic parameters. The variance-covariance matrix for genetic effects and allele frequencies can be calculated easily since they are linear functions of the haplotype frequencies or genotypic effects. The approximate variances of the linkage disequilibria can be found by the delta method, based on their Taylor series expansions. If a parameter vector is a function f of the basic parameter vector, then its approximate variance-covariance matrix is given by

[Equation 5 and accompanying definitions]
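For orientation, the standard first-order delta-method approximation can be written as follows. The symbols are assumptions introduced here for illustration (Θ for the basic parameter vector, Λ = f(Θ) for the derived parameters, J for the Jacobian) and need not match the notation of Equation 5.

```latex
% Assumed notation: \Theta is the basic parameter vector and
% \Lambda = f(\Theta) is the vector of derived parameters.
\widehat{\mathrm{Var}}(\hat{\Lambda}) \approx
  J \, \widehat{\mathrm{Var}}(\hat{\Theta}) \, J^{\mathsf{T}},
\qquad
J = \left. \frac{\partial f(\Theta)}{\partial \Theta^{\mathsf{T}}}
    \right|_{\Theta = \hat{\Theta}}
```

When f is linear, as for allele frequencies computed from haplotype frequencies, J is constant and the approximation is exact.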
Hypothesis testing:
The following hypotheses are tested sequentially: (1) the existence of a trait gene and (2) various submodel hypotheses. The existence of a trait gene with significant effects can be tested by calculating a log-likelihood ratio (LR) test statistic under the null (H0: there is no trait-causing gene) and alternative hypotheses (H1: there is a trait-causing gene) as

[equations]

The LR statistic is asymptotically χ²-distributed with corresponding degrees of freedom for a fixed set of frequencies and relative position of the putative gene. However, because these are nuisance parameters under H0, the regularity conditions required for the χ²-distribution of the LR statistic are violated. Parametric or nonparametric bootstrap (e.g., the permutation procedure proposed by CHURCHILL and DOERGE 1994) can be adopted to determine a critical threshold for declaring the presence of a gene at a given significance level.
After rejecting the hypothesis of no gene, the tests for particular subsets of hypotheses regarding gene action mode, gene position, and/or LD coefficient(s) can be conducted in tandem with the corresponding LR statistics that are approximately distributed as χ²-statistics with degrees of freedom equal to the relevant numbers of parameters being tested.
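A permutation threshold of the kind mentioned above might be obtained along the following lines. This Python sketch is generic and rests on assumptions: `statistic` stands for any user-supplied function that recomputes the test statistic (for example, an LR statistic) from a permuted phenotype list while the genotype data stay fixed, and the function name, defaults, and quantile convention are illustrative.

```python
import random


def permutation_threshold(phenotypes, statistic, n_perm=1000, alpha=0.05, seed=0):
    """Empirical critical value from permuting phenotypes across individuals.

    statistic: callable taking a permuted phenotype list and returning the
    test statistic; genotype data are held fixed inside the callable.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the genotype-phenotype link
        null_stats.append(statistic(shuffled))
    null_stats.sort()
    # Empirical upper-alpha quantile of the permutation distribution.
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_stats[index]


if __name__ == "__main__":
    ys = [0.1, 0.5, 1.2, 0.9, 1.8, 2.2, 0.3, 1.1]
    group = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical fixed genotype classes

    def mean_difference(perm):
        g1 = [y for y, g in zip(perm, group) if g == 1]
        g0 = [y for y, g in zip(perm, group) if g == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    print(permutation_threshold(ys, mean_difference, n_perm=200))
```

An observed statistic exceeding the returned value would be declared significant at the chosen level; for family data, the permutation scheme would have to respect the family structure, which this sketch does not attempt.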
Benefiting from making full use of both complementary components of information on correlated transmission within pedigrees and correlated occurrence at the population level, the proposed approach is expected to have greater analytical accuracy and testing power. To validate our theoretical expectation, we conducted a series of simulations under a variety of disease models and degrees of LD to compare the performance of three methods: an FBAT, pure linkage (PL) analysis, and the combined linkage and association analysis (LLD).
SIMULATION STUDIES
To confirm that PL analysis may result in a biased estimation in the presence of association while our new approach can remedy this limitation, we first conducted a set of simulations for a comparison between LLD and PL. Although such a case may represent an extreme one, for full exposition we concentrate here on a completely penetrant codominant disease model and, theoretically, the general conclusions from this will also be valid for complex models. TDT-type (including parent and child marker genotypes and child phenotypes) and sib-type (including sibling marker genotypes and phenotypes) data were simulated on a sample consisting of 300 nuclear families with two children and 200 families with three children at two LD levels, an LD coefficient of 0.1 (normalized LD = 0.417) and of 0.2 (normalized LD = 0.833), and two linkage levels, θ = 0.05 and θ = 0.2, respectively. Only the results on the MLE and MSE of the recombination fraction are shown in Table 2, since the MLEs of the other parameters, such as allele frequencies and LD coefficient (for LLD), have an excellent accuracy and the statistical power is very high. Table 2 shows that PL yields a large bias in both TDT-type and sib-type designs. For example, the bias and the root MSE of the estimated recombination fraction are 0.065 and 0.069 for TDT-type design and 0.149 and 0.150 for sib-type design, respectively, when the true LD coefficient is 0.2 and θ = 0.2. This implies that the result from linkage-only analysis is less reliable when association is present. As expected, however, LLD has highly precise estimation. All the absolute values of the bias from LLD are <5% of the parameter values, a conventional criterion for unbiased estimation, and all the MSEs are much less than their counterparts from PL. The bias and the root MSE are 0.001 and 0.017 for the TDT-type design and 0.001 and 0.023 for the sib-type design, respectively, when the LD coefficient is 0.2 and θ = 0.2.
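The two summary measures reported for the simulations can be computed from replicate estimates as in the following Python sketch. The function name and the example values are illustrative and are not taken from Table 2.

```python
import math


def bias_and_root_mse(estimates, true_value):
    """Bias and root mean squared error of replicate estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical replicate estimates of a recombination fraction of 0.2.
    replicates = [0.18, 0.22, 0.19, 0.21, 0.20]
    bias, rmse = bias_and_root_mse(replicates, 0.2)
    print(bias, rmse, abs(bias) / 0.2 < 0.05)
```

The last printed value applies the 5% relative-bias criterion mentioned in the text.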
To demonstrate that LLD can give an unbiased estimate of the recombination fraction and further test an arbitrary null hypothesis, say H0: θ = 0.1, in such a way that it has an advantage over FBATs in being capable of identifying tight linkage, we carried out simulations on the basis of a classic TDT-type design consisting of 500 nuclear families with a single child per family under a fully penetrant codominant model. As before, we considered two LD levels, an LD coefficient of 0.1 (normalized LD = 0.417) and of 0.2 (normalized LD = 0.833), and two tight linkage levels, θ = 0 and θ = 0.05, respectively. Powers were calculated for the hypotheses H0: θ = 0.5 and H0: θ ≥ 0.1, respectively, in LLD analysis. The MLE and MSE of the recombination fraction and the corresponding powers are presented in Table 3. As shown in Table 3, LLD gives an accurate estimate and high power for both null hypotheses at normalized LD = 0.833; e.g., the bias and the root MSE are 0.007 and 0.014, the powers for both H0: θ = 0.5 and H0: θ ≥ 0.1 are 1.0, in the case of θ = 0, and the bias and the root MSE are 0.004 and 0.024, and the powers are 0.995 and 0.645, in the case of θ = 0.05, respectively. LLD has reasonable estimation accuracy and test power at normalized LD = 0.417. These results suggest that LLD can offer the possibility of distinguishing strong association and loose linkage from weak association and tight linkage, even in the case of only one child per family.
|
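The pairs of LD values quoted above (0.1 with 0.417, and 0.2 with 0.833) are consistent with the usual normalization of the LD coefficient by its maximum attainable value given the allele frequencies. The sketch below assumes that standard definition. The allele frequencies of 0.4 at both loci are an assumption, since they are not restated in this section; they were chosen only because they give a maximum of 0.24, which reproduces the quoted ratios.

```python
def normalized_ld(d, p_a, p_b):
    """Absolute normalized LD: |D| divided by its maximum given allele frequencies.

    d   : LD coefficient between allele A (frequency p_a) and allele B (frequency p_b)
    """
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


# Assumed allele frequencies (hypothetical): 0.4 at both loci, so d_max = 0.24.
for d in (0.1, 0.2):
    print(f"LD coefficient {d}: normalized LD = {normalized_ld(d, 0.4, 0.4):.3f}")
```

Under these assumed frequencies the two simulated levels come out as 0.417 and 0.833, matching the values in the text. Other allele-frequency pairs with the same maximum would give the same result.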
Next we consider a more common case where the disease gene affects a quantitative phenotype. TDT-type data were generated for a sample of 300 families with two children each and 200 families with three children each under an additive model (no dominance effect, i.e., µQQ = µ + a, µQq = µ, and µqq = µ − a, where µ and a are the mean and additive effect, respectively). We assumed that a marker locus is completely linked to the disease susceptibility locus but with varying degrees of LD (LD coefficient from 0 to 0.1) and heritability (from 0.1 to 0.4). The results of the power comparison of the three methods are summarized in Figures 1 and 2, while only the estimated parameters from LLD and PL are shown in Table 4, because the nonparametric FBAT approach cannot perform parameter estimation. Figure 1 shows power plotted against LD, where a, b, c, and d are for heritabilities 0.1, 0.2, 0.3, and 0.4, respectively. Figure 2 shows power plotted against heritability, where a, b, c, d, and e are for LD coefficients of 0 (no LD), 0.025 (normalized LD of 0.104), 0.05 (normalized LD of 0.208), 0.075 (normalized LD of 0.313), and 0.1 (normalized LD of 0.418), respectively.
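The additive model above links the additive effect a to heritability through the variance contributed by the locus. The sketch below is an illustration under stated assumptions, not the authors' simulation procedure: it assumes Hardy-Weinberg proportions, treats heritability as the share of phenotypic variance due to this single locus, fixes the residual variance at 1, and draws unrelated individuals rather than nuclear families. The allele frequency of 0.5 and all function names are hypothetical.

```python
import math
import random


def additive_effect(h2, p, resid_var=1.0):
    """Additive effect a giving locus heritability h2 when the Q allele has frequency p.

    With no dominance and Hardy-Weinberg proportions the genetic variance is
    2 * p * (1 - p) * a**2, and h2 = genetic_var / (genetic_var + resid_var).
    """
    genetic_var = h2 / (1.0 - h2) * resid_var
    return math.sqrt(genetic_var / (2.0 * p * (1.0 - p)))


def genotype_mean(n_q_alleles, mu, a):
    """Genotypic mean: QQ (2 copies) -> mu + a, Qq (1) -> mu, qq (0) -> mu - a."""
    return mu + a * (n_q_alleles - 1)


def simulate_individual(rng, p, mu, a, resid_var=1.0):
    """Draw one unrelated individual's genotype and quantitative phenotype."""
    n_q = sum(rng.random() < p for _ in range(2))
    phenotype = rng.gauss(genotype_mean(n_q, mu, a), math.sqrt(resid_var))
    return n_q, phenotype


rng = random.Random(1)
p, mu = 0.5, 0.0  # hypothetical allele frequency and population mean
for h2 in (0.1, 0.2, 0.3, 0.4):  # heritabilities used in Figure 1
    a = additive_effect(h2, p)
    sample = [simulate_individual(rng, p, mu, a) for _ in range(5)]
    print(f"h2 = {h2}: a = {a:.3f}, first phenotype = {sample[0][1]:.3f}")
```

Fixing the residual variance and solving for a is only one possible parameterization; fixing the total variance instead would rescale a without changing the heritability.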
Clearly, the power profiles shown in Figures 1 and 2 support our expectation. As the degree of LD increases, so does the power of the FBAT and LLD, whereas that of PL is almost unchanged or increases only slightly (Figure 1, a–d). The power also increases with heritability in most cases, but when there is no LD, the FBAT has no power regardless of the heritability (Figure 2a). Generally speaking, LLD appears to be the most powerful, followed by PL and then the FBAT when LD is absent or weak (normalized LD < 0.2; Figure 2, a and b), or by the FBAT and then PL when LD is strong (Figure 2, c–e). Except in the cases of no LD, where PL has power close to that of LLD, LLD is much more powerful than PL. LLD also always performs better than the FBAT, even with strong LD (normalized LD of at least 0.313), where the power of the FBAT approaches that of LLD. This is not surprising: more than one sibling is available in our simulations and hence, in theory, information on allele sharing between siblings should contribute to detecting linkage. The exception is the case of no linkage, which is of no practical importance because any detection of linkage would then be a type I error. The power comparison indicates that the union of two complementary components of information allows LLD to be more powerful.
Unlike FBATs, our new approach can also estimate gene effects, allele frequencies, the LD coefficient, and recombination fractions, so that it can provide more knowledge regarding disease etiology. Table 4 lists some typical results on the comparison between LLD and PL; results are not shown when LD is absent or weak, because the LLD estimates are then similar to those of PL. In that situation, both LLD and PL gave unbiased estimates, and LLD appeared to have a slightly larger MSE, although the difference was very small. Such results are consistent with our expectation, because the assumption of linkage equilibrium is indeed satisfied for linkage-only analysis, whereas LLD must estimate one more unknown parameter. For cases with slightly stronger LD, such as a normalized LD of at least 0.208, LLD showed much better estimation accuracy than linkage-only analysis, as reflected by bias and MSE (see Table 4). The bias and MSE of the estimated parameters from LLD are almost uniformly smaller than their counterparts from PL, in good agreement with theoretical expectations. In general, the estimation accuracy increases with LD, and LLD improves more than PL. The magnitude of improvement differs with the parameter values. The estimates of the recombination fraction and genetic effects are greatly affected by the LD level, while those of the population mean and variance are less affected. In some cases, ignoring LD may result in a large bias and MSE in PL. The comparison of parameter estimation strongly indicates that information from population association is needed to obtain better and more reliable estimates.
APPLICATION
Results from FBAT and our LLD approach are presented in Table 5 (P-values) and Table 6 (MLEs and standard errors, SEs). The three LLD P-values in Table 5 correspond to null hypotheses on the recombination fraction θ with boundary values 0.5, 0.2, and 0.1, respectively, while the FBAT P-values are for θ = 0.5. After correction for multiple testing, the LR statistic still remains highly significant (minimum P = 0.001 for H0: θ = 0.5), whereas the FBAT statistic shows only marginal significance (P = 0.040) for ss12568583. Furthermore, the parameter estimates show that all three SNPs are very tightly linked to the putative disease gene, i.e., have near-zero estimated recombination fractions and small SEs, but have allele frequencies different from those of this gene (see Table 6). The estimates from the three SNPs are very consistent, which indicates that, very likely, a gene responsible for BMD is located within or near the VDR gene but that the genotyped SNPs do not seem to be the causal variant. The MLEs of the normalized LD are 0.045, 0.177, and 0.021 for SNPs ss12568610, ss12568583, and ss12568608, respectively, suggesting that the associations between the gene and the SNPs are weak. This may be why this gene can elude most family-based association gene-hunting strategies such as QTDT and FBAT. Our approach also gave estimates of the penetrance parameters. As shown in Table 6, the gene has a large genetic effect and displays an incompletely dominant mode of inheritance. In summary, our results indicate that the VDR gene is significantly linked to the gene for BMD, especially for SNP ss12568583.
DISCUSSION
Another contribution of this article is that it shows, through systematic simulation studies and an application, that a mapping bonus can be obtained by combined linkage and association analysis without any increase in experimental expense. This is highly consistent with theoretical expectation. First, the improvement in mapping resolution arises from the marriage of linkage and association. In a gene-hunting context, latent data exist, such as disease genotype, inheritance vector, and linkage phase. Parameter estimation and statistical inference rely on accurate genetic reconstruction of such ambiguous data, i.e., statistical imputation. Violation of the assumption of linkage equilibrium leads to inaccurate imputation in pure linkage analysis, so that it may give a biased result, as demonstrated in this article. On the other hand, the assumption of linkage equilibrium also reduces imputation precision, because the resulting likelihood retains maximum uncertainty about which component of the mixture distribution generated the data and is therefore least informative for the recombination parameter θ. Theoretically, integrating both complementary components increases imputation accuracy, leading to improvement in mapping accuracy, precision, and power over traditional linkage analysis. Intuitively, linkage induces more gene concordance between related individuals having similar phenotypes, while the opposite holds true for those with disparate phenotypes. Incorporating this information, which FBATs fail to do, gives our LLD approach a higher power than that of FBATs. This type of phenomenon has also been widely observed in comparisons of the TDT, the conventional affected sib pairs, and combined methods (HUANG and JIANG 1999; WICKS and WILSON 2000; LAZZERONI 2002). Both our simulation and real data studies support our theoretical expectation.
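The idea of imputing latent linkage phase with the EM algorithm can be illustrated on a much simpler problem than the LLD likelihood. The sketch below is not the authors' method: it is the classic gene-counting EM for haplotype frequencies at two biallelic loci in unrelated individuals, where only double heterozygotes have ambiguous phase. It ignores pedigrees, disease genotypes, and inheritance vectors entirely. The function name, the genotype coding (counts of alleles A and B), and the example counts are all hypothetical.

```python
def em_two_locus(genotype_counts, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies (AB, Ab, aB, ab) and the LD coefficient.

    genotype_counts maps (copies of A, copies of B) -> number of individuals.
    """
    # Haplotype contributions (AB, Ab, aB, ab) of the phase-known genotypes.
    known = {
        (2, 2): (2, 0, 0, 0), (2, 1): (1, 1, 0, 0), (2, 0): (0, 2, 0, 0),
        (1, 2): (1, 0, 1, 0), (1, 0): (0, 1, 0, 1),
        (0, 2): (0, 0, 2, 0), (0, 1): (0, 0, 1, 1), (0, 0): (0, 0, 0, 2),
    }
    n_double_het = genotype_counts.get((1, 1), 0)
    n_haplotypes = 2 * sum(genotype_counts.values())
    base = [0.0, 0.0, 0.0, 0.0]
    for genotype, contribution in known.items():
        n = genotype_counts.get(genotype, 0)
        for k in range(4):
            base[k] += n * contribution[k]

    freqs = [0.25, 0.25, 0.25, 0.25]
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is AB/ab.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts divided by the number of haplotypes.
        expected = [
            base[0] + n_double_het * w_cis,
            base[1] + n_double_het * (1.0 - w_cis),
            base[2] + n_double_het * (1.0 - w_cis),
            base[3] + n_double_het * w_cis,
        ]
        new_freqs = [c / n_haplotypes for c in expected]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break

    p_a = freqs[0] + freqs[1]
    p_b = freqs[0] + freqs[2]
    return freqs, freqs[0] - p_a * p_b


# Hypothetical genotype counts for illustration only.
counts = {(2, 2): 30, (1, 1): 40, (0, 0): 20, (2, 1): 5, (0, 1): 5}
freqs, ld = em_two_locus(counts)
print("haplotype frequencies (AB, Ab, aB, ab):", [round(f, 4) for f in freqs])
print("LD coefficient:", round(ld, 4))
```

In this toy setting, fixing the cis weight at 0.5 plays the role of a linkage-equilibrium-style assumption, whereas letting the data update it is what sharpens the phase imputation. The full approach described in the text applies the same principle to far richer latent data.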
Second, the improvement may also come from two other potential sources, although they are not explored in this article. The LLD approach integrates population-based association analysis and pedigree-based linkage analysis into a coherent framework so that it can handle diverse types of data, including full sibs, half sibs, cousins, nuclear families, extended nuclear families, complex pedigrees, and singletons, as well as their mixtures. Unlike the methods of HUANG and JIANG (1999), WICKS and WILSON (2000), and LAZZERONI (2002), which require only affected pairs with at least one heterozygous parent, our approach allows for analyzing any type of data structure, including singletons and pedigrees without any informative meioses, which do not contribute information to linkage parameter(s) but do inform association parameter(s). Because no type of mapping data is dropped, the data are used to the maximum extent, which may improve mapping performance. Furthermore, our flexible framework is easily applied to a multipoint analysis. It has been well documented that multipoint analyses can extract more statistical information than pairwise ones and thus may substantially increase the power and reduce spurious results (LATHROP et al. 1984, 1985). Conceivably, unifying multilocus linkage and association mapping will further improve mapping resolution.
Our current version of the program is capable of handling five to six loci on a PC if only limited amounts of data are missing. Although it allows for more loci and alleles on a workstation or a PC cluster with more memory and storage, computation can be very time-consuming for a large number of loci and alleles because the required memory and time increase exponentially with the number of loci. To avoid a formidable computational burden, simulation-based versions of the EM algorithm such as stochastic EM and Markov chain Monte Carlo (MCMC) EM (THOMPSON 1994; CELEUX et al. 1996), which estimate the conditional posterior probabilities in the E-step rather than computing them exactly, can be used. Approximate methods based on a composite likelihood (e.g., RANNALA and SLATKIN 2000) also seem to be feasible ways to tackle this problem. The relevant study is under way.
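The exponential growth mentioned above can be made concrete by counting the haplotype configurations that a founder can carry. This is only a rough sketch of one source of the cost, under the assumption of biallelic loci. It is not the program's actual memory model, which also depends on pedigree size and the amount of missing data, and the function names are hypothetical.

```python
from math import prod


def n_haplotypes(alleles_per_locus):
    """Number of distinct multilocus haplotypes: product of allele counts."""
    return prod(alleles_per_locus)


def n_unordered_diplotypes(alleles_per_locus):
    """Number of unordered haplotype pairs a single individual can carry."""
    h = n_haplotypes(alleles_per_locus)
    return h * (h + 1) // 2


# Biallelic loci (assumption), from one locus up to eight loci.
for n_loci in range(1, 9):
    alleles = [2] * n_loci
    print(n_loci, n_haplotypes(alleles), n_unordered_diplotypes(alleles))
```

With six biallelic loci there are already 64 haplotypes and 2080 possible diplotypes per individual, and each added locus doubles the first count and roughly quadruples the second.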
Moreover, unlike model-free approaches such as allele sharing and FBATs, which can tell us only whether linkage or association exists but fail to provide any estimates of their magnitude, the proposed LLD approach can simultaneously estimate genetic distance, allelic association, and the genotype–phenotype relationship and can also perform various types of hypothesis testing. For example, we can compare analyses that include or exclude the LD information to assess the validity of the LD model assumptions. Thus, this approach reveals more of the genetic mechanisms underlying etiology than FBATs do and hence increases the predictability of gene mapping.
Finally, as pointed out by PÉREZ-ENCISO (2003), given the diversity of genetic architectures and population histories, it is unlikely that a single statistical approach will be valid for all cases. The approach described here is subject to the same limitations faced by all model-based methods, i.e., the requirement of a correct, or close to correct, model for the trait under study. If the model for predicting disease status from phenotypes is not sufficiently well known, this approach cannot perform well. Therefore, this model-based LLD approach should serve as a supplement to model-free methods in tracking the gene(s) underlying complex diseases, once model-free methods have suggested how many loci are involved and their approximate locations in the genome (ELSTON 1998).
APPENDIX
(A1) [equation not available]
… respectively; and θ and θ0 (a specific value) are the recombination fractions between the marker and the disease gene. The partial derivative with respect to θ, the score, is
(A2) [equation not available]
… takes, the assumed likelihood is a monotonically decreasing function of θ in the interval [0, 0.5], and hence θ = 0 is the MLE, even if there is no linkage; i.e., θ0 = 0.5.
LITERATURE CITED
ABECASIS, G. R., L. R. CARDON and W. O. COOKSON, 2000 A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66: 279–292.
ABECASIS, G. R., S. S. CHERNY, W. O. COOKSON and L. R. CARDON, 2002 Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30: 97–101.
ALLISON, D. B., M. HEO, N. KAPLAN and E. R. MARTIN, 1999 Sibling-based tests of linkage and association for quantitative traits. Am. J. Hum. Genet. 64: 1754–1763.
AMOS, C. I., 1994 Robust variance-components approach for assessing genetic linkage in pedigrees. Am. J. Hum. Genet. 54: 535–543.
CANTOR, R. M., G. K. CHEN, P. PAJUKANTA and K. LANGE, 2005 Association testing in a linked region using large pedigrees. Am. J. Hum. Genet. 76: 538–542.
CELEUX, G., D. CHAUVEAU and J. DIEBOLT, 1996 Stochastic versions of the EM algorithm: an experimental study in the mixture case. J. Stat. Comput. Simul. 55: 287–314.
CHURCHILL, G. A., and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39: 1–38.
ELSTON, R. C., 1998 Linkage and association. Genet. Epidemiol. 15: 565–576.
ELSTON, R. C., and J. STEWART, 1971 A general model for the genetic analysis of pedigree data. Hum. Hered. 21: 523–542.
FALK, C. T., and P. RUBINSTEIN, 1987 Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann. Hum. Genet. 51(3): 227–233.
FULKER, D. W., S. S. CHERNY, P. C. SHAM and J. K. HEWITT, 1999 Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64: 259–267.
GUDBJARTSSON, D. F., K. JONASSON, M. L. FRIGGE and A. KONG, 2000 Allegro, a new computer program for multipoint linkage analysis. Nat. Genet. 25: 12–13.
HASEMAN, J. K., and R. C. ELSTON, 1972 The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2: 3–19.
HINDS, D. A., L. L. STUVE, G. B. NILSEN, E. HALPERIN, E. ESKIN et al., 2005 Whole-genome patterns of common DNA variation in three human populations. Science 307: 1072–1079.
HUANG, J., and Y. JIANG, 1999 Linkage detection adaptive to linkage disequilibrium: the disequilibrium maximum-likelihood-binomial test for affected-sibship data. Am. J. Hum. Genet. 65: 1741–1759.
INTERNATIONAL HAPMAP CONSORTIUM, 2003 The International HapMap Project. Nature 426: 789–796.
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22: 139–144.
KRUGLYAK, L., and E. S. LANDER, 1995 Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am. J. Hum. Genet. 57: 439–454.
KRUGLYAK, L., M. J. DALY, M. P. REEVE-DALY and E. S. LANDER, 1996 Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. 58: 1347–1363.
LAIRD, N. M., S. HORVATH and X. XU, 2000 Implementing a unified approach to family-based tests of association. Genet. Epidemiol. 19(Suppl. 1): S36–S42.
LANDER, E. S., and P. GREEN, 1987 Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.
LANGE, K., and R. C. ELSTON, 1975 Extensions to pedigree analysis I. Likelihood calculations for simple and complex pedigrees. Hum. Hered. 25: 95–105.
LANGE, C., and N. M. LAIRD, 2002 Power calculations for a general class of family-based association tests: dichotomous traits. Am. J. Hum. Genet. 71: 575–584.
LANGE, C., D. L. DEMEO and N. M. LAIRD, 2002 Power and design considerations for a general class of family-based association tests: quantitative traits. Am. J. Hum. Genet. 71: 1330–1341.
LATHROP, G. M., and J. M. LALOUEL, 1984 Easy calculations of lod scores and genetic risks on small computers. Am. J. Hum. Genet. 36: 460–465.
LATHROP, G. M., J. M. LALOUEL, C. JULIER and J. OTT, 1984 Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81: 3443–3446.
LATHROP, G. M., J. M. LALOUEL, C. JULIER and J. OTT, 1985 Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am. J. Hum. Genet. 37: 482–498.
LAZZERONI, L. C., 2002 Allele sharing and allelic association I: sib pair tests with increased power. Genet. Epidemiol. 22: 328–344.
LAZZERONI, L. C., and K. LANGE, 1998 A conditional inference framework for extending the transmission/disequilibrium test. Hum. Hered. 48: 67–81.
LI, M., M. BOEHNKE and G. R. ABECASIS, 2005 Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. Am. J. Hum. Genet. 76: 934–949.
LIU, P. Y., Y. Y. ZHANG, Y. LU, J. R. LONG, H. SHEN et al., 2005 A survey of haplotype variants at several disease candidate genes: the importance of rare variants for complex diseases. J. Med. Genet. 42: 221–227.
LOU, X.-Y., G. CASELLA, R. C. LITTELL, M. C. YANG, J. A. JOHNSON et al., 2003 A haplotype-based algorithm for multilocus linkage disequilibrium mapping of quantitative trait loci with epistasis. Genetics 163: 1533–1548.
LOU, X.-Y., G. CASELLA, R. J. TODHUNTER, M. YANG and R. WU, 2005 A general statistical framework for unifying interval and linkage disequilibrium mapping: towards high-resolution mapping of quantitative traits. J. Am. Stat. Assoc. 100: 158–171.
LOUIS, T. A., 1982 Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44: 226–233.
MACLEAN, C. J., N. E. MORTON and S. YEE, 1984 Combined analysis of genetic segregation and linkage under an oligogenic model. Comput. Biomed. Res. 17: 471–480.
MENG, X.-L., and D. B. RUBIN, 1991 Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J. Am. Stat. Assoc. 86: 899–909.
O'CONNELL, J. R., and D. E. WEEKS, 1995 The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance. Nat. Genet. 11: 402–408.
OTT, J., 1974 Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am. J. Hum. Genet. 26: 588–597.
PÉREZ-ENCISO, M., 2003 Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163: 1497–1510.
RABINOWITZ, D., and N. LAIRD, 2000 A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum. Hered. 50: 211–223.
RANNALA, B., and M. SLATKIN, 2000 Methods for multipoint disease mapping using linkage disequilibrium. Genet. Epidemiol. 19(Suppl. 1): S71–S77.
RISCH, N., 1990 Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am. J. Hum. Genet. 46: 242–253.
RISCH, N., and K. MERIKANGAS, 1996 The future of genetic studies of complex human diseases. Science 273: 1516–1517.
SHIBATA, K., T. ITO, Y. KITAMURA, N. IWASAKI, H. TANAKA et al., 2004 Simultaneous estimation of haplotype frequencies and quantitative trait parameters: applications to the test of association between phenotype and diplotype configuration. Genetics 168: 525–539.
SPIELMAN, R. S., R. E. MCGINNIS and W. J. EWENS, 1993 Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52: 506–516.
THOMPSON, E. A., 1994 Monte Carlo likelihood in genetic mapping. Stat. Sci. 9: 355–366.
WARD, P. J., 1993 Some developments on the affected-pedigree-member method of linkage analysis. Am. J. Hum. Genet. 52: 1200–1215.
WHITTAKER, J. C., M. C. DENHAM and A. P. MORRIS, 2000 The problems of using the transmission/disequilibrium test to infer tight linkage. Am. J. Hum. Genet. 67: 523–526.
WICKS, J., 2000 Exploiting excess sharing: a more powerful test of linkage for affected sib pairs than the transmission/disequilibrium test. Am. J. Hum. Genet. 66: 2005–2008.
WICKS, J., and S. R. WILSON, 2000 Evaluating linkage and linkage disequilibrium: use of excess sharing and transmission disequilibrium methods in affected sib pairs. Ann. Hum. Genet. 64: 419–432.
XIONG, M., and L. JIN, 2000 Combined linkage and linkage disequilibrium mapping for genome screens. Genet. Epidemiol. 19: 211–234.
ZHAO, L. P., C. ARAGAKI, L. HSU and F. QUIAOIT, 1998 Mapping of complex traits by single-nucleotide polymorphisms. Am. J. Hum. Genet. 63: 225–240.
Communicating editor: C. HALEY