Nonparametric Disequilibrium Mapping of Functional Sites Using Haplotypes of Multiple Tightly Linked Single-Nucleotide Polymorphism Markers
Rong Cheng<sup>1,a</sup>, Jennie Z. Ma<sup>a,b</sup>, Fred A. Wright<sup>2,c</sup>, Shili Lin<sup>d</sup>, Xin Gao<sup>c</sup>, Daolong Wang<sup>2,c</sup>, Robert C. Elston<sup>e</sup>, and Ming D. Li<sup>a</sup>
a Department of Psychiatry, The University of Texas Health Science Center, San Antonio, Texas 78229,
b Center for Biostatistics and Epidemiology, The University of Texas Health Science Center, San Antonio, Texas 78229,
c Division of Human Cancer Genetics, Ohio State University, Columbus, Ohio 43210
d Department of Statistics, Ohio State University, Columbus, Ohio 43210
e Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44109
Corresponding author: Ming D. Li, Mail Code 7792, 7703 Floyd Curl Dr., San Antonio, TX 78229. E-mail: lim2{at}uthscsa.edu
Communicating editor: Z-B. ZENG
| ABSTRACT |
|---|
As the speed and efficiency of genotyping single-nucleotide polymorphisms (SNPs) increase, using the SNP map, it becomes possible to evaluate the extent to which a common haplotype contributes to the risk of disease. In this study we propose a new procedure for mapping functional sites or regions of a candidate gene of interest using multiple linked SNPs. Based on a case-parent trio family design, we use expectation-maximization (EM) algorithm-derived haplotype frequency estimates of multiple tightly linked SNPs from both unambiguous and ambiguous families to construct a contingency statistic S for linkage disequilibrium (LD) analysis. In the procedure, a moving-window scan for functional SNP sites or regions can cover an unlimited number of loci except for the limitation of computer storage. Within a window, all possible widths of haplotypes are utilized to find the maximum statistic S* for each site (or locus). Furthermore, this method can be applied to regional or genome-wide scanning for determining linkage disequilibrium using SNPs. The sensitivity of the proposed procedure was examined on the simulated data set from the Genetic Analysis Workshop (GAW) 12. Compared with the conventional and generalized TDT methods, our procedure is more flexible and powerful.
Most human disorders of interest likely result from the cumulative effect of alleles at multiple susceptibility loci, none of which on its own is either necessary or sufficient to cause the disease. Because of this, the classical strategies of analyzing monogenic disorders have been unsuccessful. Alternative approaches such as genome-wide linkage and association analysis have been proposed and utilized in many studies.
Many haplotype analysis methods in the literature require phase information inferred from genotype data. However, as the number of loci increases, the information loss due to haplotype ambiguity increases rapidly.
Haplotype frequency estimates from tightly linked multilocus genotyping data have been used for linkage-disequilibrium (LD) analysis.
| METHODS |
|---|
We describe a nonparametric method of detecting functional SNP site(s) of haplotypes associated with the disease of interest in an LD study. The method naturally incorporates tightly linked multilocus genotype data, in the sense that multiallelic loci are utilized and the existing disequilibrium among markers (in both haplotypes that are transmitted and those that are not transmitted to affected offspring) is built into the test. The approach searches for evidence of ancestral haplotypes that are shared more often among chromosomes transmitted than among chromosomes not transmitted to affected offspring, and then compares the observed data to the distribution of data expected under the null hypothesis.
EM algorithm estimation:
When a haplotype segment with certain marker(s) occurs in a chromosome transmitted to affected offspring with a frequency higher than that in the chromosome not transmitted, there exists an association of the haplotype with the disease. This finding would be considered striking and consistent with the hypothesis that the marker positions are near a disease gene or located at a disease locus (e.g., a functional site lies within the sequence of a disease gene), explaining the haplotype as being identical by descent (IBD) with that from a common founding ancestor. We consider N randomly sampled trios, each containing an affected offspring along with two parents, with each member genotyped for multiple tightly linked markers. In some trios, it may be impossible to unambiguously reconstruct haplotypes from the given genotypes. In such a design with data from an affected singleton offspring and both parents, if the haplotypes can be reconstructed from the known genotypes unambiguously, we let H be the total number of possible haplotypes and nj and n'j be numbers of a particular haplotype hj (j = 1, 2, ... , H) transmitted and not transmitted to the affected offspring, respectively (see Table 1). If the haplotypes cannot be reconstructed unambiguously, then, to fully use the information of the multiple tightly linked markers, the EM algorithm is employed to estimate the transmitted and not transmitted haplotype frequencies fj and f'j (j = 1, 2, ... , H). In Table 1, we use 2N x fj and 2N x f'j to represent directly counted numbers nj and n'j, respectively, even when some of the actual numbers are ambiguous.
Suppose Gi = (gi, gi,m, gi,f) is the genotype group of the ith trio, where gi, gi,m, and gi,f represent the genotypes of the child, the mother, and the father, respectively. Let
- hu be the paternal haplotype transmitted to the affected offspring
- hv be the paternal haplotype not transmitted to the affected offspring
- hl be the maternal haplotype transmitted to the affected offspring
- hm be the maternal haplotype not transmitted to the affected offspring.
Conditional on Gi, the probability or weight to a transmitted haplotype hj from the ith family is

where (hu, hv, hl, hm) ∈ Gi denotes the haplotype group (hu, hv, hl, hm) that is compatible with the genotype group of the ith family, and the factor cjul(i) (= 0, 1, or 2) depends on the number of times the haplotype frequency fj occurs in the pair of haplotype frequencies (fu, fl). The summation in the denominator is over all haplotype groups that are compatible with Gi. The w(i)j are defined as the estimated weight of the ith family when the parent(s) has haplotype (hu=j, hv) and/or (hl=j, hm), and hj is transmitted. Similarly,

is the weight for a not-transmitted haplotype hj.
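One way to write the two weights that agrees with the definitions above is sketched here. It is an assumed form, not necessarily the authors' exact notation: the probability of a compatible haplotype group is taken to be the product of the transmitted frequencies $f_u$, $f_l$ and the not-transmitted frequencies $f'_v$, $f'_m$, and the factor $c'^{(i)}_{jvm}$ (the number of times $h_j$ occurs in the not-transmitted pair) is introduced here by analogy with $c^{(i)}_{jul}$.

```latex
w_j^{(i)} =
  \frac{\sum_{(h_u,h_v,h_l,h_m)\in G_i} c^{(i)}_{jul}\, f_u\, f'_v\, f_l\, f'_m}
       {\sum_{(h_u,h_v,h_l,h_m)\in G_i} f_u\, f'_v\, f_l\, f'_m},
\qquad
w'^{(i)}_j =
  \frac{\sum_{(h_u,h_v,h_l,h_m)\in G_i} c'^{(i)}_{jvm}\, f_u\, f'_v\, f_l\, f'_m}
       {\sum_{(h_u,h_v,h_l,h_m)\in G_i} f_u\, f'_v\, f_l\, f'_m}
```

Under this form the weights of each family sum to 2 over j, one unit for each of the two transmitted (or not-transmitted) parental haplotypes.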
Given the genotypes, the likelihood function for the data is

where

So the log-likelihood is

Next we consider estimation of the haplotype frequencies fj and f'j (j = 1, 2, ... , H) by the EM algorithm.

In principle, the maximum-likelihood (ML) estimates of the haplotype frequencies could be found analytically by solving a set of equations with the Lagrange multipliers λ1 and λ2:

and the additional partial derivatives with respect to λ1 and λ2. We obtain the ML estimators of fj and f'j (j = 1, 2, ... , H) in the (p + 1)st EM iteration,

Initially, all haplotypes are set equally frequent, so that all possible complementary haplotype pairs are equally likely. In the (p + 1)st expectation step (E-step), the weights of the transmitted and not-transmitted haplotypes, $w_j^{(i)(p+1)}$ and $w_j'^{(i)(p+1)}$, can be obtained from the given genotype information and the current estimates of the haplotype frequencies, $\hat{f}_j^{(p)}$ and $\hat{f}_j'^{(p)}$. The (p + 1)st maximization step (M-step) gives the maximum-likelihood estimates $\hat{f}_j^{(p+1)}$ and $\hat{f}_j'^{(p+1)}$. When the difference between the values of the haplotype estimates in the previous M-step and the current one becomes less than a predetermined quantity (e.g., $10^{-7}$), the iteration is stopped and the final estimates are obtained. The estimated numbers $\hat{n}_j$ (= $2N \times \hat{f}_j$) and $\hat{n}'_j$ (= $2N \times \hat{f}'_j$) of transmitted and not-transmitted haplotypes are then used for further computing the statistic S and estimating the empirical 100(1 − α) percentiles and P value by simulation permutation.
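The E-step and M-step described above can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes that each trio has already been expanded into the list of haplotype groups (hu, hv, hl, hm) compatible with its genotypes, that the probability of a group is the product of the transmitted frequencies of hu and hl and the not-transmitted frequencies of hv and hm, and that the starting frequencies are uniform over the haplotypes that appear in at least one compatible group. The function name `em_trio_haplotype_frequencies` and its parameters are hypothetical.

```python
from collections import defaultdict


def em_trio_haplotype_frequencies(families, tol=1e-7, max_iter=1000):
    """Estimate transmitted (f) and not-transmitted (f') haplotype frequencies.

    families: one entry per trio; each entry lists every haplotype group
    (hu, hv, hl, hm) compatible with that trio's genotypes, where hu and hl
    are transmitted and hv and hm are not transmitted (assumed input format).
    """
    haplotypes = sorted({h for groups in families for g in groups for h in g})
    n_families = len(families)
    # Start with all haplotypes equally frequent.
    f = {h: 1.0 / len(haplotypes) for h in haplotypes}
    f_prime = dict(f)

    for _ in range(max_iter):
        w = defaultdict(float)        # summed weights, transmitted
        w_prime = defaultdict(float)  # summed weights, not transmitted
        # E-step: share each family among its compatible haplotype groups.
        for groups in families:
            probs = [f[hu] * f_prime[hv] * f[hl] * f_prime[hm]
                     for hu, hv, hl, hm in groups]
            total = sum(probs)
            if total == 0.0:
                continue  # no compatible group has positive probability
            for (hu, hv, hl, hm), p in zip(groups, probs):
                share = p / total
                w[hu] += share
                w[hl] += share
                w_prime[hv] += share
                w_prime[hm] += share
        # M-step: each family contributes two transmitted and two
        # not-transmitted haplotypes, hence the divisor 2N.
        new_f = {h: w[h] / (2 * n_families) for h in haplotypes}
        new_f_prime = {h: w_prime[h] / (2 * n_families) for h in haplotypes}
        change = max(
            max(abs(new_f[h] - f[h]) for h in haplotypes),
            max(abs(new_f_prime[h] - f_prime[h]) for h in haplotypes),
        )
        f, f_prime = new_f, new_f_prime
        if change < tol:
            break
    return f, f_prime


# Two unambiguous trios: transmitted haplotypes are A, A and A, B.
example_families = [
    [("A", "B", "A", "A")],
    [("A", "B", "B", "A")],
]
f_hat, f_prime_hat = em_trio_haplotype_frequencies(example_families)
```

Multiplying the returned frequencies by 2N gives the estimated haplotype numbers used in the later steps. With unambiguous trios only, as in the example, the estimates reduce to direct counting.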
Statistic S:
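The key feature of this method is encompassed by defining a statistic S(a, b) for the haplotypes that begin at marker position a and end at position b; S*(x) denotes the maximum of S(a, b) over all such spans that contain position x (a ≤ x ≤ b) within the window.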

The statistic S is chosen to reflect a striking association of all haplotypes with the disease. In other words, for an arbitrary statistic S and at position x, we search over all haplotype widths (containing x) to find the haplotype in most striking association with the disease. Under the null hypothesis, this haplotype has no association with the disease. Under the assumption that haplotypes are sampled independently, we compute a P value for the χ² contingency table test of haplotype vs. disease status and refer to the statistic S(a, b) = -log10 (P value) as S. The P value corresponds to the standard k × 2 χ² contingency table test of k (1 < k ≤ H) unique haplotypes (with k - 1 d.f.) that begin at position a and end at b vs. disease status, under the assumption of independence. The number k of unique haplotypes will depend on the choice of a and b.
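As an illustration of the k × 2 test just described, the following Python sketch computes S = -log10(P value) from transmitted and not-transmitted haplotype counts. It is a sketch under stated assumptions rather than the authors' code: the names `chi2_sf` and `s_statistic` are hypothetical, counts may be fractional (as EM estimates are), haplotypes with a zero row total are dropped, and the chi-square tail probability is computed with the standard recurrence for integer degrees of freedom using only the standard library.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: dof = 2 and dof = 1.
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < dof / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def s_statistic(transmitted, untransmitted):
    """Return -log10(P) for the k x 2 table of haplotype vs. transmission status.

    transmitted, untransmitted: dicts mapping haplotype label -> count.
    """
    labels = [h for h in set(transmitted) | set(untransmitted)
              if transmitted.get(h, 0) + untransmitted.get(h, 0) > 0]
    col_t = sum(transmitted.get(h, 0) for h in labels)
    col_u = sum(untransmitted.get(h, 0) for h in labels)
    if len(labels) < 2 or col_t == 0 or col_u == 0:
        return 0.0  # the test is undefined; report no evidence
    total = col_t + col_u
    chi2 = 0.0
    for h in labels:
        row = transmitted.get(h, 0) + untransmitted.get(h, 0)
        for observed, col in ((transmitted.get(h, 0), col_t),
                              (untransmitted.get(h, 0), col_u)):
            expected = row * col / total
            chi2 += (observed - expected) ** 2 / expected
    p_value = chi2_sf(chi2, len(labels) - 1)
    return -math.log10(max(p_value, 1e-300))  # guard against underflow to 0


# Two haplotypes; A is transmitted more often than not.
s_example = s_statistic({"A": 30, "B": 10}, {"A": 20, "B": 20})
```

Taking -log10 of the P value, rather than the χ² value itself, puts spans with different numbers of unique haplotypes (and hence different degrees of freedom) on a common scale before the maximum is taken.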
Permutation for empirical P value:
To evaluate type I error accurately, an empirical P value that appropriately corrects for the testing of multiple marker locations is obtained from the EM estimates of haplotype numbers and frequencies ($\hat{n}_j$, $\hat{n}'_j$, $\hat{f}_j$, and $\hat{f}'_j$; j = 1, 2, ... , H), by permuting the transmitted status of individual haplotypes and computing S*(x) for each permutation. The permutations provide the empirical 100(1 − α) percentile and appropriate P value for the maximum of S* over the region with multiple tightly linked markers. A computer program for the proposed method is available upon request.
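The permutation step can be sketched as follows. This is an illustration only, with several assumptions that go beyond the text: haplotypes are represented as one label per chromosome (so fractional EM counts would first have to be rounded or resampled), transmission labels are shuffled across the pooled sample rather than within parental pairs, and the P value uses the common (hits + 1)/(permutations + 1) convention. The names `empirical_p_value` and `count_difference` are hypothetical, and `count_difference` is only a toy statistic standing in for the maximum of S* over the region.

```python
import random


def empirical_p_value(transmitted, untransmitted, statistic, n_perm=1000, seed=0):
    """Permutation P value for statistic(transmitted, untransmitted).

    transmitted, untransmitted: lists of haplotype labels, one per chromosome.
    statistic: callable returning a number; larger means stronger association.
    """
    rng = random.Random(seed)
    observed = statistic(transmitted, untransmitted)
    pooled = list(transmitted) + list(untransmitted)
    n_transmitted = len(transmitted)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the transmitted status at random
        permuted = statistic(pooled[:n_transmitted], pooled[n_transmitted:])
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def count_difference(target):
    """Toy statistic: absolute difference in the number of copies of target."""
    def stat(transmitted, untransmitted):
        return abs(transmitted.count(target) - untransmitted.count(target))
    return stat


p_null = empirical_p_value(["A"] * 5 + ["B"] * 5, ["A"] * 5 + ["B"] * 5,
                           count_difference("A"), n_perm=50)
```

Because the statistic passed in can itself be a maximum over all marker positions, the same permutations account for the multiple positions tested, which is the purpose of the empirical P value described above.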
| RESULTS |
|---|
We applied the proposed procedure to the simulated sequence data set of GAW 12. This data set contains computer-simulated sequence data with multiple SNP markers for seven candidate genes in 23 extended pedigrees with a total of 1497 individuals for two populations: a general and an isolated population. There were 50 replicates for each population model. We randomly sampled 10 replicates from the 50 replicates of the general population and separated them into two groups, each with five replicates. Group 1 contained replicates 1, 10, 18, 42, and 48, while group 2 contained replicates 21, 23, 33, 34, and 38. For each extended pedigree in a replicate, we randomly sampled a trio with one affected offspring and two parents (affected or unaffected), for whom genotyping information was available. In total, 95 and 97 trios were in groups 1 and 2 (samples 1 and 2), respectively. Last, we pooled the two samples together to form sample 3, with a total of 192 trios. All the SNP variants present in a sample were counted, and only those with a frequency >1% were chosen as possible SNP sites.
Theoretically, the EM algorithm can be applied to an unlimited number of loci with any number of alleles. However, in practice, implementation of this algorithm is limited by the need to store the estimated haplotype frequencies for every possible haplotype contained in the sample. These storage requirements increase exponentially with the number of loci under investigation. For example, the numbers of SNPs for candidate gene 1 are 155, 158, and 152 in samples 1, 2, and 3, respectively. If any individual is heterozygous at 150 loci, then the number of possible haplotypes in that sample is $2^{150}$. Also, as the number of markers increases, there will be an increased variance in the estimates. In this study, we first selected SNP markers at ~1000-bp intervals and set the window width at 5 (i.e., m = 5; five tightly linked markers) to scan every selected SNP marker for all candidate gene sequences. The procedure of window scanning implemented in the program can be described as follows: a pattern is set up where the first window consists of markers 1-5 (i.e., 1 ≤ a ≤ x ≤ b ≤ 5) and the second window of markers 2-6 (i.e., 2 ≤ a ≤ x ≤ b ≤ 6), and the shifting of the window continues until all of the SNPs of a gene or genomic fragment have been scanned. The chromosome locations, lengths of sequence data, and number of SNPs for these candidate genes are shown in Table 2.
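The storage argument and the window shifting in this paragraph can be made concrete with a short Python sketch. The function names are hypothetical, markers are indexed from 1 as in the text, and biallelic SNPs are assumed when counting haplotypes.

```python
def possible_haplotypes(n_heterozygous_loci):
    """Number of haplotypes compatible with biallelic SNPs at the given loci."""
    return 2 ** n_heterozygous_loci


def sliding_windows(n_markers, m=5):
    """Yield (first, last) marker indices of each window of width m."""
    if n_markers <= m:
        yield (1, n_markers)  # a single window covers every marker
        return
    for first in range(1, n_markers - m + 2):
        yield (first, first + m - 1)


# Storing one frequency per haplotype is hopeless for 150 loci,
# but trivial for a five-marker window.
whole_gene = possible_haplotypes(150)
per_window = possible_haplotypes(5)      # 32
windows = list(sliding_windows(7, m=5))  # [(1, 5), (2, 6), (3, 7)]
```

Restricting the EM estimation to one window at a time keeps the number of haplotype frequencies to store at no more than 2^m per window, which is why the scan itself is limited by the length of the region only through run time and file size.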
Linkage disequilibrium analysis was performed for all selected SNPs of these seven candidate genes. The distributions of the statistic S* for these candidate genes are shown in Fig 1. The highest peak locations of S* and their corresponding P values for all candidate genes are summarized in Table 3. As described in METHODS, for an arbitrary statistic S and at the position x, we searched over all haplotype widths. If we set m = 5 and x = 5, the pairs (a, b) containing x will be (1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (2, 5), (3, 6), (4, 7), (5, 8), (3, 5), (4, 6), (5, 7), (4, 5), (5, 6), and (5, 5), and the statistic S calculated from the last pair, (5, 5), is equivalent to a conventional single-marker test. Some of the peak locations of S* in Table 3 contained two linked markers with an equal maximum S*, while others might contain three, four, or five linked markers. One functional SNP was identified for each of candidate genes 1 and 6. No functional SNP site was detected at the 0.01 significance level for the other five candidate genes (Table 3). In fact, for candidate gene 2 multiple simulated functional sites directly affected a quantitative trait, but no single functional site directly affected disease status. For the other four candidate genes, no functional sites were simulated in the original model.
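The enumeration of (a, b) pairs for a given position x can be reproduced with a few lines of Python. The function names `spans_containing` and `s_star` are hypothetical, and `s_of_span` stands for any routine that returns S(a, b) for a span; the ordering (widest spans first) simply mirrors the list in the text.

```python
def spans_containing(x, m, n_markers):
    """List every span (a, b) with a <= x <= b, width at most m, inside 1..n_markers."""
    spans = []
    for width in range(m, 0, -1):
        for a in range(max(1, x - width + 1), x + 1):
            b = a + width - 1
            if b <= n_markers:
                spans.append((a, b))
    return spans


def s_star(x, m, n_markers, s_of_span):
    """Maximum of S(a, b) over all spans that contain position x."""
    return max(s_of_span(a, b) for a, b in spans_containing(x, m, n_markers))


pairs = spans_containing(5, 5, 23)
```

For m = 5 and x = 5 this yields the 15 pairs listed above, ending with the single-marker span (5, 5). Near either end of the marker map fewer spans are available, because spans that would extend past the first or last marker are skipped.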
Next, we compared our procedure to the generalized TDT.
Fig 3 compares the results obtained from our procedure with the generalized and conventional TDT methods for candidate gene 1 in sample 3 (N = 192 trios) under different SNP densities. A total of 41, 23, 8, and 5 SNPs were selected from the original SNPs for candidate gene 1, which gave an average SNP interval of ~500, 1000, 3000, and 5000 bp, respectively. The simulated functional site (i.e., 557 bp) was included for only the first two high densities. As shown in Fig 3, the highest peaks of statistic S* detected by these methods are all at or around the 557-bp functional site. Compared to what is found with high densities, the highest peaks of statistic S* detected at low densities are smaller. Similarly, we compared the results from these three methods for candidate gene 6 under three different SNP densities. As shown in Fig 4, the functional site is detected by all three methods for candidate gene 6 under the SNP density of 1000- and 2000-bp intervals, with the highest peak at functional site 5782 bp for 1000-bp intervals and at SNP site 7332 bp (which is 1550 bp away from the functional 5782-bp site) for 2000-bp intervals. Maximum statistics S* obtained from our procedure are always greater than those obtained from the generalized or conventional TDT (see Fig 4, A and B). When 5 SNPs at ~4000-bp intervals were used, no linkage disequilibrium was detected by any method, but our procedure and the generalized TDT method still produced smaller P values than those from the conventional TDT (Fig 4C).
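For readers less familiar with the comparison method, the conventional single-marker TDT for a biallelic SNP can be sketched as below. This is the standard McNemar-type form of Spielman et al. (1993), not code from the programs used in this study; the function names are hypothetical, and the result is expressed as -log10(P) only so that it sits on the same scale as S.

```python
import math


def tdt_statistic(b, c):
    """Conventional TDT chi-square (1 d.f.).

    b: times heterozygous parents transmitted allele 1 to affected offspring.
    c: times heterozygous parents transmitted allele 2 to affected offspring.
    """
    if b + c == 0:
        return 0.0  # no informative (heterozygous) parents
    return (b - c) ** 2 / (b + c)


def tdt_minus_log10_p(b, c):
    """-log10 of the 1-d.f. chi-square P value for the TDT statistic."""
    chi2 = tdt_statistic(b, c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail for 1 d.f.
    return -math.log10(max(p_value, 1e-300))


chi2_example = tdt_statistic(30, 10)  # (30 - 10)^2 / 40
```

Only heterozygous parents contribute to b and c, which is one reason a single-marker TDT loses information when the nearest typed SNP is far from, or only weakly linked to, the functional site.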
| DISCUSSION |
|---|
To date, ~2.1 million SNPs have been identified in the human genome.
In this study, we propose a new procedure for nonparametric disequilibrium mapping at a functional site (or region) within a candidate gene by using multiple tightly linked SNPs. For an arbitrary statistic S at position x, we search over all haplotype widths (containing x) and find that some of the peak locations of the maximum statistic S* contain two linked markers while others might contain three, four, or five linked markers (see Table 3). With our procedure, researchers do not need to know prior to analysis what window width of haplotype is more appropriate for detecting a functional site or region. Although it is generally thought that a multiple-locus approach may be more powerful than single-locus analysis, this is not true in all cases. For example, for candidate gene 1 in the simulated GAW 12 data set, it seems that the single-marker approach is better than that of multiple markers. In contrast, for candidate gene 6, we found that using the haplotype transmission data of two linked markers appears to be better than using those of a single marker, or three, four, or five linked markers.
The simulation model implemented in the GAW 12 data set was rather complex.
Using the procedure reported in this study, linkage disequilibrium due to the two simulated functional sites of candidate genes 1 and 6 (defined as MG6 and MG1, respectively, in the original model) was detected. Candidate gene 2 carried multiple alleles; all changes in regulatory elements or in the first or second base pair of a codon leading to amino acid substitutions were functional. However, this was not the case for candidate gene 6. Furthermore, candidate gene 2 directly contributes to Q5 from multiple functional sites and then indirectly affects the affection status.
On the basis of the mapping results from our method and the generalized and conventional TDT methods on the GAW 12 simulated data set, it appears that our procedure is as sensitive and powerful as the TDT methods for candidate gene 1, but is more powerful than the TDT methods for candidate gene 6. Additionally, we investigated how SNP density may affect the performance of our procedure. A functional SNP site for candidate gene 1 was detected by all three methods, regardless of SNP density and whether the functional SNP site was included or not. In contrast, a functional SNP site for candidate gene 6 was detected only at high SNP densities, not at low density. To explain this difference, we examined selected SNPs used under each density for both genes and found that a SNP at the 189-bp position of gene 1, which is only 368 bp away from the functional SNP site, was used for all analyses. However, for candidate gene 6 at the 4000-bp intervals, the closest SNP sites used in the analysis were at the 4848- and 9952-bp positions, which are either loosely linked or too far away from the functional site (5782 bp). Regardless of which SNP density was used, overall, we found that our procedure and the generalized TDT method are less dependent on the SNP density than the conventional TDT method is. Moreover, our program is more user friendly and less time consuming than the TRANSMIT program with respect to file preparation before and after mapping analysis. For example, for candidate gene 1 under a density of 500-bp intervals, only 1 file was needed for our procedure, while for TRANSMIT we had to prepare 40, 39, 38, and 37 files for window widths of two, three, four, and five linked SNPs, respectively. Although these files can be prepared using a shell script for each selected window width, we still have to summarize a significant number of output files for comparing the results.
Theoretically, our search procedure can cover as many loci as one wishes without limitation, except for the computer storage requirement. As needed, our method can be used for regional or genome-wide scanning to study LD using SNP markers. This represents another advantage of our program compared to other currently available procedures. It should be pointed out that if genome-wide high-density SNP data were available, then the size of the data file would become very large.
An alternative approach to EM in inferring haplotypes has also been suggested.
In the current version of our program, the missing genotype of a parent and/or the affected offspring at any SNP site is assigned a code that differs from the codes assigned to the two possible SNPs at the position. As expected, this approach will yield rare haplotypes, which may eventually affect the power of the test to detect functional sites.
The chi-square statistic is highly sensitive to small cell counts.
The number of possible haplotypes increases rapidly with the size of the searching window. To avoid very rare haplotypes, we compared sample sizes from 100 to 1000 and found that the maximum searching window width should be set in the range of approximately four to eight tightly linked SNPs (equivalent to ~16-256 possible haplotypes).
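The trade-off between window width and sample size can be illustrated with simple arithmetic. The sketch below is a rough illustration, not the comparison the authors ran: it assumes biallelic SNPs and, unrealistically, that all 2^m haplotypes are equally frequent, so it gives the average number of transmitted chromosomes per haplotype. Real data in LD would concentrate on far fewer haplotypes. The function name is hypothetical.

```python
def average_count_per_haplotype(n_trios, window_width):
    """Average transmitted chromosomes per haplotype if all 2**m were equally common."""
    transmitted_chromosomes = 2 * n_trios  # two transmitted haplotypes per trio
    return transmitted_chromosomes / 2 ** window_width


small_sample_narrow = average_count_per_haplotype(100, 4)   # 200 / 16
small_sample_wide = average_count_per_haplotype(100, 8)     # 200 / 256
large_sample_wide = average_count_per_haplotype(1000, 8)    # 2000 / 256
```

With 100 trios, a four-marker window leaves about a dozen chromosomes per haplotype on average, whereas an eight-marker window leaves fewer than one, which is the regime in which small cell counts distort the χ² test.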
An example has been given for the first five selected SNP loci of candidate gene 1 to illustrate our procedure. Table 4 shows the estimated numbers of haplotypes found by using the EM algorithm for the first five selected SNPs of candidate gene 1 and P values for the maximum statistic S* obtained from 10,000 randomized permutations in all samples. The nonancestral and ancestral sequence variants of candidate gene 1 were coded as alleles 1 and 2, respectively. On the basis of the estimated haplotype numbers, it is easy to see evidence of ancestral haplotypes that are shared more often among the haplotypes transmitted than among those not transmitted to affected offspring. In this case, allele 2 at base pair 557 is associated with the disease.
The procedure proposed in this study is nonparametric and robust. For a complex disease controlled by multiple genes, the effect of a functional site accounting for as little as 4.6% of the variance in liability (e.g., candidate gene 6) could be detected with the sample size studied. Covariate factors such as gender, age, environment, the disease mode (dominant, recessive, additive, or multiplicative), and penetrance, were not considered in the proposed method. Also, the power of a haplotype-based test study depends on the data structure, the trait of interest, the polymorphism information content of marker loci, the degree of LD among the markers, the allele frequencies, the amount of population stratification, and the marker density. Our approach makes the implicit assumption that the underlying population is homogeneous. In a non-admixed population, if linkage disequilibrium exists across a region, then recombination must be quite infrequent and probably can be safely ignored. If population migration, admixture, or stratification is present, this should affect the estimates of allele or haplotype frequencies and decrease the power of LD detection. A model-based procedure implementing some appropriate parameters may then improve the power of the method. Also, it remains to be determined in a future study whether our idea can be extended to other data structures, such as nuclear families with multiple sibs and/or one or both parents missing, or to extended pedigrees.
| FOOTNOTES |
|---|
1 Present address: Columbia Genome Center, Columbia University, 1150 St. Nicholas Ave., New York, NY 10032.
2 Present address: Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599.
| ACKNOWLEDGMENTS |
|---|
The authors thank the editor, anonymous reviewers, and Dr. Hongyu Zhao for their valuable comments and suggestions on the manuscript. Also, we thank Dr. David Clayton for allowing us to use his TRANSMIT program and providing advice on using it. The use of the GAW 12 simulated data was permitted by the Southwest Foundation for Biomedical Research, which was supported by a GAW grant, GM31575 from National Institute of General Medical Sciences. We are grateful to Dr. Jean MacCluer and Ms. Vanessa Olmo for handling our request for the GAW 12 data. This research was supported in part by National Institutes of Health (NIH) grant DA12844 (to M.D.L.); GM28356, RR03655, and DK57292 (to R.C.E.); GM58934 (to F.A.W.); National Science Foundation grant DMS-9971770 (to S.L.); and a General Clinical Research Center grant from the NIH (M01-RR00211) awarded to the University of Tennessee Health Science Center.
Manuscript received October 10, 2002; Accepted for publication March 24, 2003.
| LITERATURE CITED |
|---|
ALMASY, L., J. D. TERWILLIGER, D. NIELSEN, T. D. DYER, and D. ZAYKIN et al., 2001 GAW12: simulated genome scan, sequence, and family data for a common disease. Genet. Epidemiol. 21(Suppl. 1):S332-S338.
BRODER, S. and J. C. VENTER, 2000 Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium. Annu. Rev. Pharmacol. Toxicol. 40:97-132.
CHIANO, M. and D. CLAYTON, 1998 Fine genetic mapping using haplotypes and the missing data problem. Ann. Hum. Genet. 62:55-60.
CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.
CLARK, A. G., 1990 Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7:111-122.
CLAYTON, D. G., 1999 A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. Am. J. Hum. Genet. 65:1170-1177.
CLAYTON, D. G. and H. JONES, 1999 Transmission/disequilibrium tests for extended marker haplotypes. Am. J. Hum. Genet. 65:1161-1169.
COLLINS, A., C. LONJOU, and N. E. MORTON, 1999 Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl. Acad. Sci. USA 96:15173-15177.
CORDELL, H. J. and R. C. ELSTON, 1999 Fieller's theorem and linkage disequilibrium mapping. Genet. Epidemiol. 17:237-252.
DEMPSTER, A., N. LAIRD, and D. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39:1-38.
DUDBRIDGE, F., B. P. C. KOELEMAN, J. A. TODD, and D. G. CLAYTON, 2000 Unbiased application of the transmission/disequilibrium test to multilocus haplotypes. Am. J. Hum. Genet. 66:2009-2012.
EXCOFFIER, L. and M. SLATKIN, 1995 Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol. 12:921-927.
EXCOFFIER, L. and M. SLATKIN, 1998 Incorporating genotypes of relatives into a test of linkage disequilibrium. Am. J. Hum. Genet. 62:171-180.
FALK, C. T. and P. RUBINSTEIN, 1987 Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann. Hum. Genet. 51:227-233.
FALLIN, D. and N. J. SCHORK, 2000 Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data. Am. J. Hum. Genet. 67:947-959.
FALLIN, D., A. COHEN, L. ESSIOUX, I. CHUMAKOV, and M. BLUMENFELD et al., 2001 Genetic analysis of case/control data using estimated haplotype frequencies: application to APOE locus variation and Alzheimer's disease. Genome Res. 11:143-151.
GABRIEL, S. B., S. F. SCHAFFNER, H. NGUYEN, J. M. MOORE, and J. ROY et al., 2002 The structure of haplotype blocks in the human genome. Science 296:2225-2229.
GAO, X. and F. A. WRIGHT, 1999 Nonparametric disequilibrium mapping when haplotypes are available. Am. J. Hum. Genet. 65(Suppl.):A250.
HAWLEY, M. E. and K. K. KIDD, 1995 HAPLO: a program using the EM algorithm to estimate frequencies of multi-site haplotypes. J. Hered. 86:409-411.
HODGE, S. E., M. BOEHNKE, and M. A. SPENCE, 1999 Loss of information due to ambiguous haplotyping SNPs. Nat. Genet. 21:360-361.
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.
LANDER, E. S., L. M. LINTON, B. BIRREN, C. NUSBAUM, and M. C. ZODY et al., 2001 Initial sequencing and analysis of the human genome. Nature 409:860-921.
LAZZERONI, L. C. and K. LANGE, 1998 A conditional inference framework for extending the transmission/disequilibrium test. Hum. Hered. 48:67-81.
LI, J., D. WANG, J. DONG, R. JIANG, and K. ZHANG et al., 2001 The power of transmission disequilibrium tests for quantitative traits. Genet. Epidemiol. 21(Suppl. 1):S632-637.
LONG, J. C., R. C. WILLIAMS, and M. URBANEK, 1995 An E-M algorithm and testing strategy for multiple-locus haplotypes. Am. J. Hum. Genet. 56:799-810.
MARTIN, E. R., N. L. KAPLAN, and B. S. WEIR, 1997 Tests for linkage and association in nuclear families. Am. J. Hum. Genet. 61:439-448.
MARTIN, E. R., S. A. MONKS, L. L. WARREN, and N. L. KAPLAN, 2000 A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67:146-154.
NIU, T., Z. S. QIN, X. XU, and J. S. LIU, 2002 Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am. J. Hum. Genet. 70:157-169.
OTT, J., 1989 Statistical properties of the haplotype relative risk. Genet. Epidemiol. 6:127-130.
PATIL, N., A. J. BERNO, D. A. HINDS, W. A. BARRETT, and J. M. DOSHI et al., 2001 Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719-1723.
QIN, Z. S., T. NIU, and J. S. LIU, 2002 Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am. J. Hum. Genet. 71:1242-1247.
REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001 Linkage disequilibrium in the human genome. Nature 411:199-204.
RISCH, N. J., 2000 Searching for genetic determinants in the new millennium. Nature 405:847-856.
SACHIDANANDAM, R., D. WEISSMAN, S. C. SCHMIDT, J. M. KAKOL, and L. D. STEIN et al., 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.
SELTMAN, H., K. ROEDER, and B. DEVLIN, 2001 Transmission/disequilibrium test meets measured haplotype analysis: family-based association analysis guided by evolution of haplotypes. Am. J. Hum. Genet. 68:1250-1263.
SCHAID, D. J., 1996 General score tests for associations of genetic markers with disease using cases and their parents. Genet. Epidemiol. 13:423-449.
SCHAID, D. J., C. M. ROWLAND, D. E. TINES, R. M. JACOBSON, and G. A. POLAND, 2002 Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70:425-434.
SHAM, P., 1997 The transmission/disequilibrium tests for multiallelic loci. Am. J. Hum. Genet. 61:774-778.
SHAM, P. C. and D. CURTIS, 1995 An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann. Hum. Genet. 59:323-336.
SLATKIN, M. and L. EXCOFFIER, 1996 Testing for linkage disequilibrium in genotypic data using the EM algorithm. Heredity 76:377-383.
SPIELMAN, R. S. and W. J. EWENS, 1996 The TDT and other family-based tests for linkage disequilibrium and association. Am. J. Hum. Genet. 59:983-989.
SPIELMAN, R. S. and W. J. EWENS, 1998 A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 62:450-458.
SPIELMAN, R. S., R. E. MCGINNIS, and W. J. EWENS, 1993 The transmission test for linkage disequilibrium: the insulin gene and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52:506-516.
STEPHENS, M., N. J. SMITH, and P. DONNELLY, 2001 A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978-989.
SUN, F., W. D. FLANDERS, Q. YANG, and M. J. KHOURY, 1999 Transmission disequilibrium test (TDT) when only one parent is available: the 1-TDT. Am. J. Epidemiol. 150:97-104.
TENG, J. and N. RISCH, 1999 The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res. 9:234-241.
TERWILLIGER, J. D. and J. OTT, 1992 A haplotype-based "haplotype relative risk" approach to detecting allelic associations. Hum. Hered. 42:337-346.
THOMSON, G., 1995 Mapping disease genes: family-based association studies. Am. J. Hum. Genet. 57:487-498.
THOMSON, G., W. P. ROBINSON, M. K. KUHNER, S. JOE, and W. KLITZ, 1989 HLA, insulin gene, and Gm associations with IDDM. Genet. Epidemiol. 6:155-160.
TISHKOFF, S. A., A. J. PAKSTIS, G. RUANO, and K. K. KIDD, 2000 The accuracy of statistical methods for estimation of haplotype frequencies: an example from the CD4 locus. Am. J. Hum. Genet. 67:518-522.
TOIVONEN, H. T., P. ONKAMO, K. VASKO, V. OLLIKAINEN, and P. SEVON et al., 2000 Data mining applied to linkage disequilibrium mapping. Am. J. Hum. Genet. 67:133-145.
VENTER, J. C., M. D. ADAMS, E. W. MYERS, P. W. LI, and R. J. MURAL et al., 2001 The sequence of the Human Genome. Science 291:1304-1351.
WILSON, S. R., 1997 On extending the transmission/disequilibrium test (TDT). Ann. Hum. Genet. 61:151-161.
XIONG, M. and S. W. GUO, 1997 Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am. J. Hum. Genet. 60:1513-1531.
ZHANG, K., M. DENG, T. CHEN, M. S. WATERMAN, and F. SUN, 2002a A dynamic programming algorithm for haplotype block partitioning. Proc. Natl. Acad. Sci. USA 99:7335-7339.
ZHANG, K., P. CALABRESE, M. NORDBORG, and F. SUN, 2002b Haplotype block structure and its applications to association studies: power and study designs. Am. J. Hum. Genet. 71:1386-1394.
ZHANG, S. L., A. J. PAKSTIS, K. K. KIDD, and H. Y. ZHAO, 2001 Comparisons of two methods for haplotype reconstruction and haplotype frequency estimation from population data. Am. J. Hum. Genet. 69:906-911.
ZHAO, H. Y., S. L. ZHANG, K. R. MERIKANGAS, M. TRIXLER, and D. B. WILDENAUER et al., 2000 Transmission/disequilibrium tests using multiple tightly linked markers. Am. J. Hum. Genet. 67:936-946.[Medline]
ZHENG, C. and R. C. ELSTON, 1999 Multipoint linkage disequilibrium mapping with particular reference to the African-American population. Genet. Epidemiol. 17:79-101.[Medline]
ZHU, X. and R. C. ELSTON, 2000 Power comparison of regression methods to test quantitative traits for association and linkage. Genet. Epidemiol. 18:322-330.[Medline]
ZHU, X. and R. C. ELSTON, 2001 Transmission/disequilibrium tests for quantitative traits. Genet. Epidemiol. 20:57-74.[Medline]