Abstract
In association studies searching for genes underlying complex traits, the results are often inconsistent, and population admixture has been recognized qualitatively as one major potential cause. The Hardy-Weinberg equilibrium (HWE) test is often employed to detect population admixture; however, its power is generally unknown. Through analytical and simulation approaches, we quantify the power of the HWE test for population admixture and the extent to which population admixture increases the type I error rate of association studies under various scenarios of population differentiation and admixture. We found that (1) the power of the HWE test for detecting population admixture is usually small; (2) population admixture seriously elevates the type I error rate for detecting genes underlying complex traits, to an extent that depends on the degrees of population differentiation and admixture; (3) HWE testing for population admixture should be performed with random samples or only with controls at the candidate genes, or else with combined samples of cases and controls at marker loci that are not linked to the disease; (4) testing HWE for population admixture generally reduces false positive association findings of genes underlying complex traits, but the effect is small; and (5) with population admixture, a linkage disequilibrium method that employs cases only is more robust and yields many fewer false positive findings than conventional case-control analyses. Therefore, unless random samples are carefully selected from one homogeneous population, admixture is always a legitimate concern for positive findings in association studies, except in analyses that deliberately control for population admixture.
COMPLEX traits refer to diseases and quantitative traits with complex and multiple genetic and environmental determinants. Association studies that depend on linkage disequilibrium between markers and genes underlying complex traits have helped to decipher part of the genetic basis of variation in quantitative traits and of differential susceptibility to complex diseases (e.g., Chagnon et al. 1998). In association studies of complex diseases, case-control analyses are usually employed, comparing genotype or allele frequencies at candidate genes between unrelated cases and controls (e.g., Blum et al. 1990, 1991; Holden 1994). For quantitative traits, analyses of variance are usually conducted on random individuals to test for differences in trait means among genotypes or alleles (e.g., Boerwinkle et al. 1986, 1987; Deng et al. 1999; Page and Amos 1999). However, despite extensive efforts, independent association studies often fail to reach consensus, resulting in controversy. Examples include the association between the dopamine D2 receptor gene and alcoholism (Blum et al. 1990, 1991; Gelernter et al. 1993; Pato et al. 1993; Holden 1994) and the association between vitamin D receptor genotypes and bone mass (Morrison et al. 1994; Eisman 1995; Peacock 1995; Gong et al. 1999).
One of the most important potential causes of inconsistent results in association studies is population admixture (Chakraborty and Smouse 1988; Lander and Schork 1994; Weir 1996; Deng and Chen 2000). If a population is a recent admixture of ethnic groups that differ in marker allele frequencies and disease frequencies (or quantitative trait means), spurious associations may arise between marker genotypes (or alleles) and complex traits. However, although the qualitative effect of population admixture has long been recognized, its quantitative effects under various degrees of admixture and of population differentiation in marker allele frequencies and disease frequencies have rarely been investigated systematically and are largely unknown. Such detailed investigation is necessary to assess quantitatively the impact of population admixture on association studies and the general utility of the association approach for identifying genes underlying complex traits. The results of such quantitative studies may also be useful in assessing the robustness of association findings in relation to the samples employed.
Family-based analyses such as the transmission disequilibrium test (TDT; Spielman et al. 1993) have been developed specifically to control for population admixture in association studies aimed at identifying genes underlying complex traits. However, compared with the random samples of individual cases and controls used in case-control studies, samples for family-based studies such as the TDT are generally much more difficult to obtain. This is particularly true because only nuclear families (parents and children) with at least one parent heterozygous at the marker locus are eligible for TDT-type analyses. Therefore, case-control studies are still commonly used (e.g., Deng et al. 1999) and advocated (e.g., Risch and Teng 1998), in the hope that carefully selected samples, tested for Hardy-Weinberg equilibrium (see below), come from a homogeneous population, so that population admixture is not a concern.
It is well known that population admixture can cause genotype frequencies to deviate from those expected under the Hardy-Weinberg (HW) law (Crow and Kimura 1970). It has been proposed that the HW equilibrium (HWE) test be routinely performed at the candidate gene(s) as a means of assessing potential population admixture (Tiret and Cambien 1995) in association studies, with the aim of effectively reducing false positive findings of genes underlying complex traits. Testing HWE at candidate gene(s) is also common practice in association studies as evidence that population admixture is weak or absent (e.g., Deng et al. 1999). However, critical questions remain: What is the power of the HWE test to detect population admixture? How do various degrees of population admixture and population differentiation affect that power? What samples and markers should be employed to test HWE for population admixture? Most importantly, how useful is the HWE test in association studies for reducing the rate of false positive findings of genes underlying complex traits? Additionally, the HWE test is an important and general tool in population and evolutionary genetics for validating the assumptions of HWE, such as random mating (e.g., Hebert 1987; Lynch and Spitze 1994; Deng and Lynch 1996), although the usefulness and power of this tool are generally unknown.
In this article, through analytical and/or computer simulation approaches, we first quantify the power of the HWE test under various degrees of population differentiation (as reflected by different population allele and disease frequencies) and population admixture (as reflected by the different proportions in which populations admix). Second, we quantify the effects of various degrees of population differentiation and admixture on the outcome of association studies. Two types of analyses for complex diseases are investigated: a conventional one that employs cases and controls, and a recently developed one (Feder et al. 1996; Nielsen et al. 1999; Deng et al. 2000) that employs cases only. Third, we examine the choice of samples and markers for the HWE test to detect population admixture. Finally, we investigate the utility of testing HWE for population admixture in association studies for reducing the rate of false positive findings of genes underlying complex traits.
THEORY AND METHODS
In this section, we first present our theoretical investigation and then outline our simulation methods. For simplicity, we focus our investigation on association studies of complex diseases in a population (P) formed by admixture of two large, differentiated subpopulations (P1 and P2). In P1 and P2, HWE holds at a marker locus whose alleles can be classified into two classes, M and m. The frequency of M is f1 in P1 and f2 in P2, and the disease prevalence is φ1 in P1 and φ2 in P2. The disease and the marker locus are not associated by any cause within P1 or P2. A proportion k of the individuals in P come from P1; the rest (1 − k) come from P2. The frequency of the M allele (f) and the disease prevalence (φ) in P are then, respectively, f = kf1 + (1 − k)f2 and φ = kφ1 + (1 − k)φ2.
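As a minimal sketch of this two-population model (our own illustration, not the authors' code), the Python below computes f and φ for P, the HW disequilibrium coefficient D = P_MM − f² (the standard coefficient of Weir 1996, which we assume is the measure used here), and G_ST = (H_T − H_S)/H_T with expected heterozygosity H = 2f(1 − f). The accompanying checks confirm numerically that D = k(1 − k)δf², with δf = f1 − f2, and that G_ST = D/[f(1 − f)]. Both identities follow from the definitions by direct algebra. They are our derivation and are not necessarily the form of the expression given in appendix b. The function name `admixed_quantities` is hypothetical, and the parameter values echo the Figure 3 legend purely for illustration.

```python
import math


def admixed_quantities(k, f1, f2, phi1, phi2):
    """Quantities for P, a mix of P1 (proportion k) and P2 (proportion 1 - k),
    each in HWE at a biallelic marker with alleles M and m."""
    f = k * f1 + (1 - k) * f2              # frequency of M in P
    phi = k * phi1 + (1 - k) * phi2        # disease prevalence in P
    p_MM = k * f1 ** 2 + (1 - k) * f2 ** 2  # MM frequency: mixture of HWE values
    D = p_MM - f ** 2                      # HW disequilibrium coefficient (assumed form)
    H_T = 2 * f * (1 - f)                  # expected heterozygosity of pooled P
    # Weighted mean of the expected heterozygosities within P1 and P2.
    H_S = k * 2 * f1 * (1 - f1) + (1 - k) * 2 * f2 * (1 - f2)
    G_ST = (H_T - H_S) / H_T
    return {"f": f, "phi": phi, "D": D, "G_ST": G_ST}


q = admixed_quantities(k=0.5, f1=0.1, f2=0.3, phi1=0.10, phi2=0.07)
print(q)
```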
The power of HWE test for population admixture at marker loci: To focus our investigation on the power of the HWE test for population admixture, we assume that in population P, HW disequilibrium is entirely due to the population admixture. The HW disequilibrium can be measured by the deviations of genotype frequencies from those expected under the HWE (Weir 1996),
Differentiation between populations can be measured by various indices in population genetics (Crow 1983; Hartl and Clark 1989). One frequently employed index is the GST (Nei 1975; Crow 1983), which measures the relative reduction of heterozygosity (H) due to isolation of differentiated populations—the well-known Wahlund phenomenon (Hartl and Clark 1989). It can be shown (appendix b) that
The effects of admixture of differentiated populations on the outcome of association studies: To focus on quantifying the effects of admixture on association studies, we assume that the marker locus does not underlie the disease susceptibility in populations P1 and P2 and any association between the marker locus and the disease in population P will be entirely due to the admixture of the two differentiated populations.
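Under the assumption just stated, the expected type I error of the two tests described next can be approximated without simulation. The sketch below is our own illustration, and several details are assumptions. The case-control test is taken to be an allele-based 2 × 2 χ²-test, though the exact form of Equation 5 may differ. The cases-only test is taken to be the 1-df HWE χ²-statistic nD̂²/[f̂²(1 − f̂)²] applied to 2N cases. Each noncentrality parameter is approximated by evaluating the statistic at expected counts. Because marker and disease are independent within P1 and P2, cases are a P1/P2 mixture with P1 proportion kφ1/φ, and controls with P1 proportion k(1 − φ1)/(1 − φ). The hypothetical helper also returns D′ = k(1 − k)δfδφ, which satisfies f_case − f_control = D′/[φ(1 − φ)].

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power_1df(lam, alpha=0.05):
    """Rejection probability of a 1-df chi-square test with noncentrality lam,
    using chi2_1(lam) = (Z + sqrt(lam))^2 for standard normal Z."""
    z, s = _Z.inv_cdf(1 - alpha / 2), math.sqrt(lam)
    return _Z.cdf(-z - s) + 1 - _Z.cdf(z - s)


def admixture_error_rates(k, f1, f2, phi1, phi2, N=200, alpha=0.05):
    """Approximate epsilon of two association tests when the marker-disease
    association in P arises only from admixture (hypothetical helper)."""
    phi = k * phi1 + (1 - k) * phi2
    # Cases and controls are P1/P2 mixtures with prevalence-reweighted proportions.
    k_case = k * phi1 / phi
    k_ctrl = k * (1 - phi1) / (1 - phi)
    f_case = k_case * f1 + (1 - k_case) * f2
    f_ctrl = k_ctrl * f1 + (1 - k_ctrl) * f2

    # Case-control: 2x2 table of allele counts (2N case vs 2N control alleles);
    # noncentrality = Pearson X^2 evaluated at the expected counts.
    a, b = 2 * N * f_case, 2 * N * (1 - f_case)
    c, d = 2 * N * f_ctrl, 2 * N * (1 - f_ctrl)
    lam_cc = (a + b + c + d) * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))

    # Cases only: HWE test in 2N cases, where D_case = k_case (1 - k_case) delta_f^2.
    D_case = k_case * (1 - k_case) * (f1 - f2) ** 2
    lam_co = 2 * N * D_case ** 2 / (f_case ** 2 * (1 - f_case) ** 2)

    return {
        "eps_case_control": power_1df(lam_cc, alpha),
        "eps_cases_only": power_1df(lam_co, alpha),
        "f_case": f_case,
        "f_ctrl": f_ctrl,
        "phi": phi,
        "D_prime": k * (1 - k) * (f1 - f2) * (phi1 - phi2),
    }


res = admixture_error_rates(k=0.5, f1=0.2, f2=0.5, phi1=0.01, phi2=0.06)
print(round(res["eps_case_control"], 3), round(res["eps_cases_only"], 3))
```

With f1 = 0.2 and φ1 = 0.01 as in the Figure 1 legend, and the hypothetical values f2 = 0.5, φ2 = 0.06, and k = 0.5, this approximation gives ϵ of about 0.89 for the case-control test and about 0.14 for the cases-only test.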
Two types of tests are investigated, both of which rest on the premise that the marker locus is either a disease gene per se or in linkage disequilibrium with a disease gene. The first is the χ2-test employed in conventional case-control studies (Weir 1996) to test for association between marker allele frequencies and the disease. The null hypothesis is that the distributions of marker allele frequencies are the same in cases (individuals with the disease) and controls (individuals without the disease). The test statistic is
The null hypothesis to be tested in association studies for disease genes is that the marker alleles are not causally associated with the disease; i.e., the marker locus and a disease gene are not linked. For this null hypothesis, the power of the
The second analysis investigated was developed by Feder et al. (1996) and Nielsen et al. (1999); Deng et al. (2000) extended it to fine mapping of QTL. For a complex disease, this method tests for association between a marker locus and the disease by testing for HWE in cases only (Nielsen et al. 1999). The power of this test has been compared with that of several other linkage disequilibrium tests (Deng et al. 2000). Here, we investigate the effects of population admixture on the type I error rate of this test of HWE (
The dependency of the type I error of the
From Equations 3 and 8, we can obtain the following relationship between the noncentrality parameter of the HWE test in random population samples, used to detect admixture, and that of the HWE test in cases only, used to detect linkage disequilibrium between a marker locus and disease genes. Assume that both tests employ the same sample sizes,
Choice of population samples and marker loci for the HWE test: Ideally, random samples (for any locus) or marker loci not associated with the disease (for any sample) should be employed to detect population admixture (
To investigate these questions, we performed two types of simulations. The first type investigates the power to detect HW disequilibrium with cases and/or controls in large randomly mating populations when the marker locus is at or closely linked to a disease susceptibility locus. Under the null hypothesis of no population admixture, this power is in fact the rate of false positive findings (type I error rate, ϵ) of HW disequilibrium arising from the nonrandom choice of samples (cases and/or controls) and from statistical sampling error (at a prespecified significance level, α). The simulation procedures are detailed in Nielsen et al. (1999) and Deng et al. (2000). Briefly, evolving populations segregating for a biallelic disease locus and biallelic marker loci are simulated. We consider a set of marker loci positioned every 0.20 cM over 0–2 cM on one side of the disease locus, plus a marker locus that is not linked to the disease locus (under the additive or recessive model; Figure 3, a and b). The disease prevalence in the population is 0.08. In the simulations, recombination events between the disease locus and the marker loci are independent; i.e., there is no interference. The recombination rate is obtained from the distance (in cM) between the disease locus and the marker locus using Haldane's map function (Ott 1991). Under a specific genetic model, the population starts at generation 0 with complete association between allele A1 at the disease locus (frequency 0.1) and marker allele M (frequency 0.2). The population then evolves for 50 generations under random mating and genetic drift. The population size is 15,000; genetic drift at this population size is extremely small (Crow and Kimura 1970; Deng et al. 2000). At the end of the simulation, 200 cases and 200 controls are sampled from the population. Then the
The second type of simulation compares the power to detect population admixture using controls only with that using both cases and controls in an admixed population. A population P formed by admixture of P1 (with f1 = 0.1, φ1 = 0.1) and P2 (with f2 = 0.3, φ2 = 0.3) is simulated with various values of k (Figure 3c). In Figure 3d, a population P (f = 0.2) is simulated from the admixture (k = 0.5) of two populations P1 and P2 whose frequencies of allele M differ by δf. In the P1 and P2 populations, the disease and the marker are not associated. A total of 200 controls and 200 cases are sampled from population P. Then the
Testing HWE for population admixture in reducing false positive findings in association studies: In the first two sections, we used an analytical approach to study separately the power of the HWE test for population admixture and the effect of admixture of differentiated populations in elevating the type I error rate of association studies. In this section, we use computer simulations to investigate the effect of testing HWE at candidate genes for population admixture in association studies, a practice (e.g., Deng et al. 1999) and a recommendation (Tiret and Cambien 1995) intended to reduce false positive findings due to admixture. We also use the simulations to corroborate the analytical results obtained in the first two sections.
A population (P) formed by admixture of two differentiated populations (P1 and P2) is simulated. A proportion k of the individuals in P come from P1 and the rest from P2. In P1 and P2, HWE holds at the marker locus, whose alleles can be classified into two classes, M and m. The frequency of M is f1 in P1 and f2 in P2, and the disease prevalence is φ1 in P1 and φ2 in P2. The disease and the marker locus are independent within P1 and P2. For a specific parameter set, a sample of 3N individuals is simulated from P, with 2N cases and N controls. The HWE test (Equation 2) is performed on the N controls only to detect population admixture (see results, Choices of population samples and marker loci for HWE test). If the test is significant, further testing of association between the marker and the disease is not pursued, to avoid confounding of association results by admixture. If the test is not significant, i.e., if it fails to reveal population admixture, tests of association that employ N random cases and N controls (Equation 5) and tests that employ 2N cases (Equation 7) are conducted. This sampling scheme ensures that the HWE test is based on the same sample of controls for the test employing cases and controls (Equation 5) and for the test based on cases only (Equation 7). It also ensures that both tests of association have the same sample size, 2N, so that the comparison of their false positive findings is not confounded by different sample sizes. N = 200 in our investigations. Note that this sampling design is used for simulation purposes only and is not intended as a design for collecting data in practice.
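The screening procedure above can be sketched as a small Monte Carlo simulation. This is our own illustration, not the authors' simulation code, and it rests on assumptions: Equation 5 is taken to be an allele-based 2 × 2 χ²-test, Equations 2 and 7 are taken to be the 1-df HWE statistic nD̂²/[f̂²(1 − f̂)²], and the function names, replicate count, and seed are hypothetical choices. Because marker and disease are independent within P1 and P2, cases and controls are drawn directly as P1/P2 mixtures with prevalence-reweighted proportions rather than by rejection sampling.

```python
import random
from statistics import NormalDist

# 1-df chi-square critical value at alpha = 0.05 (square of the 97.5% normal quantile).
CRIT = NormalDist().inv_cdf(0.975) ** 2


def draw_genotypes(rng, n, k_sub, f1, f2):
    """Number of M alleles (0, 1, or 2) for n individuals, each from P1 with
    probability k_sub (else P2) and in HWE within that subpopulation."""
    genos = []
    for _ in range(n):
        f = f1 if rng.random() < k_sub else f2
        genos.append(int(rng.random() < f) + int(rng.random() < f))
    return genos


def hwe_chi2(genos):
    """1-df HWE statistic n * D^2 / [p^2 (1 - p)^2], with D = P_MM - p^2."""
    n = len(genos)
    p = sum(genos) / (2 * n)
    if p <= 0 or p >= 1:
        return 0.0
    D = genos.count(2) / n - p * p
    return n * D * D / (p * p * (1 - p) ** 2)


def allele_chi2(cases, controls):
    """Pearson X^2 for the 2x2 table of M/m allele counts, cases vs controls."""
    a = sum(cases)
    b = 2 * len(cases) - a
    c = sum(controls)
    d = 2 * len(controls) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def simulate(k, f1, f2, phi1, phi2, N=200, reps=300, seed=1):
    """Fraction of replicates declaring association, with and without the HWE screen."""
    rng = random.Random(seed)
    phi = k * phi1 + (1 - k) * phi2
    k_case, k_ctrl = k * phi1 / phi, k * (1 - phi1) / (1 - phi)
    hits = {"cc": 0, "cc_screened": 0, "co": 0, "co_screened": 0}
    for _ in range(reps):
        cases = draw_genotypes(rng, 2 * N, k_case, f1, f2)
        controls = draw_genotypes(rng, N, k_ctrl, f1, f2)
        passes_screen = hwe_chi2(controls) <= CRIT     # HWE screen on N controls
        cc = allele_chi2(cases[:N], controls) > CRIT   # N cases vs N controls
        co = hwe_chi2(cases) > CRIT                    # HWE test in all 2N cases
        hits["cc"] += cc
        hits["co"] += co
        hits["cc_screened"] += cc and passes_screen
        hits["co_screened"] += co and passes_screen
    return {key: count / reps for key, count in hits.items()}


print(simulate(0.5, 0.2, 0.5, 0.01, 0.06))
```

Here f1 and φ1 follow the Figure 1 legend, while f2 = 0.5 and φ2 = 0.06 are hypothetical. By construction, a screened rate can never exceed its unscreened counterpart; the gap between them shows how often the HWE test in controls catches the admixture.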
The proportion of simulations with a significant association and a nonsignificant HWE test is the type I error of the association study approach aided by the HWE test to guard against confounding from population admixture. This type I error includes both the type I error rate specified for the statistical test (α) and the inflation due to population admixture that the HWE test failed to reveal. For comparison, Figures 1 and 2 contrast the type I error rates of association studies with and without the aid of the HWE test for population admixture (the latter computed as described in the second section). Our simulations corroborated the accuracy of the analytical results for power and type I error in the first two sections (to avoid repetition, these results are not shown).
RESULTS
The sample size required and the power of the HWE test for population admixture (Tables 1–4): Tables 1 and 2 show that the sample size (n) required to detect population admixture by the HWE test is generally quite large, except when both the degree of population admixture (i.e., k ∼ 0.5) and the differentiation between populations P1 and P2 (i.e., δf) are large. Generally speaking, when δf = 0.2 (i.e., the frequencies of allele M in P1 and P2 differ by 0.2), the required n is >2000 even at the largest degree of population admixture (k = 0.5), and n is >20,000 if the degree of admixture is small (k = 0.1). These sample sizes far exceed those feasible for and typically employed in association studies. As δf increases and k approaches 0.5, the required n decreases. Generally speaking, only when δf > 0.4 and k > 0.2 can population admixture be detected with the sample sizes typically employed in association studies (<1000).
For sample sizes of 200 and 400, which are typically feasible, Tables 3 and 4 list the power to detect population admixture via the HWE test under various degrees of population differentiation and admixture. The power depends on both k and δf. Generally speaking, if δf < 0.2, there is little power to detect population admixture via the HWE test regardless of k. Only when δf is quite large (>0.4) and k > 0.2 is the power relatively high; when δf = 0.8, the power is almost always 100%. However, δf > 0.4 is probably rather rare in natural populations, especially in humans at candidate genes.
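Sample-size and power figures of this kind can be sketched with the noncentral χ² approximation. The sketch assumes the standard 1-df HWE χ²-statistic nD̂²/[f̂²(1 − f̂)²] (Weir 1996), whose noncentrality is nD²/[f²(1 − f)²] with D = k(1 − k)δf². Whether Tables 1–4 were computed exactly this way is an assumption, the function names are hypothetical, and the example frequencies are illustrative rather than taken from the tables.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def hwe_noncentrality(n, k, f1, f2):
    """Noncentrality n * D^2 / [f^2 (1 - f)^2] of the 1-df HWE chi-square for a
    random sample of n individuals from P, where D = k (1 - k) (f1 - f2)^2."""
    f = k * f1 + (1 - k) * f2
    D = k * (1 - k) * (f1 - f2) ** 2
    return n * D ** 2 / (f ** 2 * (1 - f) ** 2)


def power_1df(lam, alpha=0.05):
    """P(reject) for a 1-df chi-square test, using chi2_1(lam) = (Z + sqrt(lam))^2."""
    z, s = _Z.inv_cdf(1 - alpha / 2), math.sqrt(lam)
    return _Z.cdf(-z - s) + 1 - _Z.cdf(z - s)


def required_n(k, f1, f2, power=0.9, alpha=0.05):
    """Smallest sample size reaching the target power (bisection on n)."""
    per_person = hwe_noncentrality(1, k, f1, f2)  # noncentrality is linear in n
    if per_person == 0:
        return math.inf  # no HW disequilibrium from admixture: no finite n suffices
    lo, hi = 0.0, 1.0
    while power_1df(hi * per_person, alpha) < power:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_1df(mid * per_person, alpha) < power:
            lo = mid
        else:
            hi = mid
    return math.ceil(hi)


for k in (0.1, 0.3, 0.5):
    print(k, required_n(k, 0.1, 0.3), required_n(k, 0.1, 0.3, power=0.8),
          round(power_1df(hwe_noncentrality(200, k, 0.1, 0.3)), 3),
          round(power_1df(hwe_noncentrality(400, k, 0.1, 0.3)), 3))
```

For k = 0.5, f1 = 0.1, and f2 = 0.3 (δf = 0.2), this approximation calls for roughly 2700 individuals for 90% power. For k = 0.1 with the same allele frequencies, it calls for more than 30,000.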
The effects of admixture of differentiated populations on the outcome of association studies (Figures 1 and 2): It can be seen (Figure 1) that as δf and δφ increase, false positive findings in association studies (the type I error rate, ϵ) increase rapidly for the
Sample sizes required to detect population admixture with 90% power by the HWE test under different admixtures (k) for two populations with allele M frequencies indicated as f
Notably (Figure 2), over a range of δf, δφ, and k, ϵ remains fairly stable and close to the specified significance level α (0.05) for the
Sample sizes required to detect population admixture with 80% power by the HWE test under different admixtures (k) for two populations with allele M frequencies indicated as f
Choice of population samples and marker loci for HWE test (Figure 3): In large randomly mating populations (Figure 3, a and b), if the marker locus is in linkage disequilibrium with the disease locus because of linkage, testing HWE with both the cases and the controls (selected for population association studies) produces false positive findings of population admixture at a rate (ϵ) higher than the specified statistical type I error rate α (0.05), and ϵ increases dramatically with increasing linkage disequilibrium. When the marker locus is not linked to a disease locus, ϵ remains at the specified α of 0.05. However, if HWE is tested only in controls, ϵ remains close to 0.05 whether or not the locus is linked to a disease locus.
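The x-axes of Figure 3, a and b, are expected linkage disequilibrium coefficients. A rough, deterministic sketch of how such values follow from the simulation settings in Theory and Methods is given below. Haldane's map function converts map distance (cM) to a recombination fraction r, and, without drift, D decays by a factor (1 − r) per generation of random mating. Two assumptions are ours: drift is ignored, which the text justifies by noting that it is extremely small at a population size of 15,000; and "complete association" at generation 0 is read as every A1 allele residing on an M haplotype, giving D0 = 0.1 − 0.1 × 0.2 = 0.08. The printed values need not match the plotted averages.

```python
import math


def haldane_r(d_cM):
    """Haldane map function: recombination fraction for a map distance in cM."""
    return 0.5 * (1 - math.exp(-2 * d_cM / 100))


def expected_ld(r, generations=50, freq_A1=0.1, freq_M=0.2):
    """Expected D = f_MD - f_M * f_D after `generations` of random mating,
    ignoring drift: D_t = D_0 * (1 - r)^t. Assumes (our reading of 'complete
    association') that every A1 allele starts on an M haplotype, so f_MD = f_D."""
    d0 = freq_A1 - freq_A1 * freq_M
    return d0 * (1 - r) ** generations


positions = [round(0.2 * i, 1) for i in range(11)]  # 0-2 cM in 0.2-cM steps
for d in positions:
    r = haldane_r(d)
    print(f"{d:3.1f} cM  r = {r:.5f}  expected D = {expected_ld(r):.4f}")
print(f"unlinked  r = 0.5  expected D = {expected_ld(0.5):.2e}")
```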
In an admixed population P (Figure 3, c and d), if the marker locus is not linked to a disease locus, combining cases and controls for the HWE test will generally have higher power to detect population admixture than testing the controls alone, owing to the larger sample size. Testing HWE with controls only has power similar to that of testing with random samples of the same size. Testing HWE with combined samples of cases and controls has similar or slightly greater power to detect population admixture than testing random samples of the same size. This is probably due to an elevated level of HW disequilibrium in cases caused by population admixture: although the marker is not linked to a disease locus in subpopulations P1 and P2, admixture of P1 and P2, which differ in disease and marker frequencies, creates linkage disequilibrium between the marker and the disease, and this linkage disequilibrium elevates HW disequilibrium in cases.
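Under the model in Theory and Methods, the HW disequilibrium expected among cases can be written explicitly. The following is our own derivation, offered as a sketch rather than as the paper's equations. Because marker and disease are independent within P1 and P2, cases form a P1/P2 mixture whose P1 proportion is reweighted by prevalence. The HW disequilibrium of any two-component mixture of HWE subpopulations depends only on its mixing proportion:

```latex
k_{\mathrm{case}} = \frac{k\varphi_1}{k\varphi_1 + (1-k)\varphi_2} = \frac{k\varphi_1}{\varphi},
\qquad
D_{\mathrm{case}} = k_{\mathrm{case}}\,(1-k_{\mathrm{case}})\,\delta_f^{2},
\qquad
D = k\,(1-k)\,\delta_f^{2}.
```

Because w(1 − w) is largest at w = 1/2, D_case exceeds D exactly when k_case lies closer to 1/2 than k does. Reweighting by prevalence can therefore raise or lower HW disequilibrium among cases, depending on k, φ1, and φ2.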
The power to detect population admixture with 200 individuals sampled from a population under different admixtures (k) with two differentiated populations
Testing HWE for population admixture in association studies (Figures 1 and 2): Contrasting the ϵ's of association studies that do and do not employ the HWE test for population admixture shows (Figures 1 and 2) that studies employing the HWE test have reduced levels of ϵ. However, the reduction in ϵ achieved by accepting only those significant associations in samples with a nonsignificant HWE test is generally small. Therefore, the utility of testing HWE for reducing false positive findings due to population admixture is generally limited, consistent with the limited power of the HWE test for population admixture shown above.
The power to detect population admixture with 400 individuals sampled from a population under different admixtures (k) with two differentiated populations
DISCUSSION
With random population samples, extensive association studies have been conducted to search for genes underlying complex traits through linkage disequilibrium between these genes and markers. It is well known (Chakraborty and Smouse 1988; Lander and Schork 1994; Weir 1996) that population stratification can produce spurious associations between marker loci and complex traits in association studies. Although the qualitative effects of population stratification have long been recognized, the detailed quantitative effects of various degrees of stratification on various linkage disequilibrium methods have seldom, if ever, been investigated. It is common practice (e.g., Deng et al. 1999), and has been recommended (Tiret and Cambien 1995), to apply the HWE test at candidate genes to detect population admixture in association studies, with the aim of guarding against false positive associations between markers and diseases.
Through analytical and computer simulation approaches, we quantified the power of the HWE test for population admixture and the effects of population admixture on increasing false positive findings (type I error, ϵ) in association studies under various scenarios of population admixture and population differentiation. We found that (1) the power of the HWE test for detecting population admixture is usually small, even with large samples, unless the degrees of population admixture and population differentiation are rather large; (2) population admixture seriously elevates ϵ for detecting genes underlying complex traits, to an extent that depends on the degrees of population admixture and population differentiation; (3) HWE testing for population admixture should be performed with random samples, or only with controls at candidate genes, or with combined samples of cases and controls at marker loci that are not linked to the diseases under study; (4) testing HWE for population admixture generally reduces false positive findings of genes underlying complex traits, but the effect is generally small because of the limited power of the HWE test to detect population admixture; and (5) compared with the conventional case-control analyses (
Figure 1. The rate of false positive findings of association between a marker locus and a disease (ϵ) with the χ2-test under various degrees of population differentiation (f2 and φ2) and admixture (k). A total of 200 cases and 200 controls are employed in the statistical tests; f1 = 0.2, φ1 = 0.01. The solid lines indicate ϵ unadjusted for the HWE test, and the dashed lines indicate ϵ adjusted for the HWE test for population admixture. Squares, φ2 = 0.06; circles, φ2 = 0.04; triangles, φ2 = 0.02.
In this study, we focus on ϵ under a common practice in association studies (e.g., Tiret and Cambien 1995; Deng et al. 1999) in which the HWE test is applied at candidate gene(s) to guard against spurious association due to population admixture. As revealed here, this practice has some minor effect in decreasing ϵ in detecting disease genes when population admixture is present; however, it is also intuitive that, in the absence of population admixture, the practice will decrease the power to detect disease genes. This is because, even without admixture, HWE tests will spuriously indicate admixture, purely through sampling error, at a rate set by the significance level α. Such spurious findings of population admixture may erroneously halt testing for disease loci at the candidate genes. Recently, Pritchard and Rosenberg (1999) proposed employing a series of marker loci unlinked to the candidate genes to test for population admixture/stratification through contingency table χ2-tests in cases and controls, as a means of effectively reducing ϵ in searches for disease genes in association studies. Although combining a series of unlinked marker loci may increase the power to detect population admixture in the HWE test, the increase in power requires that each marker locus included in the analyses be differentiated among subpopulations, information that is generally unknown for markers in most admixed populations. Including markers that are not differentiated among subpopulations will generally decrease the power to detect population admixture. Most importantly, it is the differentiation of disease frequencies and of allele frequencies at the candidate genes among the subpopulations of an admixed population that affects ϵ in testing candidate genes for disease association. This can be easily seen analytically via the noncentrality parameters of the association test statistics (Equations 6 and 8).
Various loci across the human genome may be differentiated to various degrees among the subpopulations of an admixed population. If the candidate genes are not differentiated but the selected unlinked marker loci are, and HWE tests at these unlinked marker loci detect population admixture, we will suffer a substantial loss of power by ceasing to test the candidate loci for disease genes. Conversely, if the marker loci are not differentiated but the candidate genes are, we will still suffer inflated ϵ due to the admixture of subpopulations differentiated at the candidate genes being tested. The same problem may also undermine the usefulness of the approach of Pritchard et al. (2000a,b), which infers population structure from multilocus genotype data in order to perform association studies of candidate genes. Using a Bayesian approach, Devlin and Roeder (1999) developed a genomic control method for single nucleotide polymorphism (SNP) data densely sampled across the whole genome in case-control studies. In genome-wide case-control studies with SNPs, this method is intended to control the type I error rate at desired levels through an appropriate choice of tuning parameters in its implementation.
Figure 2. The rate of false positive findings of association between a marker locus and a disease (ϵ) with the χ2-test under various degrees of population differentiation (f2 and φ2) and admixture (k). A total of 400 cases are employed. For other parameters, see the legend to Figure 1.
Pritchard and Rosenberg (1999) focus on the situation in which there is no prior reason to suspect population admixture, association studies have been conducted, and positive results have been obtained. This is because, as they correctly point out, case-control studies are often criticized in such circumstances for the potential confounding effects of population admixture. They therefore suggest genotyping additional unlinked markers to test for population admixture when positive association results are present. Our study starts from a slightly different angle. It was motivated by the general practice and suggestion (Tiret and Cambien 1995) that population admixture be tested via the HWE test at the candidate gene before association tests are performed, and by the general perception that this procedure can effectively reduce the type I error due to population admixture. Testing HWE for population admixture at a candidate gene conditional on a significant case-control test may be of limited use in reducing the type I error due to population admixture in disease gene identification because, as demonstrated here, the HWE test at a single candidate gene has little power to detect population admixture.
Figure 3. The rate (ϵ) of false positive findings of HW disequilibrium at marker loci in linkage disequilibrium with a disease locus in randomly mating populations (a and b), and the power (η) to detect population admixture by the HWE test at a candidate gene locus in admixed populations (c and d). (a) Recessive genetic model at the disease locus. The penetrances for the three genotypes A1A1, A1A2, and A2A2 are, respectively, 1.00, 0.07, and 0.07. (b) Additive genetic model at the disease locus. The penetrances for the three genotypes A1A1, A1A2, and A2A2 are, respectively, 0.98, 0.49, and 0.00. In a and b, the data plotted are the mean and 1 SD; solid squares are for tests with combined samples of 200 cases and 200 controls, and open circles are for tests with 200 controls. The x-axes in a and b show the expected average linkage disequilibrium coefficient between a marker locus and the disease locus, defined as D = fMD − fMfD, where fMD is the frequency of the haplotype carrying marker allele M and disease allele D, and fM and fD are the frequencies of marker allele M and disease allele D, respectively. In c and d, solid and open squares are, respectively, for tests with random samples of 200 and 400 individuals; solid and open circles are, respectively, for tests with 200 random cases plus 200 random controls and for tests with 200 random controls. In simulations for c, f1 = 0.1, f2 = 0.3, φ1 = 0.10, φ2 = 0.07. In simulations for d, k = 0.5, f = 0.2, φ1 = 0.10, φ2 = 0.07.
HWE is a fundamental topic in population genetics. Issues related to HWE have been studied extensively and have applications in many research areas. Examples include proposals of various tests of HWE (e.g., Louis and Dempster 1987; Hernandez and Weir 1989; Eguchi and Matsuura 1990; Guo and Thompson 1992) and of HWE tests in stratified populations (e.g., Nam 1997); characterization of HW disequilibrium (Shoemaker et al. 1998); and testing for genes underlying complex traits through the HWE test in extreme samples of populations (Nielsen et al. 1999; Deng et al. 2000). Schaid and Jacobsen (1999) proposed testing for disease genes in association studies by correcting for existing HW disequilibrium to avoid the inflated type I error due to population admixture/stratification; however, we (Deng and Chen 2000) found that their correction approach is generally not feasible in practice.
Chakraborty and Smouse (1988) and Briscoe et al. (1994) found that the linkage disequilibrium between two marker loci in a population P formed by admixture of populations P1 and P2 is $D = k(1 - k)\delta_f\delta_p$, where k and $\delta_f$ are as defined earlier and $\delta_p$ is the difference between P1 and P2 in the allele frequency at the second locus. The two loci are assumed to be in linkage equilibrium within P1 and within P2. In this study, we assume that the marker locus and the disease are not associated within P1 or P2, so any association in P is entirely due to the "disequilibrium" between the marker locus and the disease created by admixture. The degree of such disequilibrium may be measured as $D' = k(1 - k)\delta_f\delta_\phi$. Note that the power to detect an association between the marker and the disease created by admixture depends critically on $D'$, as reflected by Equations 6a and 6b for the
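The admixture result of Chakraborty and Smouse (1988) and Briscoe et al. (1994) can be checked numerically: compute $D = f_{MD} - f_M f_D$ directly from the mixed haplotype and allele frequencies and compare it with $k(1-k)\delta_f\delta_p$. In this minimal sketch the function names and the labels p1, p2 (frequencies of the second-locus allele in P1 and P2, so that $\delta_p = p_1 - p_2$) are hypothetical, and linkage equilibrium within each subpopulation is assumed, as in the text.

```python
def admixture_ld(k: float, f1: float, f2: float, p1: float, p2: float) -> float:
    """D = f_MD - f_M * f_D in the admixed population P = k*P1 + (1 - k)*P2.

    f1, f2: frequency of marker allele M in P1 and P2.
    p1, p2: frequency of the second-locus allele in P1 and P2 (hypothetical labels).
    Loci are assumed independent within each subpopulation, so the haplotype
    frequency in Pi is simply fi * pi.
    """
    f_md = k * f1 * p1 + (1 - k) * f2 * p2  # haplotype frequency in the mixture
    f_m = k * f1 + (1 - k) * f2             # allele frequencies in the mixture
    f_d = k * p1 + (1 - k) * p2
    return f_md - f_m * f_d


def admixture_ld_closed_form(k: float, f1: float, f2: float, p1: float, p2: float) -> float:
    """Closed form D = k(1 - k) * delta_f * delta_p with delta_f = f1 - f2, delta_p = p1 - p2."""
    return k * (1 - k) * (f1 - f2) * (p1 - p2)


if __name__ == "__main__":
    # Marker-disease analog D', reading p1, p2 as prevalences phi1, phi2 (assumption).
    d_prime = admixture_ld(0.5, 0.1, 0.3, 0.10, 0.07)  # about -0.0015
    print(d_prime, admixture_ld_closed_form(0.5, 0.1, 0.3, 0.10, 0.07))
```

Reading the second "locus" as disease status, with p1 and p2 replaced by the prevalences φ1 and φ2 (an assumed interpretation of φ), the same function gives the marker-disease measure $D'$; with k = 0.5, f1 = 0.1, f2 = 0.3, φ1 = 0.10, and φ2 = 0.07 it returns about −0.0015.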
Population association studies that depend on linkage and strong linkage disequilibrium between marker loci and loci underlying complex traits have been conducted extensively and have helped to decipher some of the genetic basis of complex traits (e.g., Chagnon et al. 1998). Population association studies have advantages such as high power and the relative ease of recruiting study subjects. However, the results generated so far are largely inconsistent and controversial. Quantitative studies of the mechanisms behind the various potential causes of these inconsistencies may not only provide a basis for correctly implementing association studies but also help in correctly interpreting significant results obtained under various designs and analyses. For example, association studies in cases only (Equation 7) may be more robust in identifying genes underlying complex traits than conventional case-control analyses (Equation 5). In addition, the HWE test may reduce false positive findings, but the effect is small. Therefore, while association studies can be a useful tool for generating hypotheses in gene identification, these hypotheses may need to be substantiated by methods that are robust to population admixture (Spielman et al. 1993; Allison 1997; Xiong et al. 1998), unless the samples are known to come from a homogeneous population.
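How admixture alone can produce a case-control difference in marker allele frequency, and hence the false positive findings discussed above, can be sketched under the assumptions of Appendix C: no association between marker and disease within P1 or P2. Cases and controls are then mixtures of P1 and P2 reweighted by Bayes' rule. In this minimal sketch the function name is hypothetical, φ1 and φ2 are taken to be the disease prevalences in P1 and P2 (an assumption about notation), and the parameter values are illustrative.

```python
from __future__ import annotations


def case_control_allele_freqs(k: float, f1: float, f2: float,
                              phi1: float, phi2: float) -> tuple[float, float]:
    """Expected frequency of marker allele M in cases and in controls sampled from P.

    P = k*P1 + (1 - k)*P2; f1, f2 are the M frequencies and phi1, phi2 the disease
    prevalences (assumed meaning) in P1 and P2. Within each subpopulation the
    marker is unassociated with disease, so P(M | affected, Pi) = fi.
    """
    case1, case2 = k * phi1, (1 - k) * phi2              # P(Pi and affected)
    ctrl1, ctrl2 = k * (1 - phi1), (1 - k) * (1 - phi2)  # P(Pi and unaffected)
    # Bayes' rule: cases (controls) are a reweighted mixture of P1 and P2.
    f_case = (case1 * f1 + case2 * f2) / (case1 + case2)
    f_ctrl = (ctrl1 * f1 + ctrl2 * f2) / (ctrl1 + ctrl2)
    return f_case, f_ctrl


if __name__ == "__main__":
    f_case, f_ctrl = case_control_allele_freqs(0.5, 0.1, 0.3, 0.10, 0.07)
    print(f"cases: {f_case:.4f}  controls: {f_ctrl:.4f}")
```

With these values the expected frequency of M is about 0.182 in cases versus 0.202 in controls, although the marker has no effect within either subpopulation. Algebraically, the difference equals $D'/[\phi(1-\phi)]$ with $\phi = k\phi_1 + (1-k)\phi_2$, so for 0 < k < 1 it vanishes exactly when $\delta_f = 0$ or $\delta_\phi = 0$.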
Finally, although we examine the detection of HW disequilibrium due to population admixture in the context of localizing genes underlying complex diseases, some of the issues investigated here should be of general interest in genetics. For example, we note here for the first time that the degree of population differentiation, as measured by $G_{ST}$, is directly related to the noncentrality parameter (and thus the power) of the test for HW disequilibrium (Equation 4). In addition, it is general practice in population and evolutionary genetics to test for HW disequilibrium as a means of substantiating the assumptions underlying HW equilibrium (such as the absence of population admixture, inbreeding, and assortative mating). Nonsignificant results are generally interpreted as an indication of random mating in the study populations (e.g., Hebert 1987; Lynch and Spitze 1994; Deng and Lynch 1996). However, this practice may be unreliable because, as demonstrated here, the test has limited power to detect deviations from HW equilibrium due to population migration and similar causes. Therefore, using the HW disequilibrium test to substantiate the assumptions of HW equilibrium may need to be treated with caution unless the sample size is very large (e.g., >1000).
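The link between $G_{ST}$ and power can be illustrated with a short sketch. It assumes (1) a sample drawn directly from the mixture of P1 and P2 without subsequent random mating, so that the observed heterozygosity is $\bar{H}_S$ and the fixation index of the sample equals $G_{ST}$ as defined in Appendix B; and (2) the standard large-sample approximation that the 1-df $\chi^2_{HW}$ statistic is noncentral chi-square with $\lambda_{HW} \approx nG_{ST}^2$. This approximation is offered for illustration and is not necessarily the exact form of Equation 4. The function names are hypothetical; f1 = 0.1 and f2 = 0.3 are taken from the figure's simulation settings, and k = 0.5 is an assumed value.

```python
from statistics import NormalDist


def g_st(k: float, f1: float, f2: float) -> float:
    """G_ST = (H_T - mean H_S) / H_T for P = k*P1 + (1 - k)*P2 at a biallelic locus."""
    f = k * f1 + (1 - k) * f2
    h_t = 2 * f * (1 - f)  # equals 1 - f**2 - (1 - f)**2
    h_s = 2 * k * f1 * (1 - f1) + 2 * (1 - k) * f2 * (1 - f2)
    return (h_t - h_s) / h_t


def hwe_power(n: int, k: float, f1: float, f2: float, alpha: float = 0.05) -> float:
    """Approximate power of the 1-df HWE test, assuming lambda = n * G_ST**2."""
    lam = n * g_st(k, f1, f2) ** 2
    z = NormalDist()
    c = z.inv_cdf(1 - alpha / 2)  # square root of the chi-square(1) critical value
    s = lam ** 0.5
    # A noncentral chi-square(1, lam) variable is (Z + sqrt(lam))**2 with Z ~ N(0, 1),
    # so P(X > c**2) = P(Z + s > c) + P(Z + s < -c).
    return z.cdf(-c - s) + z.cdf(s - c)


if __name__ == "__main__":
    for n in (200, 400, 1000):
        print(n, round(hwe_power(n, 0.5, 0.1, 0.3), 3))
```

Under these assumptions $G_{ST} = 0.0625$, and the approximate power is about 0.14 for n = 200 and only reaches about 0.5 near n = 1000, in line with the caution above about sample sizes.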
Acknowledgments
We are grateful to Professor Asmussen and the two anonymous reviewers for providing careful comments that helped to improve the manuscript. This study was partially supported by grants from the National Institutes of Health, the Health Future Foundation, and HuNan Normal University and by a graduate student tuition waiver to Wei-Min Chen from Creighton University.
APPENDIX A: COMPUTATION OF THE STATISTICAL POWER BASED ON THE NONCENTRALITY PARAMETER OF THE $\chi^2_{HW}$ STATISTIC
The power (η) of the
APPENDIX B: THE RELATIONSHIP OF $\lambda_{HW}$ WITH $G_{ST}$
$G_{ST} = (H_T - \bar{H}_S)/H_T$, where $H_T$ is the heterozygosity that would result if all the isolated subpopulations were merged into a single randomly mating population, and $\bar{H}_S$ is the average heterozygosity of the isolated subpopulations. Consider a population P formed by admixture of populations P1 and P2, with a proportion k from P1 and $1 - k$ from P2, and a locus with two alleles M and m, where the frequency of M is $f_1$ in P1, $f_2$ in P2, and f in P (as defined in the text). Then $H_T = 1 - f^2 - (1 - f)^2$. If the average heterozygosity of P1 and P2 is computed by weighting the heterozygosity in each by its relative contribution to population P, $\bar{H}_S = 2kf_1(1 - f_1) + 2(1 - k)f_2(1 - f_2)$, then
APPENDIX C: FREQUENCIES OF MARKER ALLELES AND GENOTYPES IN CASES AND CONTROLS IN AN ADMIXED POPULATION
Assume that the marker locus has no causal effect on the disease and that the marker genotypes (or alleles) and the disease are not associated within populations P1 and P2; any association between the marker and the disease in population P is then due entirely to admixture. In population P, the expected frequency of allele M in cases is
Footnotes
- Communicating editor: M. A. Asmussen
- Received January 4, 2000.
- Accepted October 30, 2000.
- Copyright © 2001 by the Genetics Society of America