To determine whether and how common polymorphisms are associated with natural distributions of iris colors, we surveyed 851 individuals of mainly European descent at 335 SNP loci in 13 pigmentation genes and 419 other SNPs distributed throughout the genome and known or thought to be informative for certain elements of population structure. We identified numerous SNPs, haplotypes, and diplotypes (diploid pairs of haplotypes) within the OCA2, MYO5A, TYRP1, AIM, DCT, and TYR genes and the CYP1A2-15q22-ter, CYP1B1-2p21, CYP2C8-10q23, CYP2C9-10q24, and MAOA-Xp11.4 regions as significantly associated with iris colors. Half of the associated SNPs were located on chromosome 15, which corresponds with results that others have previously obtained from linkage analysis. We identified four additional genes (ASIP, MC1R, POMC, and SILV) and one additional region (GSTT2-22q11.23) with haplotypes and/or diplotypes, but not individual SNP alleles, associated with iris colors. For most of the genes, multilocus gene-wise genotype sequences were more strongly associated with iris colors than were haplotypes or SNP alleles. Diplotypes for these genes explain 15% of iris color variation. Apart from representing the first comprehensive candidate gene study for variable iris pigmentation and constituting a first step toward developing a classification model for the inference of iris color from DNA, our results suggest that cryptic population structure might serve as a leverage tool for complex trait gene mapping if genomes are screened with the appropriate ancestry informative markers.
Iris pigmentation is a complex genetic trait that has long interested geneticists, anthropologists, and the public at large, but it is not yet completely understood. Eumelanin (brown pigment) is a light-absorbing polymer synthesized in specialized melanocyte lysosomes called melanosomes. Within the melanosomes, the tyrosinase (TYR) gene product catalyzes the rate-limiting hydroxylation of tyrosine to 3,4-dihydroxyphenylalanine (DOPA), and the resulting product is oxidized to DOPAquinone to form the precursor for eumelanin synthesis. Although TYR is centrally important for this process, pigmentation in animals is not simply a Mendelian function of TYR or of any other single protein product or gene sequence. In fact, study of the transmission genetics for pigmentation traits in humans and various model systems suggests that variable pigmentation is a function of multiple heritable factors whose interactions appear to be quite complex (Brauer and Chopra 1978; Bito et al. 1997; Box et al. 1997, 2001; Akey et al. 2001; Sturm et al. 2001). For example, unlike human hair color (Sturm et al. 2001), there appears to be only a minor dominance component for mammalian iris color determination (Brauer and Chopra 1978), and minimal correlation exists among skin, hair, and iris color within or between individuals of a given population. In contrast, between-population comparisons show good concordance; populations with darker average iris color also tend to exhibit darker average skin tones and hair colors. These observations suggest that the genetic determinants for pigmentation in the various tissues are distinct and that these determinants have been subject to a common set of systematic and evolutionary forces that have shaped their distribution in world populations.
At the cellular level, variable iris color in healthy humans is the result of the differential deposition of melanin pigment granules within a fixed number of stromal melanocytes in the iris (Imesch et al. 1997). The density of granules appears to reach genetically determined levels by early childhood and usually remains constant throughout later life, although a small minority of individuals exhibit changes in color during later stages of life (Bito et al. 1997). Pedigree studies in the mid-1970s suggested that iris color variation is a function of two loci: a single locus responsible for depigmentation of the iris, not affecting skin or hair, and another pleiotropic gene for reduction of pigment in all tissues (Brues 1975). Most of what we have learned about pigmentation since has been derived from molecular genetics studies of rare pigmentation defects in humans and model systems such as mouse and Drosophila. For example, dissection of the oculocutaneous albinism (OCA) trait in humans has shown that many pigmentation defects are due to lesions in the TYR gene, resulting in their designation as TYR-negative OCAs (Oetting and King 1991, 1992, 1993, 1999; see albinism database at http://www.cbc.umn.edu/tad/). TYR catalyzes the rate-limiting step of melanin biosynthesis and the degree to which human irises are pigmented correlates well with the amplitude of TYR message levels (Lindsey et al. 2001). Nonetheless, the complexity of OCA phenotypes illustrates that TYR is not the only gene involved in iris pigmentation (Lee et al. 1994). Although most TYR-negative OCA patients are completely depigmented, dark-iris albino mice (C44H) and their human type IB oculocutaneous counterparts exhibit a lack of pigment in all tissues except for the iris (Schmidt and Beermann 1994). Study of a number of other TYR-positive OCA phenotypes has shown that, in addition to TYR, the oculocutaneous 2 (OCA2; Hamabe et al. 1991; Gardner et al. 1992; Durham-Pierre et al. 1994, 1996), tyrosinase-like protein (TYRP1; Abbott et al. 1991; Chintamaneni et al. 1991; Boissy et al. 1996), melanocortin receptor (MC1R; Robbins et al. 1993; Smith et al. 1998; Flanagan et al. 2000), and adaptin 3B (AP3B) loci (Ooi et al. 1997), and other genes (reviewed by Sturm et al. 2001) are necessary for normal human iris pigmentation. Each of these genes is part of the main (TYR) human pigmentation pathway. In Drosophila, iris pigmentation defects have been ascribed to mutations in >85 loci contributing to a variety of cellular processes in melanocytes (Ooi et al. 1997; Lloyd et al. 1998), but mouse studies have suggested that ∼14 genes preferentially affect pigmentation in vertebrates (reviewed in Sturm et al. 2001) and that disparate regions of the TYR and other OCA genes are functionally distinct for determining the pigmentation in different tissues. Human pigmentation genes break out into several biochemical pathways, including those for tyrosinase enzyme complex formation on the inner surface of the melanosome, hormonal and environmental regulation, melanoblast migration and differentiation, the intracellular routing of new proteins into the melanosome, and the proper transportation of the melanosomes from the body of the cell into the dendritic arms toward the keratinocytes. Nonetheless, the study of human OCA mutants suggests that the number of highly penetrant phenotypically active pigmentation loci is surprisingly small.
Although research on pigment mutants has made clear that a small subset of genes is largely responsible for catastrophic pigmentation defects in mice and humans, it remains unclear whether or how common single-nucleotide polymorphisms (SNPs) in these genes contribute toward (or are linked to) natural variation in human iris color. A brown-iris locus was localized to an interval containing the OCA2 and MYO5A genes (Eiberg and Mohr 1996), and specific polymorphisms in the MC1R gene have been shown to be associated with red hair and blue iris color in relatively isolated populations (Robbins et al. 1993; Valverde et al. 1995; Koppula et al. 1997; Smith et al. 1998; Schioth et al. 1999; Flanagan et al. 2000). An ASIP polymorphism is reported to be associated with both brown iris and hair color (Kanetsky et al. 2002). However, the penetrance of each of these alleles appears to be low and, in general, they appear to explain only a very small amount of the overall variation in iris colors within the human population (Spritz 1995). Single-gene studies have therefore not provided a sound basis for understanding the complex genetics of human iris color. Because most human traits have complex genetic origins, wherein the whole is often greater than the sum of its parts, innovative genomics-based study designs and analytical methods for screening genetic data in silico that are respectful of genetic complexity are needed (for example, of the multifactorial and/or phase-known components of dominance and epistatic genetic variance). The first step, however, is to define the complement of loci that on a sequence level explain variance in trait value and, of these, those that do so in a marginal or penetrant sense will be the easiest to find. It is toward this goal that we have performed the present study.
We have applied a nonsystematic, hypothesis-driven genome-screening approach to identify various SNPs, haplotypes, and diplotypes marginally (i.e., independently) associated with iris color variation. Our results show that a surprisingly large number of polymorphisms in a large number of genes are associated with iris colors, suggesting that the genetics of iris color pigmentation are quite complex. The sequences we have identified constitute a good first step toward developing a classifier model for the inference of iris colors from DNA, and the nature of some of these as markers of population structure might have implications for the design of other complex trait gene-mapping studies.
MATERIALS AND METHODS
Specimens: Specimens for resequencing were obtained from the Coriell Institute in Camden, New Jersey. Specimens for genotyping were of self-reported European descent, of different age, sex, hair, iris, and skin shades and they were collected using informed consent guidelines under Investigational Review Board guidance. Donors checked a box for blue, green, hazel, brown, black, or unknown/not clear iris colors, and each had the opportunity to identify whether iris color had changed over the course of their lives or whether the color of each iris was different. Individuals for whom iris color was ambiguous or had changed over the course of life were eliminated from the analysis. In addition, for 103 of the subjects, iris colors were reported using a number from 1 to 11 as well, where 1 is the darkest brown/black and 11 is the lightest blue, identified using a color placard. For these subjects, we obtained digital photographs of the right iris, where subjects peered into a box at one end at the camera at the other end to standardize lighting conditions and distance and from which a judge assigned the sample to a color group. Comparing the results of the two methods of classification, 86 of the classifications matched. Of the 17 that did not, 6 were brown/hazel, 7 were green/hazel, and 4 were blue/green discrepancies although none were gross discrepancies such as brown/green, brown/blue, or hazel/blue. Although such an error is tolerable for identifying sequences marginally associated with iris colors, the use of the sequences described herein for iris color classification would therefore likely require digitally quantified iris colors (which we have begun to accumulate and will present elsewhere).
SNP discovery: We obtained candidate SNPs from the National Center for Biotechnology Information (NCBI) Single Nucleotide Polymorphism Database (dbSNP), which generally provided more candidate SNPs than were possible to genotype. We focused on human pigmentation and xenobiotic metabolism genes, selected on the basis of their gene identities, not their chromosomal position. For some genes, the number of SNPs in the database was low and/or some of the SNPs were strongly associated with iris colors, warranting a deeper investigation. For these genes we performed resequencing; among the genes discussed in this article, 113 SNPs were discovered: in CYP1A2 (7 gene regions, 5 amplicons, 10 SNPs found), CYP2C8 (9 gene regions, 8 amplicons, 15 SNPs found), CYP2C9 (9 gene regions, 8 amplicons, 24 SNPs found), OCA2 (16 gene regions, 15 amplicons, 40 SNPs found), TYR (5 gene regions, 5 amplicons, 10 SNPs found), and TYRP1 (7 gene regions, 6 amplicons, 14 SNPs found). Resequencing for these genes was performed by amplifying the proximal promoter (average 700 bp upstream of transcription start site), each exon (average size 1400 bp), the 5′ and 3′ ends of each intron (including the intron-exon junctions, average size ∼100 bp), and 3′ untranslated region (UTR; average size 700 bp) sequences from a multi-ethnic panel of 672 individuals (450 individuals from the Coriell Institute's DNA Polymorphism Discovery Resource, 96 additional European Americans, 96 African Americans, 10 Pacific Islanders, 10 Japanese, and 10 Chinese; these 672 individuals represented a set of samples separate from that used for the association study described herein). PCR amplification was accomplished using pfu Turbo polymerase according to the manufacturer's guidelines (Stratagene, La Jolla, CA). We developed a program (T. Frudakis, M. Thomas, Z. Gaskin, K. Venkateswarlu, K. Suresh Chandra, S. Ginjupalli, S. Gunturi, S. Natrajan, V. K. Ponnuswamy and K. N. Ponnuswamy, unpublished results) to design resequencing primers in a manner respectful of homologous sequences in the genome, to ensure that we did not coamplify pseudogenes or amplify from within repeats. BLAST searches confirmed the specificity of all primers used. Amplification products were subcloned into the pTOPO (Invitrogen, San Diego) sequencing vector and 96 insert-positive colonies were grown for plasmid DNA isolation (the use of 672 individuals for the amplification step reduced the likelihood of an individual contributing more than once to this subset of 96 selected). We sequenced with an ABI3700 using PE Applied Biosystems BDT chemistry and we deposited the sequences into a commercial relational database system (iFINCH, Geospiza, Seattle). PHRED-qualified sequences were imported into the CLUSTAL X alignment program and the output of this was used with a second program that we developed (T. Frudakis, M. Thomas, Z. Gaskin, K. Venkateswarlu, K. Suresh Chandra, S. Ginjupalli, S. Gunturi, S. Natrajan, V. K. Ponnuswamy and K. N. Ponnuswamy, unpublished results) to identify quality-validated discrepancies between sequences. We selected those for which at least two instances of PHRED identified variants that scored ≥24, and each of the SNPs discovered through resequencing was used for genotyping.
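The selection rule described above (at least two instances of a variant, each with a PHRED score of ≥24) can be sketched as a simple filter. This is only an illustration and not the unpublished program the authors describe; the function name `select_candidate_snps`, the tuple layout `(position, allele, phred)`, and the sample reads are all hypothetical.

```python
from collections import defaultdict


def select_candidate_snps(observations, min_phred=24, min_instances=2):
    """Return (position, allele) pairs seen at least `min_instances` times
    with a PHRED quality score of at least `min_phred`.

    `observations` is an iterable of (position, allele, phred) tuples;
    this layout is an assumption made for the sketch.
    """
    qualified_counts = defaultdict(int)
    for position, allele, phred in observations:
        if phred >= min_phred:
            qualified_counts[(position, allele)] += 1
    return sorted(
        key for key, count in qualified_counts.items() if count >= min_instances
    )


# Hypothetical variant calls from aligned clone sequences.
reads = [
    (101, "T", 30), (101, "T", 26),   # two high-quality instances: kept
    (250, "G", 40), (250, "G", 12),   # only one instance passes: dropped
    (377, "A", 24), (377, "A", 24),   # scores exactly at the cutoff: kept
]
selected = select_candidate_snps(reads)
```

Counting only the quality-qualified instances, rather than all instances, is one reading of the rule; a variant seen once at high quality and once at low quality is rejected under this reading.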
Genotyping: For most of the SNPs, a first round of PCR was performed on the samples using the high-fidelity DNA polymerase pfu Turbo and the appropriate resequencing primers. Representatives of the resulting PCR products were checked on an agarose gel, and first-round PCR product was diluted and then used as template for a second round of PCR. The two rounds were necessary due to the fact that many of the genes we queried were members of gene families, the SNPs resided in regions of sequence homology, and our genotyping platform required short (∼100 bp) amplicons. For those remaining, only a single round of PCR was performed. Genotyping was performed for individual DNA specimens using a single base primer extension protocol and an SNPstream 25K/ultra-high throughput (UHT) instrument (Beckman Coulter, Fullerton, CA, and Orchid Biosystems, Princeton, NJ). Genotypes were subject to several quality controls: two scientists independently pass/fail inspected the calls, requiring an overall UHT signal intensity >1000 for >95% of genotypes and clear signal differential between the averages for each genotype class (i.e., clear genotype clustering in two-dimensional space using the UHT analysis software).
Statistical methods: To test the departures from independence in allelic state within and between loci, we used the exact test described in Zaykin et al. (1995). Haplotypes were inferred using the Stephens et al. (2001) haplotype reconstruction method. To determine the extent to which extant iris color variation could be explained by various models, we calculated R² values for SNPs, haplotypes, and multilocus genotype data by first assigning the phenotypic value for blue eye color as 1, green eye color as 2, hazel eye color as 3, and brown eye color as 4. Biogeographical ancestry admixture proportions were determined using the methods of Hanis et al. (1986) and Shriver et al. (2003) within the context of a software program we developed for this purpose, which will be presented elsewhere (T. Frudakis, Z. Gaskin, M. Thomas, V. Ponnuswamy, K. Venkateswarlu, S. Gunjupulli, C. Bonilla, E. Parra and M. Shriver, personal communication). For R² computation, we used the following function: Adj-R² = 1 − [n/(n − p)](1 − R²), where n is the model degrees of freedom and n − p is the error degrees of freedom. To correct for multiple tests, we used the empirical Bayes adjustments for multiple results method described by Steenland et al. (2000). Linkage disequilibrium (LD) for pairs of SNPs within a gene was determined using the Zaykin exact test and a cutoff value of |D′| ≥ 0.05 (P value < 0.05; Zaykin et al. 1995).
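As a minimal sketch of the scoring and adjustment just described, the following code maps iris colors to the values 1 to 4, computes R² as the between-genotype share of the total sum of squares, and applies the adjustment formula as written in the text. Treating R² as a one-way between-group proportion is an assumption, as is reading n as the number of samples and p as the number of genotype classes in the example; the genotype labels and data are invented for illustration.

```python
# Phenotypic values as assigned in the text.
COLOR_SCORE = {"blue": 1, "green": 2, "hazel": 3, "brown": 4}


def r_squared(groups):
    """Between-group sum of squares divided by total sum of squares.

    `groups` maps a genotype (or haplotype/diplotype) label to a list of
    iris color names. This one-way decomposition is an assumed model.
    """
    scored = {g: [COLOR_SCORE[c] for c in colors] for g, colors in groups.items()}
    all_scores = [s for values in scored.values() for s in values]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in scored.values()
    )
    return ss_between / ss_total


def adjusted_r_squared(r2, n, p):
    """Adj-R2 = 1 - [n / (n - p)] * (1 - R2), exactly as given in the text."""
    return 1 - (n / (n - p)) * (1 - r2)


# Hypothetical genotype classes for a single SNP.
example = {
    "AA": ["blue", "blue", "green"],
    "AG": ["green", "hazel"],
    "GG": ["brown", "brown", "hazel"],
}
r2 = r_squared(example)
adj = adjusted_r_squared(r2, n=8, p=3)  # n, p chosen by assumption (see lead-in)
```

The adjustment shrinks R² more as the number of fitted classes grows, which is why diplotype models with many classes are compared on the adjusted scale.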
RESULTS
To identify SNP loci associated with variable human pigmentation, we genotyped for 754 SNPs: 335 SNPs within pigmentation genes (AP3B1, ASIP, DCT, MC1R, OCA2, SILV, TYR, TYRP1, MYO5A, POMC, AIM, AP3D1, and RAB; Table 1), and 419 other SNPs distributed throughout the genome. Alleles for these latter SNPs were known to be informative for certain elements of population structure; 73 were selected from a screen of the human genome because they were exceptional ancestry informative markers (AIMs, based on high δ values) for Indo-European, sub-Saharan African, Native American, and East Asian biogeographical ancestry (BGA; Shriver et al. 2003; T. Frudakis, Z. Gaskin, M. Thomas, V. Ponnuswamy, K. Venkateswarlu, S. Gunjupulli, C. Bonilla, E. Parra and M. Shriver, unpublished observations). The rest were found in or around xenobiotic metabolism genes, which we have previously shown exhibit dramatic sequence variation as a function of BGA (Frudakis et al. 2003). Genotypes for these 754 candidate SNPs were scored for 851 European-derived individuals of self-reported iris colors (292 blue, 100 green, 186 hazel, and 273 brown). Before screening these genotypes for association with iris colors, we used the 73 nonxenobiotic metabolism AIMs to determine BGA admixture proportions for each sample and we tested for correlation between BGA admixture and iris colors. This test showed that each of our 851 Caucasian samples was of majority Indo-European BGA, and although 58% of the samples were of significant (>4%) non-Indo-European BGA admixture, there was no correlation among low levels (<33%) of East Asian, sub-Saharan African, or Native American admixture and iris colors. For more extensively admixed individuals, we observed no correlation between higher levels (>33% but <50%) of Native American admixture and iris colors, although there was a weak association between higher levels of East Asian and sub-Saharan African admixture and darker iris colors (data not shown).
It was unclear from the outset whether we would have better success considering iris color in terms of four colors (blue, green, hazel, and brown) or in terms of groups of colors. One method of grouping colors is light = blue + green and dark = hazel + brown, and this grouping would seem to more clearly distinguish individuals with respect to the detectible level of eumelanin (brown pigment). Given that our iris color data were self-reported, partitioning the sample into brown and not brown, or blue and not blue, could provide greater power to detect significant associations, particularly for alleles associated with blue or brown irises. To take advantage of each of these four methods, we considered all of them when screening SNPs for associations; we calculated the δ value, chi square, and exact test P values for (a) all four colors, (b) shades, using light (blue and green) vs. dark (hazel and brown), (c) blue vs. brown, and (d) brown vs. not brown (blue, green, and hazel) groupings. We fixed significance levels at 5%, and the alleles of 20 SNPs were found to be associated with specific iris colors, 19 with iris color shades, 19 with blue/brown color comparisons, and 18 using the brown/not brown comparison. The overlap among these SNP sets was high but not perfect. In all, 27 SNPs were significantly associated with iris pigmentation using at least one of the four criteria, and we refer to these as “marginally” associated. When multiple simultaneous hypotheses are tested at set P values, there is the possibility of enhanced type I error, so we used the correction procedure of Steenland et al. (2000) with adjusted residuals to compensate for this risk. 
We found that most of the associations were still significant after this correction (those with asterisks in Table 2), and since the analysis was conducted using adjusted residuals, some new associations were observed (i.e., MAOA marker 2 had a chi-square P value of 0.24 but was associated using the corrected testing procedure; Table 2). Most of the marginally associated SNPs were found within the pigmentation genes OCA2 (n = 10), TYRP1 (n = 4), AIM (n = 3), MYO5A (n = 2), and DCT (n = 2) although some associations were found within nonpigmentation genes such as CYP2C8 at 10q23, CYP2C9 at 10q24, CYP1B1 at 2p21, and MAOA at Xp11.3. No significant SNP associations within the pigmentation genes SILV, MC1R, ASIP, POMC, RAB, or TYR were found, although TYR had one SNP with a P = 0.06. The most strongly associated of the marginally associated SNPs were from the OCA2, TYRP1, and AIM genes, in order of the strength of association, which is the same order as that provided using the number of marginally associated SNPs, rather than their strength.
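The per-SNP screen described above can be illustrated for the simplest case, a 2×2 table of allele counts by color group (for example, light vs. dark). The sketch computes the δ value as the absolute allele-frequency difference between the two groups, the Pearson chi-square statistic with its 1-d.f. P value, and adjusted (standardized) residuals. It does not reproduce the exact test or the Steenland et al. (2000) empirical Bayes correction, and the counts are invented.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square, 1-d.f. P value, and adjusted residuals.

    `table` is [[a, b], [c, d]]: rows are alleles, columns are color groups.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i in range(2):
        residual_row = []
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
            # Variance term for the adjusted (Haberman) residual.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            residual_row.append((table[i][j] - expected) / math.sqrt(variance))
        residuals.append(residual_row)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, residuals


def delta_value(table):
    """Absolute difference in first-allele frequency between the two groups."""
    col_totals = [sum(col) for col in zip(*table)]
    return abs(table[0][0] / col_totals[0] - table[0][1] / col_totals[1])


# Hypothetical allele counts: rows = allele A, allele G; columns = light, dark.
counts = [[60, 40], [40, 60]]
chi2, p_value, residuals = chi_square_2x2(counts)
delta = delta_value(counts)
```

For a 2×2 table the squared adjusted residual of any cell equals the chi-square statistic, so the residuals add information mainly in the larger tables (four colors, many haplotypes), where they indicate which cells drive an association.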
Since most of the SNPs identified from this approach localized to discrete genes or chromosomal regions, we grouped all of the SNPs from each locus and tested inferred haplotypes for association with iris colors using contingency analysis. We did not confine this higher-order analysis to those genes with marginal SNP associations, but we grouped all of the high-frequency SNPs tested for each gene. For each gene, we inferred haplotypes and used contingency analyses to determine which haplotypes were statistically associated with iris colors. From the chi-square and adjusted residuals, we found 43 haplotypes for 16 different loci to be either positively (agonist) or negatively (antagonist) associated with iris colors (Table 3). The strongest associations were observed for genes with SNPs that were marginally associated (Table 2) and most of the genes with marginal SNP associations had haplotypes and diplotypes (sometimes referred to as multilocus gene-wise genotypes or diploid pairs of haplotypes) positively (agonist) or negatively (antagonist) associated with at least one iris color (Table 3). A few of the genes/regions not harboring a marginally associated SNP had haplotypes and diplotypes positively and/or negatively associated with iris colors (ASIP gene, 1 haplotype; MC1R gene, 2 haplotypes; Tables 2 and 3). In other words, their SNPs were associated with iris colors only within the context of gene haplotypes or diplotypes. For some, associations with iris colors were found only within the context of diplotypes, but not at the level of the SNPs or the haplotype (i.e., SILV and GSTT2 genes located at 22q11.23). At the level of the haplotype, each gene or region had unique numbers and types of associations. For example, OCA2, AIM, DCT, and TYRP1 harbored haplotypes both positively associated with blue irises and negatively associated with brown irises (OCA2 haplotypes 1, 37, 38, 42; AIM haplotype 1; DCT haplotype 2; and TYRP1 haplotype 1; Table 3). 
Other genes such as AIM, OCA2, and TYRP1 harbored haplotypes positively associated with brown but negatively associated with blue color (AIM haplotype 2; OCA2 haplotypes 2, 4, 45, 47; TYRP1 haplotype 4; Table 3) while others, such as the MYO5A, OCA2, TYRP1, and CYP2C8 genes located at 10q23, harbored haplotypes positively associated with one color but not negatively associated with any other color (MYO5A haplotype 5 and haplotype 10, OCA2 haplotype 19, TYRP1 haplotype 3, and CYP2C8 haplotype 1; Table 3). The MC1R gene harbored haplotypes associated only with green color in our sample and the POMC gene harbored a single SNP with genotypes weakly associated with iris colors (no significant haplotypes or diplotypes were found). Overall, the diversity of haplotypes associated with brown irises was similar to that of haplotypes associated with blue irises. Most of the haplotypes were even more dramatically associated with iris colors in a multiracial sample (data not shown), because many of the SNPs comprising them are good AIMs and variants associated with darker iris colors were enriched in those ancestral groups of the world that are of darker average iris color (Frudakis et al. 2003; data not shown). Most of the SNPs within a gene or region were in LD with others in that gene or region (|D′| ≥ 0.05); only 32 SNP pairs, in the MC1R (1 pair), OCA2 (27 pairs), TYR (2 pairs), and TYRP1 (2 pairs) genes, were found to be in linkage equilibrium (not shown).
These analyses resulted in the identification of 61 SNPs in 16 genes/chromosomal regions associated with iris colors on one level or another; details for each and whether the SNP is marginally associated or associated within the context of the haplotype and/or diplotype are shown in Table 2. The minor allele frequency for most of these SNPs was relatively high (average F minor allele = 0.22) and most of them were in Hardy-Weinberg equilibrium (HWE; those for which HWE P > 0.05, 28/34; Table 3). Nine were not and of these 2 were of relatively low frequency with weak evidence for disequilibrium (P value close to 0.05). Lack of HWE is usually an indication of a poorly designed genotyping assay, but none of the remaining 7 SNPs exhibited genotyping patterns that we have previously associated with such problems (such as the complete absence of an expected genotype class or all genotypes registering as heterozygotes). Indeed, one of those for which the evidence of lack of HWE was the strongest was validated as a legitimate SNP through direct DNA sequencing (data not shown). The chromosomal distribution of the SNPs that were significantly associated in a marginal sense was found to be independent of the distribution of SNPs actually surveyed, indicating that the associations were not merely a function of SNP sampling, and the same was true for the distribution of all the SNPs shown in Table 2 (data not shown). Chromosome 15q harbored the majority (14/27) of the SNPs that were marginally associated with iris colors, and all but one of these 14 were found in two different genes: OCA2 and MYO5A (Table 2). Chromosome 5p had 3 SNPs marginally associated, all in the AIM gene, and chromosome 9p had 5 SNPs associated, all in the TYRP1 gene. Multiple SNPs were identified on chromosome 10q; the CYP2C8-10q23 region had 1 marginally associated SNP, and the neighboring region, CYP2C9-10q24, also had one.
As one might expect from the proximity of these two regions, CYP2C8-CYP2C9 marker pairs were found to be in tight LD with one another (P < 0.001 for each possible pair). Multiple SNPs were also identified on chromosome 2; the C/C genotype for the POMC SNP located at 2p23 was associated with blue iris color (Table 3) and a CYP1B1-2p21-region SNP was also marginally associated at the level of iris shade (Table 2), as well as within the context of a 2-SNP haplotype (Table 3). The SNPs between the 2p21 and 2p23 regions were also in LD (P < 0.01). Finally, in addition to the OCA2 (15q11.2–q12) and MYO5A (15q21) sequences, a single SNP (15q22–ter) was also implicated on chromosome 15q, but SNPs between each of these three loci were not found to be in LD (data not shown). SNPs for the MC1R (16q24), SILV (12q13), and TYR (11q) genes and for the MAOA-Xp11.4–11.3 and GSTT2-22q11.23 regions were also found to be associated at the level of the haplotype (Tables 3 and 4), although these were the only regions of these chromosomes for which associations were found.
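The Hardy-Weinberg check mentioned above can be illustrated with the textbook chi-square goodness-of-fit test for a biallelic SNP. This is a swapped-in, simpler technique shown only to make the idea concrete; the study itself reports using an exact test for departures from independence within loci. The genotype counts below are invented.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    Returns (frequency of allele A, chi-square statistic, 1-d.f. P value).
    Assumes both alleles are present so that all expected counts are > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical genotype counts for one SNP: AA, AB, BB.
freq_a, chi2_hwe, p_hwe = hwe_chi_square(50, 40, 10)
in_hwe = p_hwe > 0.05  # same threshold as quoted in the text
```

With the counts shown, the observed genotypes (50, 40, 10) are close to the expected (49, 42, 9), so the SNP would be scored as in equilibrium. A complete absence of one genotype class, the assay-failure pattern mentioned in the text, would instead produce a large statistic.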
The P values we obtained suggested that diplotypes explained more iris color variation than did haplotypes or individual SNPs. To test this, we performed a corrected ANOVA analysis for our data on each of these three levels. We considered all 61 SNPs in Table 2, their haplotypes in Table 3, and their diplotypes (not shown). Diplotypes explained 15% of the variation, whereas haplotypes explained 13% and SNPs explained only 11% (Table 4) after correcting for the number of variables. The most strongly associated 68 genotypes of the 543 genotypes observed for the 16 genes/regions, on the basis of chi-square-adjusted residuals, explained 13% of the variation (last row in Table 4).
DISCUSSION
From a screen of 754 SNP loci, we have identified 61 that are statistically associated with variable iris pigmentation at one level of intragenic complexity or another. The remaining SNPs had δ values and chi-square P values that were not significant on any level of intragenic complexity. Diplotypes for these 61 alleles explained the most iris color variance in our sample; the lowest amount was explained at the level of the SNP, suggesting an element of intragenic complexity to iris color determination (i.e., dominance). Only about half of the 61 SNPs that we identified were associated with iris colors independently; the others were associated only in the context of haplotypes or diplotypes. Even at this level of complexity, the sequences from no single gene could be used to make reliable iris color inferences, which suggests an element of intergenic complexity (i.e., epistasis) for iris color determination as well. Aside from the fact that many of the SNPs we identified were significant after imposing the Steenland correction for multiple testing, there are three lines of evidence that the SNPs we have identified are not spuriously associated. The first is that for most of the genes for which we identified marginally associated SNPs, multiple such SNPs were identified. In other words, the distribution of SNPs among the various genes tested was not random. Indeed, the associations were observed to be generally stronger for the SNPs in the context of within-gene haplotypes (a result that would not necessarily be obtained for individual SNPs spuriously associated), suggesting that the gene sequences themselves are associated, not merely a spurious polymorphism within each gene. Second, although a roughly equal number of pigmentation and nonpigmentation gene SNPs were tested, of the 34 marginally associated SNPs, 28 of them (82%) were in pigmentation genes. In other words, the distribution of SNPs among the various gene types was also not random.
Third, when applied to a sample including individuals of multiple ancestries, the linear and nonlinear variables from these and the other genes combined performed even better than when applied just to individuals of majority European ancestry (not shown). Since most individuals of non-European or minority European descent exhibit low variability in iris colors (on average of darker shade than individuals of European descent), this improvement may not seem surprising. However, this result would not have necessarily been obtained were we working with SNPs that were not truly associated with iris colors. Although corrections for multiple testing left most of the SNP-level associations intact, a number of the associations we found did not pass the multiple-testing examination, but nonetheless we present them here to avoid possible type II error; the sequences may be weakly associated with iris colors and possibly relevant within a multiple-gene model for classification (i.e., epistasis). For these, it would seem more prudent to eliminate false positives downstream of SNP identification, such as from tests of higher-order association, using various other criteria, such as those described above, or possibly using the utility of the SNP for the generalization of a complex classification model when one is finally described.
Mutations in the pigmentation genes are the primary cause of oculocutaneous albinism, so it was natural to expect that common variations in their sequences might explain some of the variance in natural iris colors, and this is in fact what we observed. However, a number of the associations we identified were for SNPs located in other types of genes. The sequences for most of these genes vary significantly as a function of population structure (Frudakis et al. 2003), and it is possible that alleles for these SNPs are associated with elements of population structure that correlate with iris colors. Alternatively, the mechanism for the associations could be LD with phenotypically active loci in nearby pigment genes. Indeed, some, but not all, of our nonpigment gene SNPs are found in regions within the vicinity of pigmentation genes; CYP2C8 and CYP2C9 are located on chromosome 10 near the HPS1 and HPS2 pigmentation genes (which we did not test directly), CYP1A2 is located at 15q22–ter on the same arm as OCA2 and MYO5A, CYP1B1 is located at 2p21 in the vicinity of the POMC gene at 2p23, and MAOA is located on the same arm of chromosome X (Xp11.4–11.3) as the OA1 pigmentation gene (which we also did not test directly). The distances between these loci associated with iris colors and the “neighboring” pigmentation genes are far greater than the average extent of LD in the genome, and if these associations do arise through LD, it would seem that, again, population structure would need to be invoked as an explanation. The structure behind our results is unlikely to be of a crude (i.e., continental) nature; although two-thirds of our European-American samples were of significant (4%) BGA admixture, few correlations between structure measured on this level and iris colors were observed in this study. Rather, it seems likely that the structure behind our results is of a finer, more “cryptic” nature, such as ethnicity or even within-ethnic-group structure.
To an investigator interested in elucidating a biological mechanism, association due to population structure might not seem to be very satisfying, but when classification is the goal rather than the elucidation of a biological mechanism, it would seem to matter little why a marker is associated with a trait. For example, forensics investigators construct physical profiles using surprisingly unscientific means; only in rare cases are eye-witness accounts available, and in certain circumstances these accounts are subjective and unreliable. A battery of genetic tests, of which one for the inference of iris color could be a part, could enable the construction of a more objective and science-based (partial) physical profile from crime-scene DNA, and an investigator using these tests would be less interested in the biological mechanism of the phenotype than in an ability to make an accurate inference of trait value. Of course, identifying markers in LD with phenotypically active loci (or the phenotypically active loci themselves) would provide for more accurate classification (as well as for a better understanding of biological mechanism), but the hunt for these elusive loci in heterogeneous populations is still impractical because LD extends only for a few kilobases and the economics of genome-wide scans in heterogeneous samples with full LD coverage are out of reach for most labs.
Linkage studies have implicated certain pigmentation genes as specifically relevant for pigmentation phenotypes, and most of the pigmentation gene SNPs that we identified clustered to certain genes such as OCA2, MYO5A, TYRP1, and AIM. Further, certain of our results support the previous literature. Most of the SNPs that we identified were on chromosome 15, which Eiberg and Mohr (1996) described from linkage analyses as the primary chromosome for the determination of “brownness.” As suggested by these authors, the candidate gene within the interval containing this locus (BEY2) is most likely the OCA2 gene, although the MYO5A gene is also present within this interval and, as shown here, associated with iris colors. OCA2 associations were by far the most significant of any gene or region we tested, while MYO5A SNPs were only weakly associated (but haplotypes and diplotypes more strongly). MYO5A alleles were not found to be in LD with those of OCA2, suggesting that these results were independently obtained and that Eiberg and Mohr's results may have been a reflection of the activity of two separate genes. Rebbeck et al. (2002) recently described two OCA2 coding changes associated with darker iris colors. One of these, the Arg305Trp SNP, was one of the 13 OCA2 SNPs that we found to be strongly associated with iris colors using all four of our color criteria, although its association was only the ninth strongest among the OCA2 SNPs that we identified and the eleventh strongest among all of the associated SNPs that we identified. The P values we obtained for this particular SNP association (P = 0.01–0.05, depending on the color criteria) were less significant than those described (P = 0.002) by Rebbeck et al. (2002).
In addition, we independently isolated the “red hair/blue iris” SNP alleles described by Valverde et al. (1995) and Koppula et al. (1997), suggesting that these sequences are indeed associated with iris pigmentation as suggested by these authors, although we note that the associations described by these authors were with blue irises and at the level of the SNP, while those that we observed were with green irises and apparent only at the level of the haplotypes and diplotypes. We also identified associations in the ASIP gene, which supports previous work by Kanetsky et al. (2002), although it should be noted that we did not observe this gene association at the level of the SNP as they did; one of the ASIP SNPs that we identified (marker 861, Table 2) is the 8818 G-A SNP transition that they described to be associated with brown iris colors, but in our study the association was with hazel color at the level of the haplotype. Last, we also showed that the associations between TYR haplotypes and iris colors were relatively weak, which is not inconsistent with results obtained by many others before us working in the field of oculocutaneous albinism who have failed to find strong associations in smaller samples. Although our results independently verified findings for OCA2, ASIP, and MC1R, they also show that several other pigmentation genes harbor alleles associated with the natural distribution of iris colors (TYRP1, AIM, MYO5A, and DCT). Our findings therefore indicate that most of the previous results associating pigmentation gene alleles with iris colors, taken independently, represent merely strokes of a larger, more complex portrait. It is interesting that most of the SNPs that we discovered are noncoding, either silent polymorphisms or SNPs residing in the gene proximal promoter, intron, or 3′ UTR, which is not altogether unusual. Although this could indicate that the SNPs are in LD with other phenotypically active loci, it may also reflect that variability in message transcription and/or turnover explains part of the variability observed in human iris colors.
Although we screened a large number of SNPs, some of the genes harbor a large number of candidate SNPs and we did not test them all. For example, the OCA2 gene has ∼200 known candidate SNPs in NCBI's dbSNP, and it is possible that this gene has more to teach us about variable human iris pigmentation than what we have learned from the work presented herein.
Clearly, work remains to be done: objectifying the collection of iris colors from subjects, increasing the sample size so that epistatic interactions can be explored, possibly screening other regions of the genome not screened here, and modeling the sequences that we have described to enable classification of iris colors from DNA. However, the results presented herein constitute a good first step toward solving what our results confirm is a very complex genetics problem. When this work is more fully developed, it may be possible to assign an iris color to an individual sample with reasonable certainty, and in that case the results herein will have some tangible value for the field of forensic science. Alternatively, as a research tool, the common haplotypes that we have identified and the complex, biologically relevant contexts within which they are found may help researchers more accurately define risk factors for pigmentation-related diseases such as cataracts and melanoma.
We thank D. C. Rao, Director of the Division of Biostatistics, Washington University, St. Louis, for help preparing this manuscript; Mark Shriver, Department of Anthropology and Human Genetics at The Pennsylvania State University for his help with the biogeographical ancestry admixture aspect of the project; and Murray Brilliant, professor of Pediatrics and Molecular and Cellular Biology at the University of Arizona for their kind advice and support of our work. We also thank Robert White for his help with sample collection. We sincerely thank the referees for their valuable suggestions for improvements on the earlier version of this article. Last, we thank the reviewers of this manuscript who suggested a number of important improvements.
Communicating editor: P. J. Oefner
- Received March 28, 2003.
- Accepted August 20, 2003.
- Copyright © 2003 by the Genetics Society of America