Identification of risk alleles for human behavioral disorders through genomewide association studies (GWAS) has been hampered by a daunting multiple testing problem. This problem can be circumvented for some phenotypes by combining genomewide studies in model organisms with subsequent candidate gene association analyses in human populations. Here, we characterized genetic networks that underlie the response to ethanol exposure in Drosophila melanogaster by measuring ethanol knockdown time in 40 wild-derived inbred Drosophila lines. We associated phenotypic variation in ethanol responses with genomewide variation in gene expression and identified modules of correlated transcripts associated with a first and second exposure to ethanol vapors as well as the induction of tolerance. We validated the computational networks and assessed their robustness by transposon-mediated disruption of focal genes within modules in a laboratory inbred strain, followed by measurements of transcript abundance of connected genes within the module. Many genes within the modules have human orthologs, which provides a stepping stone for the identification of candidate genes associated with alcohol drinking behavior in human populations. We demonstrated the potential of this translational approach by identifying seven intronic single nucleotide polymorphisms of the Malic Enzyme 1 (ME1) gene that are associated with cocktail drinking in 1687 individuals of the Framingham Offspring cohort, suggesting that variation in levels of cytoplasmic malic enzyme may contribute to variation in alcohol consumption.
It is generally recognized that studies on genetically tractable model organisms can provide information that is relevant for human disorders (Bilen and Bonini 2005; Mackay and Anholt 2006; Crabbe 2008; Flinn et al. 2008; Woodruff-Pak 2008). Nonetheless, attempts to identify risk alleles for human diseases have relied almost exclusively on linkage studies in human families or association analyses in human populations without reliance on translational approaches based on model systems. Genomewide association studies (GWAS) have become increasingly popular, but, despite some impressive successes, have fallen short of expectations. In particular, GWAS performed to date for behavioral and neuropsychiatric disorders have yielded few, and often nonreproducible, results (Nica and Dermitzakis 2008; O'Donovan et al. 2008; Feulner et al. 2009; Johnson and O'Donnell 2009; Moskvina et al. 2009; Pankratz et al. 2009; Psychiatric GWAS Consortium Steering Committee 2009).
There are many, nonmutually exclusive reasons for the difficulty in identifying risk alleles for human behavioral traits. It is often difficult to quantify precisely the heterogeneous spectrum of phenotypes encountered in human behavioral traits and neuropsychiatric disorders. Ethnic stratification in nonhomogeneous populations can lead to spurious associations. Gene-by-gene and gene-by-environment interactions can be confounding factors. Association study designs have low power to detect low frequency alleles with moderate effects, even in samples of several thousand individuals. The power of single nucleotide polymorphisms (SNPs) to detect causal variants is reduced further if the true causal variant is not in perfect linkage disequilibrium with the tagging SNP. Furthermore, failure of GWAS to identify alleles with large effects has generated the notion that the underlying basis of neuropsychiatric disorders may be determined by many genes of small effects, and thus many studies have been seriously underpowered. Finally, human association studies typically implicate regions containing many genes, rather than individual genes or the causal variant; and the significance of those genes that have been identified in terms of their mechanistic contributions to the phenotype is often unclear. It is now generally recognized both from studies in model organisms and people that complex traits cannot be understood in terms of single genes, but must be investigated at the level of genetic networks (Joober et al. 2002; Sieberts and Schadt 2007; Chen et al. 2008; Emilsson et al. 2008; Ayroles et al. 2009; Buchanan et al. 2009; Cookson et al. 2009).
One untested strategy for improving the power to detect alleles affecting behavioral traits in human association studies is to identify networks of genes affecting a homologous behavior in a genetically tractable model organism and to perform association tests on only human orthologs of these genes. This strategy ameliorates the statistical penalty for multiple tests in an unbiased genome scan and enables tailoring the choice of molecular markers used in the association tests to the fine structure of linkage disequilibrium at each candidate gene.
Here, we present a proof of principle of this approach by using Drosophila melanogaster as a model system for the ultimate identification of polymorphisms in human genes that are associated with alcohol drinking behavior. Although addiction is difficult to model in animal systems, D. melanogaster presents an excellent system for genetic studies of alcohol sensitivity and tolerance (Moore et al. 1998; Scholz et al. 2000, 2005; Singh and Heberlein 2000; Cheng et al. 2001; Rothenfluh and Heberlein 2002; Wolf et al. 2002; Guarnieri and Heberlein 2003; Morozova et al. 2006, 2007; Rothenfluh et al. 2006). Most previous studies on alcohol sensitivity in Drosophila have characterized the effects of induced mutations. In human populations, we are interested in the effects of naturally occurring genetic polymorphisms on phenotypic variation. Therefore, we decided to study alcohol sensitivity and tolerance in a panel of D. melanogaster inbred lines derived from a natural population (Ayroles et al. 2009). Here, individuals within each line are genetically identical, but the phenotypic variation observed in nature is preserved among the lines. We adopted a two-step approach. First, we characterized genotype–phenotype relationships in these lines by identifying and validating networks of correlated transcripts associated with acute sensitivity to alcohol exposure or tolerance. Next, we identified human orthologs in these networks as candidate genes for alcohol sensitivity in human populations. On the basis of information derived from the Drosophila model, we selected Malic enzyme (Men) as a focus for our proof-of-principle translational study. Men represents a central metabolic switch that links the glycolytic pathway to the tricarboxylic acid cycle and generates NADPH, an essential cofactor for fatty acid biosynthesis, while converting malate into pyruvate. Men is associated with alcohol resistance in flies (Morozova et al. 2006) and in humans is likely to represent a critical metabolic juncture that enables development of fatty liver syndrome in heavy drinkers. Here, we demonstrate that our translational approach indeed has led to the discovery of SNPs in ME1 that are associated with drinking behavior in a human population, with an effect size that could not have been resolved with large-scale unbiased GWAS.
MATERIALS AND METHODS
The 40 inbred lines were derived by 20 generations of full-sib mating from isofemale lines that were collected from the Raleigh, NC farmer's market in 2003 (Ayroles et al. 2009). Homozygous P-element insertion lines containing P[GT1]-elements in or near candidate genes in a co-isogenic Canton-S (A, B, F) background were generated as part of the Berkeley Drosophila Genome Project (Bellen et al. 2004). Fly stocks were reared under standard culture conditions on cornmeal-molasses-agar medium at 25°, 60–75% relative humidity, 12 hr light–dark cycle. Flies were not exposed to CO2 anesthesia for at least 24 hr prior to the assay.
Quantitative assay for alcohol sensitivity and tolerance:
We assessed ethanol sensitivity and tolerance for the 40 wild-derived inbred lines in blocks of approximately eight lines and a control line (Canton-S B). Each block was tested over a 2-week period. There were two replicate measurements of each sex (N = 70 per replicate) per line; the replicates for each line were assessed on different days. We placed the flies in an inebriometer (Weber 1988) preequilibrated with ethanol vapor, and collected them at 1-min intervals as they eluted. We recorded elution times from the initial exposure to ethanol (E1) and a second exposure of the same flies (E2) 2 hr later. The mean elution time (MET) is a measure of alcohol sensitivity, and the scaled difference of MET between the second and first exposures is a measure of tolerance, i.e., (E2 − E1)/0.5 (E1 + E2). Sensitivity from a single exposure to ethanol was similarly tested for 11 P-element insertional mutations in candidate genes and their co-isogenic control line, but only for males, with 4–5 replicates/line, N = 60–70/replicate.
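The tolerance measure defined above can be sketched in a few lines. This is a minimal illustration of the formula (E2 − E1)/0.5(E1 + E2) only; the function name and the example elution times are hypothetical and are not data from the study.

```python
def tolerance_index(e1: float, e2: float) -> float:
    """Scaled shift in mean elution time: (E2 - E1) / (0.5 * (E1 + E2))."""
    mean_met = 0.5 * (e1 + e2)
    if mean_met <= 0:
        raise ValueError("mean elution times must be positive")
    return (e2 - e1) / mean_met


# Hypothetical line means in minutes (illustrative values only).
example = tolerance_index(e1=10.0, e2=14.0)
print(f"tolerance = {example:.3f}")
```

Scaling by the average of the two exposures expresses the shift as a proportion of the line's overall elution time, so lines with very different baseline sensitivities can be compared on the same scale.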
Whole genome expression analysis:
The gene expression analysis has been described previously (Ayroles et al. 2009). Briefly, RNA was extracted from two independent pools of 25 3- to 5-day-old flies/sex/line that were frozen at the same time of day, labeled, and hybridized to Affymetrix Drosophila 2.0 arrays, using a strictly randomized experimental design. The raw array data were normalized using a median standardization. The measure of expression was the median log2 signal intensity of the probes in the perfect match probe sets, after removing probes containing single feature polymorphisms (SFPs) between the wild-derived lines and the reference strain sequence used to design the array. Negative control probes were used to estimate the level of background intensity; probe sets with expression levels below this threshold were considered to be not expressed.
Quantitative genetic and statistical analysis:
We expressed individual elution times of the 40 wild-derived inbred lines as deviations from the control mean for the appropriate block and sex. We used mixed model factorial analysis of variance (ANOVA) to partition variance in ethanol sensitivity after the first and second exposures among the inbred lines, according to the model Y = μ + S + L + S × L + Rep(L × S) + W, where μ is the overall mean, S is the fixed main effect of sex, L is the random main effect of line, S × L is the random effect of the sex by line interaction, Rep(L × S) is the random effect of replicate, and W is the within-replicate variance. Parentheses indicate nested effects. Similarly, we analyzed the data across both exposures by introducing a third cross-classified effect, exposure (E) into the ANOVA model: Y = μ + S + E + L + S × E + S × L + E × L + S × E × L + Rep(S × E × L) + W. Significance of the E × L term indicates genetic variation in the difference in MET between E1 and E2; i.e., genetic variation in the induction of tolerance. A false discovery rate (FDR) for the L term of FDR < 0.001 was used in the analysis of natural variation in gene expression to account for multiple tests (Hochberg and Benjamini 1990). The total genotypic variance among lines was estimated as $\sigma_G^2 = \sigma_L^2 + \sigma_{LS}^2$, where $\sigma_L^2$ is the among-line variance component and $\sigma_{LS}^2$ is the variance attributable to the L × S interaction. The total phenotypic variance was estimated as $\sigma_P^2 = \sigma_G^2 + \sigma_E^2$, where $\sigma_E^2$ is the environmental variance component. We estimated broad sense heritabilities as $H^2 = \sigma_G^2/\sigma_P^2$. We estimated cross-trait genetic correlations as $r_G = \mathrm{cov}_{ij}/(\sigma_i \sigma_j)$, where $\mathrm{cov}_{ij}$ is the covariance of line means between trait i and trait j, and $\sigma_i$ and $\sigma_j$ are the square roots of the among-line variance components for the two traits. ANOVA models (Y = μ + L + Rep(L) + ε) were used to assess differences in ethanol sensitivity between P-element insertion lines and their co-isogenic control.
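As a worked illustration of the last step, the sketch below computes the broad sense heritability and the cross-trait genetic correlation from variance components that have already been estimated. It does not fit the mixed model itself. The function names and the numerical inputs are hypothetical placeholders, not estimates from this study.

```python
import math


def broad_sense_heritability(var_line: float, var_line_sex: float, var_env: float) -> float:
    """H^2 = var_G / var_P, with var_G = var_L + var_LS and var_P = var_G + var_E."""
    var_g = var_line + var_line_sex
    var_p = var_g + var_env
    return var_g / var_p


def genetic_correlation(cov_ij: float, var_line_i: float, var_line_j: float) -> float:
    """r_G = cov_ij / (sigma_i * sigma_j), using among-line variance components."""
    return cov_ij / (math.sqrt(var_line_i) * math.sqrt(var_line_j))


# Hypothetical variance components (illustrative values only).
print(broad_sense_heritability(var_line=2.0, var_line_sex=0.5, var_env=7.5))
print(genetic_correlation(cov_ij=3.0, var_line_i=4.0, var_line_j=9.0))
```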
Linear regressions and ANOVA models were used to identify quantitative trait transcripts (QTTs) and SFPs significantly associated (P < 0.01) with variation in ethanol sensitivity and tolerance across the 40 lines, as described previously (Ayroles et al. 2009). We used modulated modularity clustering (Stone and Ayroles 2009) to derive modules of genetically correlated transcripts associated with ethanol phenotypes. Briefly, we compute the correlation between all pairs of transcripts that vary significantly among the lines and transform them to define the edge weights in a graph of genes. Modulated modularity clustering seeks modules of tightly intercorrelated genes by identifying the graph partition that maximizes the modulated modularity function defined in Stone and Ayroles (2009). Transcripts with spurious association to a phenotype are unlikely to correlate with biologically relevant transcripts after removing the source of the association; conversely, transcripts under coordinated control are likely to exhibit correlated abundance patterns even after removing the effect of their common relationship to a phenotype.
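The first step of this procedure, computing pairwise correlations among transcripts to obtain edge weights, can be sketched as follows. This is not modulated modularity clustering: the transformation applied to the correlations and the partitioning step are defined in Stone and Ayroles (2009) and are not reproduced here. Using the absolute Pearson correlation as the edge weight is an assumption made for illustration, and the transcript names and values are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlation_edges(expression):
    """Map each transcript pair to |r| computed across line means.

    `expression` maps a transcript name to its list of per-line mean expression values.
    """
    names = sorted(expression)
    edges = {}
    for i, t1 in enumerate(names):
        for t2 in names[i + 1:]:
            edges[(t1, t2)] = abs(pearson_r(expression[t1], expression[t2]))
    return edges


# Hypothetical expression values for four transcripts across four lines.
expression = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "d": [1, 3, 2, 4]}
for pair, weight in correlation_edges(expression).items():
    print(pair, round(weight, 3))
```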
Functional gene annotations are based on FlyBase (http://flybase.bio.indiana.edu; Drysdale and Crosby 2005) and Affymetrix NetAffx Analysis Center (2008; http://www.affymetrix.com) compilations, using FlyBase release version 5.2. Gene Ontology enrichment analysis used the DAVID program (Dennis et al. 2003), and tissue-specific gene expression data were obtained from FlyAtlas (Chintapalli et al. 2007).
Quantitative RT–PCR and enzyme assays:
We quantified mRNA levels by quantitative RT–PCR with the SYBR Green detection method (SYBR GREEN PCR master mix, Applied Biosystems/ABI, Foster City, CA) according to the protocol from ABI, using the ABI PRISM 7900 Sequence Detection System (ABI). We used the glyceraldehyde-3-phosphate dehydrogenase gene as the internal standard. Five independent replicates of total RNA were isolated from the Canton-S B control and the pipsqueak (psq) and Men P-element insertion lines using the Trizol reagent (GIBCO-BRL, Gaithersburg, MD) and cDNA was generated from 350 ng of total RNA by reverse transcription using the High Capacity cDNA Reverse Transcription kit (ABI). Primer Express software (ABI) was used to design transcript-specific primers to amplify up to 100-bp regions of all genes of interest. Primers were designed to encompass common regions of alternative transcripts. Negative controls without reverse transcriptase were used for all genes to exclude potential genomic DNA contamination. Samples in each run were normalized relative to a control sample (using $2^{-\Delta\Delta C_t}$ values, according to ABI User Bulletin no. 2; Applied Biosystems 2001). Differences in gene expression levels between P-element insertion lines and the control line were assessed by two-tailed Student's t-tests on $\Delta C_t$ values.
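The relative quantification step can be illustrated with a short sketch of the 2^(−ΔΔCt) calculation. The base of 2 assumes that the amplicon doubles in each cycle; the function name and the threshold cycle values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript in a sample relative to a control sample.

    Each Ct is first normalized to the internal standard (delta Ct), the control
    delta Ct is subtracted (delta delta Ct), and the result is 2 ** (-ddCt).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)


# Hypothetical threshold cycles: the target crosses threshold two cycles later
# in the mutant than in the control after normalization, i.e., 4-fold lower.
print(relative_expression(ct_target_sample=26.0, ct_ref_sample=18.0,
                          ct_target_control=24.0, ct_ref_control=18.0))
```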
To examine developmental stage-specific gene expression levels in the psq or Men P-element insertion lines, relative levels of expression were analyzed in the same way as above after extraction of triplicate RNA samples from embryos between 5 and 8 hr after oviposition, third instar larvae, pupae, whole adult flies, and adult heads and bodies, separately.
Malic enzyme activity was assayed in triplicate for males and females separately by measuring the rate of generation of NADPH at 340 nm, as described by Merritt et al. (2009), using a 96-well plate spectrophotometer (Power Wave X, BioTek Instruments, Winooski, VT) in which absorbance was measured every 11 sec over 3 min. Samples were assayed twice and the average of each measurement was used for analysis. Enzyme activities were standardized by soluble protein and expressed as micromoles of NADP+ reduced per minute per microgram soluble protein × 10,000. The relationship between Malic enzyme activity after a single exposure to ethanol and phenotypic variation in second MET was assessed using the linear model: Y = C + S + L(C) + C × S + S × L(C) + ε, where Y is the Malic enzyme activity, C is the phenotypic class (short or long MET after a second exposure to ethanol), S is the effect of sex, L(C) is the effect of line nested within phenotypic class, and ε is the residual error term.
Study subjects and ascertainment of alcohol consumption:
The Framingham Heart Study is a population-based cohort study started in 1948 in Framingham, Massachusetts. The original cohort included 5209 participants. In 1971, children of the original cohort and their spouses were invited to participate in a prospective study called the Offspring Study. Detailed descriptions of the Framingham Heart Study have been published (Dawber et al. 1952; Kannel et al. 1979). Information on alcohol consumption has been collected repeatedly from both the original and offspring cohorts. For the present study we used only unrelated individuals of the offspring cohort. At all examinations (i.e., every 4 years), subjects were asked how many 1.5-oz cocktails, 12-oz glasses (or cans) of beer, and 5-oz glasses of wine they consumed in a week. Details on alcohol assessment in the Framingham Heart Study have been described previously (Djousse et al. 2002).
SNP genotyping and association analysis:
For genotyping the human ME1 gene (chr 6: 83,976,829–84,197,498), genotype data spanning the coding region were downloaded from the International HapMap Project (Gabriel et al. 2002; Frazer et al. 2007). Data from the Centre d'Etude du Polymorphisme Humain (CEPH) sample (Utah residents with northern and western European ancestry) were used. The program Tagger was used to select an optimal set of available SNPs for genotyping, at an $r^2$ threshold of 0.9 (de Bakker et al. 2005). We excluded SNPs with a minor allele frequency of <0.2 since we expected the alleles that contribute to alcohol-related phenotypes in this data set to be common, and we chose one tagging SNP from each LD block. To assess whether LD structure in our population might be different from the CEPH sample, we chose two SNPs in the same LD block (rs13215578, rs1170347). Selected SNPs were scored by the Illumina Customer Support Center for expected performance on the Illumina platform. A total of 24 SNPs in the human ME1 gene were genotyped (see supporting information, Table S1) using the Illumina GoldenGate genotyping platform in samples from 1717 unrelated subjects from the offspring cohort of the Framingham Heart Study, with 250 ng of each sample DNA per well and concentrations adjusted to 50 ng/μl. Genotype calls were made using the BeadStudio Genotyping Module v3.2 (Illumina, San Diego, CA) software package. Three SNPs did not cluster well and were not called (rs3798886, rs1145908, and rs1180186). The call rates for the remaining 21 SNPs ranged from 99.7 to 100%, with 99.99% concordance among the replicates. The call rates for 30 individuals were lower than 98.9% and these individuals were discarded from the analysis, leaving 1687 individuals.
We used a repeated measures ANOVA over age to identify SNPs associated with variation in drinking behavior. We fitted the following full factorial model using PROC MIXED: Y = μ + A + G + S + G × S + A × G + A × S + A × G × S + ε, where μ represents the grand mean, A is the effect of age, used as a continuous covariate, G is the genotypic effect, S is the sex effect, and ε is the error. A first-order autoregressive covariance structure for the error was selected on the basis of a likelihood ratio test averaged across SNPs. We adjusted for multiple testing using 1000 permutations. For each permutation we recorded the minimal P-value and the 95th and 99th percentile of the resulting distribution.
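The permutation adjustment can be sketched as follows, under simplifying assumptions that differ from the analysis described above. The sketch uses the largest absolute genotype-phenotype correlation across SNPs in place of the minimal P-value (the two give the same ranking when every SNP is tested on the same individuals with the same test), and it ignores repeated measures, age, and sex. Function names, the genotype coding, and the data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_abs_r(genotypes, phenotype):
    """Largest |r| across SNPs; `genotypes` is a list of per-SNP genotype vectors."""
    return max(abs(pearson_r(g, phenotype)) for g in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise threshold on max |r| from phenotype permutations.

    Shuffling phenotypes (not genotypes) keeps the LD structure among SNPs intact.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_abs_r(genotypes, shuffled))
    null_max.sort()
    # Position of the (1 - alpha) quantile of the permutation distribution.
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[idx]


# Hypothetical genotypes (0/1/2 minor allele counts) and phenotype values.
genotypes = [[0, 1, 2, 0, 1, 2, 0, 1, 2, 1], [2, 2, 1, 0, 0, 1, 1, 0, 2, 1]]
phenotype = [1.0, 2.5, 3.1, 0.7, 2.2, 3.9, 1.1, 1.8, 3.5, 2.0]
print(permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=1))
```

A SNP whose observed statistic exceeds the returned threshold would be declared significant at the chosen experiment-wise level; because the threshold is taken from the distribution of the most extreme statistic per permutation, it accounts for all SNPs tested and for the correlation among them.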
RESULTS
Variation in alcohol sensitivity and tolerance among 40 wild-derived inbred Drosophila lines:
Ethanol sensitivity in Drosophila can be quantified by exposing groups of flies of the same genotype to ethanol vapors in an “inebriometer,” a 4-foot-long glass tube with slanted mesh partitions to which flies can adhere (Weber 1988). As flies lose postural control they fall through the inebriometer, from where they are collected at 1-min intervals. The MET is a quantitative measure of ethanol sensitivity. A second exposure 2 hr later extends the MET and reflects the development of tolerance (Scholz et al. 2000; Morozova et al. 2006). Thus, behavioral responses of flies to alcohol show similarities to the effects of alcohol on people.
We measured METs of 40 wild-derived inbred lines of D. melanogaster and found substantial genetic variation in sensitivity following one and two exposures to ethanol, with broad sense heritabilities of 0.244 and 0.216, respectively (Figure 1, A and B; Table S2). Indeed, the range of variation among the inbred lines is comparable to that achieved after 25 generations of selection (Morozova et al. 2007). Although females are on average more sensitive than males, there is no genetic variance in sex dimorphism for alcohol sensitivity (Table S2). The MET averaged over all lines was greater after the second than the first exposure, indicating development of tolerance (Table S2). The genetic correlation between the METs after one and two exposures was high (rG = 0.84, P < 0.0001), but significantly different from unity (Figure 1B; Table S2), indicating significant genetic variation among the lines in the magnitude of induction of tolerance. The significant line by exposure interaction was entirely attributable to changes in rank order of the MET among the lines in the two exposures (Figure S1). We quantified tolerance (T) as the standardized shift in MET between the first (E1) and second (E2) exposures, scaled by their average (i.e., T = (E2 − E1)/0.5(E1 + E2); Figure 1C). Tolerance has a negative genetic correlation with the initial MET (Figure 1C; rG = −0.57; P < 0.001), indicating that tolerance is proportionately greater for sensitive than resistant lines.
Transcriptional networks for alcohol sensitivity and tolerance in Drosophila:
A previous study quantified genetic variation for 10,096 transcripts and 3136 SFPs among these lines (Ayroles et al. 2009). We identified candidate genes affecting alcohol sensitivity and tolerance by regressing phenotypic values on transcript abundance and assessing differences in mean between SFP classes. We identified 1133 transcripts associated with one or more alcohol phenotypes with a nominal P-value of 0.01 (Table S3). This threshold corresponds to a 0.27 false discovery rate, a liberal criterion chosen deliberately for subsequent clustering analysis. We found 295, 592, and 410 candidate genes associated with METs from the first and second exposures and tolerance, respectively (Table S4). In addition, we identified 432 SFPs for a total of 1455 candidate genes. Consistent with the genetic correlations among alcohol phenotypes, there are more transcripts than expected by chance in common between the first and second exposures (105, $\chi^2_1$ = 481, P < 0.0001) and the first exposure and tolerance (42, $\chi^2_1$ = 78.1, P < 0.0001) but not between tolerance and the second exposure (19, $\chi^2_1$ = 0.95, P = 0.33) (Figure 2A). Thus, the genomic signatures for alcohol sensitivity shift dramatically as flies are subjected to repeated exposure to alcohol.
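The overlap tests can be illustrated with a 2 × 2 chi-square calculation. The sketch below makes two assumptions that the text does not state: that the universe is the 10,096 genetically variable transcripts, and that a Yates continuity correction is applied. The function name is hypothetical.

```python
def yates_chi_square(n_a: int, n_b: int, n_both: int, n_total: int) -> float:
    """Chi-square (1 d.f.) with Yates correction for the overlap of two gene lists.

    n_a, n_b: sizes of the two lists; n_both: transcripts in both lists;
    n_total: size of the universe from which both lists were drawn.
    """
    a = n_both                        # in both lists
    b = n_a - n_both                  # in list A only
    c = n_b - n_both                  # in list B only
    d = n_total - n_a - n_b + n_both  # in neither list
    corrected = max(0.0, abs(a * d - b * c) - n_total / 2)
    numerator = n_total * corrected ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


UNIVERSE = 10096  # assumption: all genetically variable transcripts
print(round(yates_chi_square(295, 592, 105, UNIVERSE), 1))  # first vs. second exposure
print(round(yates_chi_square(295, 410, 42, UNIVERSE), 1))   # first exposure vs. tolerance
print(round(yates_chi_square(592, 410, 19, UNIVERSE), 2))   # second exposure vs. tolerance
```

Under these two assumptions the three statistics come out close to the values reported above (481, 78.1, and 0.95), which suggests, but does not establish, that this is how the tests were set up.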
The transcriptome is highly genetically intercorrelated in these lines (Ayroles et al. 2009), enabling us to group transcripts significantly associated with each trait into genetically correlated modules (Stone and Ayroles 2009). This unbiased, self-organizing procedure generated eight transcriptional modules associated with variation in MET upon the initial exposure (Figure 2B), six associated with variation in MET after the second exposure (Figure 2C), and five associated with tolerance (Figure 2D; Table S4).
We analyzed gene ontologies, tissue-specific expression patterns, and shared transcription factor binding motifs to interpret the functions of the modules. Genetic variation in initial sensitivity to ethanol exposure is associated with transcriptional variation in genes with a wide range of functions, including defense responses to environmental chemicals and maintenance of cellular integrity through transcriptional regulation, intracellular protein synthesis, and trafficking (Table S5). Gene ontology categories comprised correlated transcripts encoding glutathione transferases (module 5), epithelial fluid transport and voltage-gated calcium channel activity (module 7), and polysaccharide metabolism (module 8) (Table S5). Genetic variation for MET following the second exposure to ethanol is associated with correlated transcripts affecting protein localization and transport (module 3) and mitochondrial protein synthesis, including mitochondrial ribosomal proteins (module 4). Genes implicated in nervous system function do not stand out among the transcripts associated with genetic variation in MET following one or two exposures to alcohol. However, natural variation in tolerance is associated with correlated transcripts affecting nervous system function (modules 2 and 4; Table S5). Genetic variation in tolerance is also associated with variation in correlated transcripts affecting oxidative phosphorylation (module 2) and metabolism (modules 4 and 5; Table S5).
Consistent with the wide range of functional annotations of transcripts associated with ethanol phenotypes, we find expression of these transcripts in all tissues, with strong enrichment observed in some modules for expression in the midgut, Malpighian tubules, and testis (Figure S2). Modules of correlated transcripts associated with variation in ethanol phenotypes were also enriched for binding motifs of several transcription factors (Table S6).
Analysis of the connectivity of the genes in tolerance module 2 reveals two highly connected gene clusters, one associated with synaptic function, and the other with electron transport and oxidative phosphorylation (Figure 3). Gene products associated with muscle contraction and carbohydrate biosynthesis, processes dependent on ATP, are also contained within this cluster. Genetically correlated transcripts associated with ATP synthesis are linked to the cluster of transcripts associated with synaptic function through comatose (comt), which encodes an N-ethylmaleimide sensitive fusion protein that mediates ATP-dependent synaptic vesicle release.
Many transcripts associated with ethanol phenotypes are predicted genes with unknown function. However, their connectivity with other transcripts across the whole genome and within modules of correlated transcripts associated with a specific phenotype can generate testable hypotheses regarding their functions. Transcripts of CG16743 and CG5704 are associated with all three ethanol phenotypes. GO analysis of the 100 transcripts most highly genetically correlated with each of these genes indicates that CG16743 is most likely a peptidase and CG5704 a transcription factor (Table S7). Similarly, CG1882 is associated with tolerance and five other quantitative traits, including recovery from chill coma and waking activity (Table S4), and is highly correlated with transcripts affecting oxidative phosphorylation and phototransduction. CG7990 and CG9350 are in tolerance module 2 and expressed in the brain. CG7990 probably affects synaptic transmission and CG9350 oxidative phosphorylation, consistent with other transcripts in tolerance module 2 (Table S7).
Validation of transcriptional networks:
To validate these networks we identified 11 P-element insertions in a common co-isogenic background (Bellen et al. 2004) that correspond to transcripts in modules associated with the initial exposure to alcohol. Seven of these indeed affect alcohol sensitivity (Figure 4A). One of these genes, psq, belongs to a family of helix-loop-helix transcription factors that is essential for neurodevelopment (Norga et al. 2003), affects olfactory behavior (Sambandan et al. 2006), and shows a correlated transcriptional response to selection for ethanol sensitivity and resistance (Morozova et al. 2007). Compared to the co-isogenic control, expression of psq is enhanced in the psq mutation (Figure 4B; Table S8). We predicted that transcripts genetically correlated with psq in E1 module 7 would also show altered transcript levels in the psq mutant background, and found that expression of 17 of 23 such transcripts tested (74%) is altered in the psq mutant line (Figure 4C; Table S8). Since P-element mutations often cause widespread changes in transcript abundance (Anholt et al. 2003), we also expect to find altered expression of genes that do not belong to correlated transcriptional modules associated with ethanol phenotypes in the psq mutant line, but that the proportion of such genes will be lower than that of genes in the module. Indeed, transcript levels of only 2 of 11 genes (18%) not genetically correlated with psq show altered expression in the psq mutation (Fisher's exact test P < 0.0005; Table S8).
To further examine the effects of psq on gene expression we surveyed genes in E1 module 7 for Psq binding GAGAG motifs (Schwendemann and Lehmann 2002) in promoter regions and found such motifs in bi, phol, Cbl, and simj. Multiple GAGAG repeats 5′ to the ATG translation initiation codon in the gene that encodes the bifid (bi, a.k.a. optomotor blind) T-Box type transcription factor were especially prominent (Porsch et al. 2005). Moreover, among the 17 genes analyzed by quantitative RT–PCR, bi showed the greatest reduction in transcript abundance in the psq P-element mutant (Table S8). These findings indicate that the genetic correlations with the phenotype within the psq module presented in Figure 4C do not originate from direct transcriptional effects of psq on each correlated gene, but rather arise from a complex cascade of indirect effects.
Next, we examined network connectivity with Men in E2 module 6, which contains a large number of metabolic enzymes (Table S5), and validated the connectivity using a line in which a P-element has inserted in the Men gene. This P-element insertion line is resistant to alcohol exposure, with an average MET 2.8 min longer than that of the co-isogenic Canton-S (B) control (P < 0.0001). Analysis of Men expression across all developmental stages shows that this P-element insertion mutant is essentially a null mutant (Figure 5A). We analyzed expression levels of 16 genes connected in E2 module 6 with Men (Figure 5B) and found that 12 (75%) of the transcripts tested indeed showed altered transcript levels measured by RT–PCR in the P-element insertion mutant (Figure 5B; Table S8). At the same time, only 3 of 11 transcripts not genetically correlated with Men showed altered expression in the P-element insertion mutant (Fisher's exact test P = 0.02; Table S8), once again validating the computationally derived covariant module.
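The comparison of 12 of 16 module transcripts with 3 of 11 non-module transcripts can be reproduced with a small Fisher's exact test. The sketch assumes a two-sided test that sums the probabilities of all tables no more likely than the observed one; the text does not state which variant was used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed table.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Rows: in module / not in module; columns: altered / not altered in the Men mutant.
p_value = fisher_exact_two_sided(12, 4, 3, 8)
print(f"P = {p_value:.3f}")
```

Under this assumption the result is approximately 0.02, in agreement with the value reported for the Men module.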
We selected four lines with the shortest and four lines with the longest second ethanol exposure METs averaged across sexes and measured Malic enzyme activities after a single exposure to ethanol. We observed significant differences in the means of enzyme activities between the lines, consistent with enhanced Malic enzyme activity contributing to subsequent induction of tolerance and providing further validation for the involvement of Malic enzyme in alcohol sensitivity (Figure 6). Differences in Malic enzyme activity were transient and no longer apparent after a second exposure to ethanol. We also found a significant correlation between variation in transcript level of flies that had not been exposed to ethanol and Malic enzyme activity measured after a single exposure to ethanol during induction of tolerance (r = 0.49, P = 0.0004). This correlation was no longer evident when Malic enzyme activity was measured after a second exposure to ethanol, after tolerance had been established (r = 0.098; P = 0.50).
Variation in alcohol drinking behavior in the Framingham Heart Study cohort:
Individual drinking behavior was assessed over a period of 25 years (Kannel et al. 1979). Each individual was evaluated on average five to six times during the course of the study. Analysis of variance revealed a strong effect of sex for all types of alcoholic drinks, with men overall drinking more beer than women (P < 0.0001). There was also a strong effect of age (P < 0.0001), with individuals drinking less alcohol as they age. We did not find a correlation between beer, wine, or cocktail drinking (Figure S3). A more extensive description of the drinking behavior within this population has been published previously (Djousse et al. 2002).
Associations of polymorphisms in the Malic Enzyme 1 (ME1) gene with cocktail drinking:
We used repeated measures ANOVA to assess the association between drinking behavior and each of the 21 SNPs genotyped for ME1. Since drinking behavior is uncorrelated between beer, wine, and cocktails (Figure S3), we performed the association tests separately for each type of drink. We found seven SNPs significantly associated with cocktail drinking (Figure 7). Six SNPs (rs1180242, rs1144188, rs1145913, rs1144185, rs1144184, rs6941094) survived a permutation threshold of 0.05 after correction for multiple testing, and one SNP (rs1145916) was significant at a permutation threshold of 0.01. Although the tagging SNPs within ME1 were not in linkage disequilibrium in the HapMap population, we found significant and often strong pairwise linkage disequilibrium among the 21 SNPs in ME1 genotyped in the Framingham cohort (Figure 7; Table S1). The significant SNPs are located in intron 1, intron 6, and intron 13 of ME1, and are all in strong linkage disequilibrium with each other (Table S1 and Table S9). Although significant, the effects of the SNPs in ME1 on cocktail drinking are small, consistent with the emerging view of the genetic architecture of human complex traits. The effects ranged from 0.35 to 0.64 drinks per day (0.085–0.156 phenotypic standard deviations), and accounted for 0.36–0.9% of the phenotypic variation in cocktail drinking in this population (Table S9).
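One common way to obtain a significance threshold that accounts for testing many correlated SNPs is a max-statistic permutation procedure. The sketch below illustrates that general idea only and is not the authors' analysis: it replaces the repeated measures ANOVA with a simple absolute correlation between genotype and a single phenotype value per individual, and it runs on randomly generated data. All names, the sample size, and the number of permutations are hypothetical.

```python
import random
from statistics import fmean


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_thresholds(genotypes, phenotype, n_perm=200,
                           alphas=(0.05, 0.01), seed=0):
    """Experiment-wise thresholds from the null distribution of the maximum statistic.

    Phenotypes are shuffled across individuals while genotype columns stay intact,
    so linkage disequilibrium among SNPs is preserved in every permutation.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(abs_corr(snp, shuffled) for snp in genotypes))
    null_max.sort()
    return {a: null_max[min(n_perm - 1, int((1 - a) * n_perm))] for a in alphas}


# Synthetic illustration: 21 SNPs coded 0/1/2 in 100 hypothetical individuals.
rng = random.Random(42)
n_people, n_snps = 100, 21
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_people)] for _ in range(n_snps)]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_people)]

thresholds = permutation_thresholds(genotypes, phenotype)
for alpha, cutoff in sorted(thresholds.items(), reverse=True):
    print(f"alpha = {alpha}: |r| must exceed {cutoff:.3f}")
```

A SNP whose observed statistic exceeds the cutoff for a given alpha is significant at that experiment-wise level. Because the cutoffs come from the maximum across all SNPs in each permutation, no separate Bonferroni adjustment is applied in this scheme.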
Studies on model organisms have provided a wealth of insights in universal biological processes; yet, the advantage model organisms offer for translational studies that can lead to discovery of genes that harbor risk alleles for human disorders remains remarkably underexplored. The World Health Organization (2004) has estimated that alcohol abuse disorders affect ∼76.3 million people worldwide. Despite the identification of some candidate genes (Radel and Goldman 2001; Mulligan et al. 2003; Sinha et al. 2003; Flatscher-Bader et al. 2005; Goldman et al. 2005; Radel et al. 2005; Edenberg et al. 2006; Birley et al. 2009), the genetic underpinnings of sensitivity to alcohol remain largely unknown. Here, we demonstrate that findings from studies on alcohol sensitivity in Drosophila can be applied directly to the identification of genetic factors associated with human alcohol intake.
Systems genetics in Drosophila: coregulated transcriptional networks for alcohol sensitivity and tolerance:
For our gene discovery approach to be successful, it was essential first to characterize the genetic underpinnings of alcohol sensitivity and tolerance in Drosophila. We capitalized on natural variation among 40 wild-derived inbred lines to identify correlated transcriptional networks associated with response to ethanol exposure. It should be noted that genes that contribute to the response to ethanol exposure, but do not vary in transcript abundance, would not be included in these networks. Transcripts that are present at levels below the detection limit of the Affymetrix microarrays would also go undetected. Furthermore, the exact composition of the transcriptional networks may differ for different populations, and genotype-by-environment interactions may cause shifts in the transcriptional modules under different environmental conditions (Fay et al. 2004; Landry et al. 2006; Sambandan et al. 2006; Monroy et al. 2007). Nonetheless, it is clear that the transcriptional network associated with acute exposure to ethanol shifts dramatically as flies develop tolerance. This is in agreement with a previous study, which showed that in an isogenic laboratory strain (Canton-S) induction of tolerance is accompanied by rapid downregulation of a suite of chemosensory genes and modulation of expression of detoxification enzymes, followed by altered regulation of genes that encode metabolic enzymes (Morozova et al. 2006). The modules of correlated transcripts that we identified here mimic some aspects of these previous observations. Initial sensitivity to ethanol exposure is associated with correlated transcripts that encode gene products that mediate defense responses to environmental chemicals (e.g., glutathione S-transferases), and tolerance involves transcriptional networks associated with intermediary metabolism, including generation of ATP necessary for synaptic exocytosis.
Perturbations of the computed modules by introduction of a transposon in a different, Canton-S (B), genetic background result in altered levels of transcripts predicted to covary with the disrupted gene, providing direct validation of the modules and indicating significant robustness of the observed modules associated with response to alcohol exposure across genetic backgrounds.
The modular organization of the transcriptional networks associated with alcohol-related phenotypes can provide information about the possible functions of computationally predicted transcripts in the modules based on the “guilt by association” principle and, thus, can serve as an efficient method for gene annotation. If modules associated with alcohol-related phenotypes in Drosophila contain transcripts with human orthologs associated with alcohol-related phenotypes in people, it is reasonable to predict that the latter are likely to preserve a similar modular connectivity. One can then nominate genes whose functional context is already known as candidates for association studies in human populations. This strategy avoids the large multiple testing penalty inherent in unbiased large-scale GWAS and confers power to detect significant associations in a population of moderate size. We have used the gene that encodes the cytosolic form of human Malic enzyme to demonstrate proof of principle for this approach.
Identification of human orthologs as candidate genes for drinking behavior: the ME1 gene as proof of principle:
Alcohol intake in the Framingham population was assessed as three different behavioral phenotypes: beer drinking, wine drinking, and cocktail drinking. It should be noted that the vast majority of individuals in this cohort were light to moderate drinkers, with a smaller number of heavy drinkers (Figure S3). Consequently, we do not make inferences about alcohol addiction or alcohol abuse, but instead focus on normal tendencies toward alcohol consumption. Drinking behaviors show sexual dimorphism, with men being more inclined to consume beer. Previous studies have identified association of Alcohol dehydrogenase (Adh) alleles with alcohol intake (Ehrig et al. 1990; Agarwal and Goedde 1992; Whitfield 1994; Edenberg et al. 2006; Birley et al. 2009). Different Adh alleles differ in enzymatic effectiveness and are correlated with the extent of tolerance to alcohol consumption. High tolerance confers increased risk for addiction. Asian populations have a generally lower risk for developing alcohol dependence than European populations due to differences in frequencies of alleles for Adh and Aldehyde dehydrogenase, designated ADH1B and ALDH2, respectively (Mulligan et al. 2003; Edenberg et al. 2006; Birley et al. 2009). However, Adh is only one of many likely factors that contribute to facilitating alcohol consumption.
In populations where there is no variation for the Adh alleles, other factors must contribute to phenotypic variation in drinking behavior. On the basis of the present study and previous studies in Drosophila (Morozova et al. 2006, 2007), Malic enzyme emerges as a significant facilitator of alcohol intake as it provides a critical metabolic link between the glycolytic pathway, the tricarboxylic acid cycle, and fatty acid biosynthesis, which can shunt alcohol-derived excess energy toward lipid biosynthesis. Indeed, we found significant associations between cocktail drinking and polymorphisms in the ME1 gene, which encodes cytosolic Malic enzyme. The alcohol content in cocktails is greater than in beer or wine, which may account for our ability to detect associations only with cocktail drinking, but not beer and wine drinking, given the sample size.
Notably, the SNPs associated with cocktail drinking are intronic and, thus, do not change the amino acid sequence of the protein, but might affect enzyme level. Due to the linkage disequilibrium structure in the human genome (Gabriel et al. 2002; Frazer et al. 2007), we cannot determine unambiguously which polymorphism is causal. Moreover, historical recombination within the ME1 gene differs between the Framingham study cohort and the HapMap CEPH population. Thus, it is possible that SNPs in ME1 in different populations may contribute to phenotypic variation in drinking behavior to different extents, and in different populations different SNPs within ME1 might be associated with alcohol intake.
The lack of power of many human genetics studies in the past has often led to poorly supported results and, consequently, replication in independent study populations has been widely adopted as a general standard for reliability. While this ensures that genes with risk alleles with consistent effects can be identified with confidence, it discards risk alleles with effects that are sensitive to genetic background and genotype-by-environment interactions in specific populations, even when such studies are executed with good statistical power. Here we show that translational studies that exploit model organisms can provide a viable alternative to replicate human populations and allow the detection of risk alleles with small effects. Men has been associated with alcohol resistance in both flies (Morozova et al. 2006) and mice (Yin et al. 2007), and the current study shows that translational approaches from findings in model organisms can be extended to human candidate genes. Perhaps the most convincing demonstration of the power of the translational approach advocated in this article is the estimate of the number of individuals that would need to be genotyped for 500,000 SNPs in a GWAS aimed at identifying alleles associated with drinking behavior in people to detect the same size of effect resolved in our study. We would need 903 individuals in the rarer homozygous SNP genotype class to detect our largest effect (0.311 phenotypic standard deviations) following a Bonferroni correction for multiple tests (for a nominal P-value of 10⁻⁷) (Falconer and Mackay 1996). Since the frequency of the rarer homozygote genotype is ∼0.118, a total of 7653 individuals would need to be both genotyped and phenotyped to detect an effect of this magnitude.
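The arithmetic behind this estimate can be retraced in a few lines. The sketch below reproduces the Bonferroni threshold and the total cohort size directly from the numbers in the text. It also evaluates a standard normal-approximation formula for comparing two group means, n = 2(z_α/2 + z_β)² / d², as one plausible route to a figure near 903. The 0.05 experiment-wise error rate, the two-sided test, and the 90% power are assumptions that the text does not state, and this formula may not be the exact calculation the authors used.

```python
from math import ceil, isclose
from statistics import NormalDist

N_SNPS = 500_000              # SNPs in the hypothetical GWAS (from the text)
FAMILY_ALPHA = 0.05           # assumed experiment-wise error rate
EFFECT_SD = 0.311             # largest effect, in phenotypic standard deviations
RARE_HOMOZYGOTE_FREQ = 0.118  # frequency of the rarer homozygote (from the text)
N_RARE_REPORTED = 903         # per-class sample size reported in the text

# Bonferroni-corrected nominal P-value: 0.05 / 500,000 = 1e-7.
bonferroni_alpha = FAMILY_ALPHA / N_SNPS


def per_group_n(effect_sd, alpha, power):
    """Normal-approximation sample size per group for a two-sided test.

    Hypothetical helper; the 90% power used below is an assumption.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_sd ** 2


approx_n = per_group_n(EFFECT_SD, bonferroni_alpha, power=0.90)

# Scale the per-class requirement up to the whole cohort.
total_individuals = ceil(N_RARE_REPORTED / RARE_HOMOZYGOTE_FREQ)

print(f"Bonferroni alpha: {bonferroni_alpha:.1e}")
print(f"Approximate n per genotype class: {approx_n:.1f}")
print(f"Total individuals required: {total_individuals}")
```

Dividing 903 by 0.118 and rounding up gives the 7653 individuals quoted above. Under the assumed 90% power, the approximation also lands close to the reported 903 per genotype class.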
This work was supported by grants AA016569 (T.F.C.M.), GM45146 (T.F.C.M.), GM59469 (R.R.H.A.), and AA013304 (R.C.E.) from the National Institutes of Health.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.107490/DC1.
1 These authors contributed equally to this work.
Communicating editor: N. Perrimon
- Received July 17, 2009.
- Accepted July 30, 2009.
- Copyright © 2009 by the Genetics Society of America