Abstract
In mammals, several classes of monoallelic genes have been identified, including those subject to X-chromosome inactivation (XCI), genomic imprinting, and random monoallelic expression (RMAE). However, the extent to which these epigenetic phenomena are influenced by underlying genetic variation is unknown. Here we perform a systematic classification of allelic imbalance in mouse hybrids derived from reciprocal crosses of divergent strains. We observe that deviation from balanced biallelic expression is common, occurring in ∼20% of the mouse transcriptome in a given tissue. Allelic imbalance attributed to genotypic variation is by far the most prevalent class and typically is tissue-specific. However, some genotype-based imbalance is maintained across tissues and is associated with greater genetic variation, especially in 5′ and 3′ termini of transcripts. We further identify novel random monoallelic and imprinted genes and find that genotype can modify penetrance of parental origin even in the setting of large imprinted regions. Examination of nascent transcripts in single cells from inbred parental strains reveals that genes showing genotype-based imbalance in hybrids can also exhibit monoallelic expression in isogenic backgrounds. This surprising observation may suggest a competition between alleles and/or reflect the combined impact of cis- and trans-acting variation on expression of a given gene. Our findings provide novel insights into gene regulation and may be relevant to human genetic variation and disease.
BIALLELIC gene expression affords diploid organisms a safeguard against single-hit mutations that may prove detrimental in development or cause disease. Exceptions to this rule have been limited to genes involved in parental resource allocation (imprinting) (Bartolomei and Ferguson-Smith 2011; Ferguson-Smith 2011) and developmental processes that necessitate expression of a single allele to achieve either cell heterogeneity [random monoallelic expression (RMAE), e.g., olfactory receptor genes)] (Gimelbrant et al. 2007; Chess 2012; Zwemer et al. 2012) or sex chromosome dosage compensation [X-chromosome inactivation (XCI)] (Wutz and Gribnau 2007; Payer and Lee 2008; Starmer and Magnuson 2009; Disteche 2012; Lee and Bartolomei 2013). Although the notion that genotypic variation can modify phenotypic expression has long been appreciated, the extent to which divergence between parental genomes influences gene expression in offspring has not been rigorously addressed, partly because allelic imbalance due to genetic variation and imprinting cannot be distinguished in human studies (Cheung et al. 2010; Heap et al. 2010; Montgomery et al. 2010; Pickrell et al. 2010; Battle et al. 2014). Inbred mouse lines represent a powerful tool to address this question, given the possibility of reciprocal cross-design and a priori knowledge of genotype. Previous studies in mice have focused largely on RMAE (Zwemer et al. 2012; Eckersley-Maslin et al. 2014; Gendrel et al. 2014), X-linked (Yang et al. 2010; Yildirim et al. 2011; Pinter et al. 2012), and imprinted genes (Babak et al. 2008; Deveale et al. 2012; Prickett and Oakey 2012), specifically excluding allelic imbalance attributed to genotypic variation [henceforth, genotypically imbalanced expression (GTIE)] with some exceptions (Goncalves et al. 2012; Lagarrigue et al. 2013). Interestingly, recent single-cell RNA sequencing (RNA-Seq) studies have suggested stochastic and uncoordinated expression of single alleles (Deng et al. 2014; Marinov et al. 2014; Borel et al. 2015), as suggested previously (Raj et al. 2006; Raj and van Oudenaarden 2008; Dar et al. 2012), though high levels of technical noise precluded conclusive quantitative analysis of this phenomenon (Deng et al. 2014).
Here we analyze transcriptomes of hybrid mice mated in reciprocal crosses to measure and classify allelic imbalance due to genotype, imprinting, or RMAE, including attenuated random imbalance. We find that genotype-based imbalance is (1) more prevalent than imprinting or RMAE, (2) generally tissue-specific but typically consistent in allelic preference when present in two tissues, and (3) associated with elevated genetic variation in proximal noncoding sequences. Surprisingly, GTIE genes often show monoallelic expression in inbred parental lines, potentially revealing a predisposition for imbalanced expression in outbred hybrids. Because hybrid genomes share one nucleoplasm and hence their trans-acting factors, differences in allelic representation linked to genotype can be attributed to variants acting in cis. Our data demonstrate that a large number of genes contain cis-acting variants that skew their expression in favor of one allele, an observation that may prove relevant to the study of human genetic variation and disease.
Materials and Methods
Mouse crosses and RNA-Seq
Parental and F1 hybrid tail-tip fibroblasts (TTFs) of Mus musculus castaneus and M.m. musculus origins were prepared from 2- to 5-day-old pups or obtained from Payer et al. (2013). TTFs and mouse embryonic fibroblasts (MEFs) from similar reciprocal crosses were immortalized by SV-40 T-antigen (Brown et al. 1986) and either subcloned by limiting dilution or used as a heterogeneous population. The sex and strain of each parent, as well as the sex of the offspring from which cells were isolated, were varied to obtain each of the possible combinations listed in Figure 1E. Total RNA from TTFs was harvested using the mirVana RNA Isolation Kit (Ambion, Austin, TX), followed by polyA selection of messenger RNA (mRNA) and directional RNA-Seq library generation on the Apollo 324 System (IntegenX, Pleasanton, CA) at the Biopolymers Facility (Harvard Medical School, Boston, MA). Parental and F1 hybrid RNA libraries (Supporting Information, Table S1) were sequenced to a depth of typically ∼60 million paired-end 50-nt reads per sample. Read pairs were aligned to both parental genomes, allowing for a maximum of three mismatched or gapped bases per read (Tophat v2.0.8), followed by removal of PCR duplicates and multiply mapping reads. Read pairs considered allele-specific aligned (1) to only one parental genome, (2) with both ends to one but only a single end to the other parental genome, or (3) with fewer mismatches/insertions/deletions to one parental genome than to the other. Allele-specific pairs and pairs aligning equally well to both genomes were used for transcript assembly and quantitation (Cufflinks v2.1.1) using Ensembl Release 66.
One-fifth of autosomal genes are preferentially expressed from one allele. (A and B) Distribution of median skew values R from −1 to +1, or 100–0% CAST/EiJ origin (GT1) [where read ratio R = (genotype 1 − genotype 2)/(genotype 1 + genotype 2)] for GTIE (red, blue) and imprinted genes (open symbols, magenta, cyan) in forward and reverse crosses of liver (A) and TTF (B) samples. (C and D) Counts of balanced (BAL) and skewed autosomal genes shaded by fold difference between alleles for liver (C) and TTF (D) samples. Imprinted (MAT/PAT) and GTIE (GT1/GT2) categories are capitalized (P < 0.05, two-tailed Barnard’s exact test) or lowercased (0.05 < P < 0.07), and randomly skewed (RND). Expressed X, Y, and mitochondrial genes (C and D: 208 and 340, respectively), genes skewed in only a single cross (C and D: 38 and 434), and genes not expressed or lacking variant information (C and D: 24,255 and 23,475) were excluded; 34,110 genes in total (Ensembl release 66). (E) Relative allelic expression levels (from 0 to 100% GT1, left axis, or GT2, right axis) based on allele-specific qRT-PCR for the indicated genes (Capn5, Tspan15, and Ano5: GT2-skewed; Cxcl5 and Ica1l: GT1-skewed; Peg3 and Zim1: maternally, paternally imprinted; Trim25: random; Fli, Frk, and Sdha: balanced). Primary and transformed populations and three independent clones for both sexes in forward (Fx) and reverse (Rx) crosses are as indicated in the legends.
Classification of allelic imbalance
Exonic variants for each annotated transcript were queried for allele-specific counts and aggregated across the transcript as in Deveale et al. (2012). To systematically identify genes preferentially expressed from one allele over the other, as well as genes exhibiting balanced expression within a given sample, a number of qualifying criteria were applied to each gene (File S1). Transcripts lacking sufficient expression and assembly support and showing only spurious allelic coverage were excluded. For each assessable gene, the skew (R) and the cumulative probability (P) under a binomial distribution were calculated either under the assumption of equal probability for mapping to either allele or, where sufficient parental data were available, by empirically estimating technical bias for each gene and accounting for this bias by adjusting the null probability. A number of genes (486 in liver samples, 1008 in TTFs, mostly pseudo- or predicted genes) skewing significantly in the direction of the nonrepresented allele in the parental controls were excluded from further analysis.
To estimate the likelihood of detecting significantly skewed genes (P < 0.05, cumulative binomial probability) reproducibly across a number of samples, each allele-specific read was first randomly and repeatedly redistributed between the two alleles for each data set to build a null model based on actual sample sizes (number of allelic reads for each gene in each experiment). Genes skewing significantly more often across experiments than expected under the null model then were assessed in 2 × 2 contingency tables (forward/reverse crosses vs. genotypes) to determine whether allelic skew associates with parental or genetic origin. Genes associating significantly (two-tailed P < 0.05, Barnard’s exact test) and unanimously with parental or genetic origin were classified as “imprinted” or showing “genotypically imbalanced expression” (GTIE), respectively. In light of the low number of samples for TTFs, genes with borderline significance in this test (0.05 < P < 0.07) were classified as “potentially” imprinted or “GTIE” and classified as separate categories indicated with lowercase letters (e.g., mat, pat, gt1, and gt2). Genes skewing significantly more often than expected under the null model yet lacking significant association with either parental or genetic origin (two-tailed P > 0.07, Barnard’s exact test) and skewing toward each allele at least once across experiments were classified as “randomly” skewed. A small number of genes lacking significant association because they were detected in only a small number of samples yet skewing only toward one allele were classified as “undetermined” (38 in liver samples, 434 in TTFs). To provide estimates for false discovery rates (FDRs) in each category (balanced, imprinted, random, and GTIE), one randomized data set was carried through the classification (FDR < 0.02 for all the categories described earlier in both liver and TTF samples). A full list of genes showing allelic imbalance in liver and TTFs can be found in File S2 and File S3, respectively.
Feature and enrichment analyses of imbalanced genes
Imprinted and randomly skewed genes were compared to collections of known imprinted (mousebook.org) and recently identified random monoallelic genes (Zwemer et al. 2012; Eckersley-Maslin et al. 2014; Gendrel et al. 2014). Significance of enrichment in the corresponding categories was determined under a hypergeometric distribution using the union of balanced, imprinted, random, and GTIE genes as a background set. Equivalent testing was performed to assess the significance of overlap with specific Ensembl gene categories (biotypes), ribosomal protein genes, and category identity between liver and TTF tissues. Similarly, gene ontology (GO) annotations were queried using the same background set of expressed and assessable genes, and GO-term enrichment P-values were determined by hypergeometric test (adjusted P < 0.05). For comparison of quantitative features [fragments per kilobase per million (FPKM), allelic read support, gene and exon length, and density of sequence variation over exonic, intronic, upstream, and downstream intervals] between categories (balanced, imprinted, random, and GTIE), the Mann-Whitney U-test was used, and significance values were denoted as described in the figure legends.
RNA-DNA FISH
TTFs from CAST/EiJ, 129S1/SvImJ, or F1 hybrid mice were cytospun onto glass slides, extracted with CSKT buffer (100 mM NaCl, 300 mM sucrose, 10 mM PIPES, 3 mM MgCl2, and 0.5% Triton X-100, pH 6.8) and fixed in 4% paraformaldehyde. DNA probes were generated from fosmid or BAC clones (Apbb1ip: WIBR1-0890H01; Pon3: WIBR1-1690D10; Ywhag: WIBR1-2316B24; Spred2: RP24-294M15; Egfr: RP23-263C13; Grb10: RP24-345M19; Children’s Hospital Oakland Research Institute, Oakland, CA) and labeled by nick translation using Cy3-conjugated dUTP. For detection of RNA, ∼50 ng of probe and ∼1 μg of mouse Cot-1 DNA in 10% dextran sulfate, 50% formamide, and 2× SSC (300 mM NaCl and 30 mM sodium citrate, pH 7.4) were denatured at 95° prior to application to slides and hybridization overnight at 42°. Slides then were washed with 50% formamide and 2× SSC and mounted, and images were captured on a Nikon Eclipse 90i microscope workstation with Volocity software (Improvision, Coventry, England). For detection of DNA, slides were pretreated with RNase A (400 μg/mL in PBS) at 37° for 40 min and denatured in 70% formamide and 2× SSC at 80° for 10 min. Methods and probe design for the allele-specific DNA FISH experiments using Oligopaint FISH probe sets are described elsewhere (Beliveau et al. 2015).
Allele-specific qualitative reverse transcriptase polymerase chain reaction (qRT-PCR)
RNA was isolated from cells using TRIzol Reagent (Invitrogen, Carlsbad, CA), from which cDNA was generated with SuperScript III Reverse Transcriptase (Invitrogen) and Oligo(dT)15 Primer (Promega, Madison, WI). For qRT-PCR, 250 nM each of universal and either GT1 (CAST/EiJ)– or GT2 (C57BL6/J or 129S1/SvJm)–specific primer and SYBR Green Supermix (Bio-Rad, Hercules, CA) were used. The allele-specific primers were designed to target genetic variants according to the method of TaqMAMA (Glaab and Skopek 1999) or further optimized for allelic discrimination as described previously (Li et al. 2004). Expression levels were compared against the opposite allele using the formula fold difference = 2^Ct(GT2 − GT1) and corrected for primer bias and differential primer efficiencies by comparison with amplification of pure parental or hybrid genomic DNA (see File S10 for primer information).
Results and Discussion
To investigate how genotypic variation may contribute to gene expression on a transcriptome-wide scale, we analyzed RNA-Seq data from primary and clonal populations of cells representing developmentally distinct tissues—specifically, liver cells (Goncalves et al. 2012) and TTFs. Using parental and hybrid mice from reciprocal crosses, we categorized genes by frequency and direction of allelic imbalance and validated our findings using orthogonal methods. We began by designing a statistical framework to take advantage of the count-based nature of sequencing data while controlling for overdispersion based on empirical estimates derived from inbred parental samples (Figure S1, A–C). Imbalanced gene calls were based on aggregate read counts across each transcript as in Deveale et al. (2012) and, for most genes, represented at least two exons and over five variants, with excellent agreement between individual variants (Figure S1, D–I). To classify allelic imbalance, we tested whether significant overrepresentation of one allele (P < 0.05, cumulative binomial) was detected reproducibly across samples. We estimated the significance of reproducing allelic imbalance against multiply permuted data sets generated by randomly distributing reads between two alleles. Association of allelic imbalance with genetic or parental origin was assessed using 2 × 2 contingency tables (Barnard’s exact test). Reproducibly imbalanced genes lacking association were classified as random, provided that each allele was the preferred allele at least once. Not all randomly imbalanced genes represent instances of RMAE, however, because this distinction would have required applying an arbitrary cutoff to define RMAE. Instead, we report all randomly imbalanced genes irrespective of the magnitude of the imbalance, as we do for GTIE and imprinted genes. Our approach detected allelic imbalance across a wide range of expression and treated each sample in each cross independently to conservatively estimate the reproducibility and direction of allelic imbalance (Figure S2, A and B).
We observed that GTIE was more frequent than imprinting in both liver and TTFs (ascending vs. descending diagonal, respectively) (Figure 1, A and B), with allelic skew [read ratio ] stretching across the full range from 100% CAST/EiJ [genotype 1 (GT1), in blue, R = −1] to 0% [genotype 2 (GT2), in red, R = 1]. Because of the lower number of F1 hybrid samples in TTFs, we applied a relaxed cutoff (P < 0.07, two-tailed Barnard’s exact test) to detect additional candidate imprinted and GTIE genes (denoted by lowercase labels in Figure 1, C and D). Overall, ∼20% of genes appeared to be imbalanced, almost half of which showed greater than twofold difference between the two alleles (Figure 1, C and D). Approximately 5% of all expressed genes were candidates for monoallelic expression (greater than threefold difference between alleles) in both liver (488) and TTFs (458). Overall, GTIE was associated with higher, not lower, expression than was seen with balanced genes (Figure S3A), indicating that the detected allelic imbalance is not restricted to lowly expressed genes. In addition, we excluded genes that were more likely to be subject to technical noise because of very low read numbers (see Supporting Materials and Methods, File S1). Although median expression decreased somewhat with increasing magnitude of imbalance (Figure S3B), the range of GTIE expression levels still overlapped the range for balanced genes.
Because RMAE is known to be a clonal event, detection of randomly imbalanced genes not only in clonal TTFs but also in primary liver cells was unexpected. However, allelic read support for randomly imbalanced genes in liver samples, but not TTF clones, dropped off with increasing allelic fold difference compared to GTIE genes (Figure S3C). In addition, randomly imbalanced genes in liver were detected in a lower fraction of samples and were depleted in monoallelic genes (greater than threefold difference) compared to TTFs (Figure S3D). These observations are consistent with the possibility that biopsies from solid tissue can capture spatially linked subsets of cells that preserve some clonal mosaicism present in the animal. Interestingly, randomly imbalanced genes in TTFs were lower in length-normalized expression (FPKM) than balanced and GTIE genes (Figure S3, A and B). This difference could be attributed entirely to their significantly longer transcripts and gene bodies (Figure S3E), an observation that may prove useful in elucidating the mechanisms of RMAE.
To verify imbalanced expression of gene candidates, we performed allele-specific qRT-PCR using complementary DNA (cDNA) reverse-transcribed from RNA of hybrid clonal cell lines representing forward and reverse mouse crosses (Figure 1E). In addition, we tested primary and immortalized cells of males and females to rule out potential transformation artifacts and sex-specific phenomena, respectively. GTIE genes (Capn5, Tspan15, and Ano5: GT2-skewed; Cxcl5 and Ica1l: GT1-skewed) showed significant (greater than twofold) skewing in the direction observed in RNA-Seq across the majority of samples, whereas imprinted genes (Peg3: paternally expressed; Zim1: maternally expressed) showed appropriate parent-specific expression patterns, and randomly skewed (Trim25) and balanced (Fli1, Frk, and Sdha) control genes were centered around 50:50 expression. Importantly, skewing of imbalanced genes was observed repeatedly, even in heterogeneous primary and immortalized populations, demonstrating that allelic imbalance cannot be attributed solely to clonal expansion but is reproducible in direction and magnitude of skewing at the population level.
Genes imbalanced owing to parental origin were expectedly enriched in known imprinted genes (P < 6 × 10−17, P < 4 × 10−19, hypergeometric test, liver and TTFs, respectively) (Figure 2, A and B). Hierarchical clustering of imprinted genes by the numerical score we assigned to allelic skew (R from −1 to 1) correctly separated samples by forward and reverse cross and genes by maternal and paternal expression. Surprisingly, expression of some known imprinted genes was associated with genotype rather than parental origin. These included genes that cluster within larger imprinted domains (Ono et al. 2003) but show tissue-specific imprinting, such as Ddc (Menheniott et al. 2008) (next to Grb10 in Figure 2C) in liver and Pon2, Pon3, and Ppp1r9a (Schulz et al. 2006) (next to Sgce and Peg10 in Figure 2D) in TTFs. Moreover, despite expected paternal expression of Peg10 and Sgce in this cluster, recently identified imprinting candidate Casd1 (Babak et al. 2008) showed balanced expression (Figure 2D), and neighboring Asb4 (Mizuno et al. 2002) and Tfpi2 (Monk et al. 2008) were randomly imbalanced rather than imprinted, though intriguingly in anti-correlated fashion with each other (Figure 2B). In sum, these examples support the notion that even within a single imprinted cluster, genotype can modify penetrance of parent-of-origin effects and reveal that several genes imprinted in the placenta (Proudhon and Bourc’his 2010) are subject to GTIE in adult tissues. Among our novel imprinted candidates (e.g., Vcp, Fam46b, and neighboring Trnp1), allelic imbalance was attenuated compared to most known imprinted genes, with the exception of H13 (Figure 2, E, F and G). It is plausible that partial and tissue-specific imprinting patterns are more difficult to detect than complete and widespread parent-of-origin effects, raising the question of how many partially imprinted genes exist but have gone undetected and how they may be regulated. Likewise, our collection of randomly imbalanced genes includes instances of RMAE, raising the question of whether arbitrary numerical cutoffs for defining monoallelic expression should be applied to define the phenomenon of RMAE or whether attenuated allelic imbalance, as observed for many imprinted genes (e.g., H13) (Figure 2F)(Goncalves et al. 2012), should be included, as was done here.
Genotype can affect some genes in imprinted clusters. (A and B) Allelic skew values [blue-red scale from −1 to 1 or 100–0% CAST/EiJ origin (GT1), respectively] of previously known and novel imprinted genes in forward (Fx, CAST/EiJ maternal, M) and reverse crosses (Rx, CAST/EiJ paternal, P) of individual liver (A) and TTF (B) samples. Color-coded panels next to gene names label previously known imprinted genes (www.mousebook.org) as either confirmed herein (dark/light purple) or scoring as GTIE (blue: GT1; red: GT2), randomly skewed (gold), or balanced (gray). Novel imprinted gene candidates (bright green, P < 0.05, two-tailed Barnard’s exact test) and known imprinted genes (P < 0.05 or *, 0.05 < P < 0.07) cluster liver (A) and TTF samples (B) by parental origin (into forward and reverse crosses) and genes by maternal (top) and paternal (bottom) expression. Previously known imprinted genes (6 of 8 and 10 of 31 in liver and TTFs, respectively) are expectedly enriched (hypergeometric test, P < 6 × 10−17 in liver, P < 4 × 10−19 in TTFs). (C and D) GTIE of known imprinted genes in liver (C) and TTFs (D). Allele-specific GT1 (blue) and GT2 (red) coverage in forward (Fx) and reverse (Rx) crosses (M, maternal; P, paternal). Multiple samples combined for each overlay track (+ strand above genes, − strand below). (E–G) Examples of incomplete penetrance of parental origin in imprinted candidates Fam46b and Trnp1 (E) and Vcp (G) compared to incomplete penetrance of known imprinted gene H13 in both liver and TTFs (F).
We next addressed whether imbalanced genes were enriched in specific transcript classes or annotations. RMAE and monoallelic GTIE candidates were modestly enriched in antisense, pseudogene, and long intergenic noncoding RNA (lincRNA) classes, but their numbers were small (<10 for any category). Analysis of GO terms revealed that GTIE and RMAE genes are enriched in distinct cellular component and biological process annotations (Figure 3, A and B, and File S4, File S5, File S6, File S7, File S8, and File S9). In agreement with previous reports (Chess 2012; Gendrel et al. 2014), randomly skewed genes in TTFs, for example, were enriched in cell surface and adhesion/migration categories (Figure 3, A–C), where monoallelic representation may serve to increase cell heterogeneity. Interestingly, GTIE genes in TTFs also were enriched in gland morphogenesis and hormone metabolism terms, which may reflect inherent biological differences between the parental strains used in this experiment (M.m. musculus vs. M.m. castaneus). In addition, we found significant enrichment of ribosomal protein (RP) genes (Figure S4, A and B) among GTIE genes in both liver (P < 5 × 10−3) and TTFs (P < 4 × 10−9, hypergeometric test). This observation is intriguing because of the known monoallelic state of ribosomal RNA genes (Schlesinger et al. 2009) and is reminiscent of nucleolar dominance in plants (Ge et al. 2013). Although we cannot entirely rule out that allelic calls were less reliable in these genes because of the high number of RP pseudogenes in mammalian genomes (Zhang et al. 2002), we see this explanation as less likely given that only uniquely mapping reads were used for both variant mapping and RNA-Seq analyses, and moreover, RP genes were enriched even when considering intronic variants in allelic imbalance calls since RP pseudogenes in rats and mice lack introns (Wool et al. 1995; Zhang et al. 2004). Future study of GTIE in RP genes may yield insight into the elusive mechanism behind their coordinated expression (Hu and Li 2007).
Randomly skewed and GTIE genes are enriched in distinct cellular component and biological process annotations. (A and B) Enriched cellular component (A) GO terms in liver and TTFs and biological processes (B) in TTFs. Bubble color denotes significance of enrichment, bubble size (see side legend) indicates the fraction of genes linked to a given term out of all genes across allelic imbalance classes (GTIE; either GT1 or GT2; RND, random). (C) Examples of enriched biological processes (brown circles, * and **, adjusted P < 0.05 and P < 0.01, hypergeometric test) for randomly skewed and GTIE genes in TTF samples. Median allelic skew for each gene is color-coded (−1, green, to +1, red, for 100–0% CAST/EiJ, respectively).
The genetic variants that give rise to allelic differences in RNA abundance may affect either the act of transcription or the transcript itself and may act in a tissue-specific or tissue-spanning fashion. To determine what fraction of genes comprises each of the latter two categories, we compared allelic calls across tissues in the 7465 genes that were expressed in both liver and TTFs (77.4 and 72.5% of all genes expressed in these samples). We found that 5129 genes (<69%) were balanced in both tissues, 1916 genes (∼26%) were imbalanced in one tissue, and 420 genes (∼6%) were imbalanced in both tissues (Figure 4, A–C). Considering that we find 31% of genes to skew in at least one of two tissues sampled from a single mouse cross (M.m. musculus vs. M.m. castaneus), it is likely that an even larger number of genes will show allelic imbalance across additional tissues and/or genetic backgrounds. Importantly, of genes classified as GTIE in both liver and TTFs, 79% preferred the same allele in both tissues (Figure 4, A and B), switching only occasionally (21% of GTIE genes) (Figure 4, A and B). This enrichment was highly significant (Figure 4C). We investigated whether the primary variants acting on these genes were more likely to be found in close proximity (e.g., promoters, splicing motifs, and untranslated regions) rather than in distal, possibly tissue-specific enhancers. While identification of causal variants is beyond the scope of this analysis, we asked whether imbalanced genes were generally more divergent than balanced genes (Figure 4D and Figure S5, A and B). Indeed, variant density in GTIE genes was significantly higher than in balanced genes (Figure 4D, P-values on plot, and Figure S5, A and B, significance plotted in bottom panels, Mann-Whitney U-test) across coding (exonic) and noncoding sequences, and this difference appeared to be sensitive to distance (least different at ±10 kb). Importantly, the significance of difference in medians between balanced (gray) and GTIE genes (blue, red) was consistently greater in the gene body than in up/downstream regions but significant (P < 0.05, Mann Whitney U-test) in both. Of note, the magnitude of allelic imbalance generally lacked correlation with variant density (Figure S6, A and B), indicating that while a greater variant density may increase detection of allelic imbalance, it does not appear to increase its magnitude. These results suggest that the overall likelihood, not the magnitude, of GTIE for a given gene is proportional to the level of divergence present in its two alleles. How individual variants (coding vs. noncoding, proximal vs. distal) contribute to the observed allelic imbalance, however, will require taking into account additional epigenomic and transcriptomic information. For example, cognate sites for DNA or RNA binding proteins, such as transcription or RNA splicing/processing factors, may be especially helpful in this regard because their occupancy is likely to alter nascent transcription or RNA stability and to be sensitive to single-nucleotide changes.
GTIE genes expressed across tissues frequently maintain direction of skew. (A and B) Comparison of skewed gene categories across tissues. Bar plots, reporting the number of genes in a given category, are arranged around a circle. Bar sizes are proportional to the number of genes in a given category except for balanced genes. Membership in a given category in TTFs can be traced across the circle to the corresponding categories in liver (A) and vice versa (B). (A) GTIE genes in TTFs (GT1/GT2, blue/red in right half-circle) typically recapitulate skew direction in liver (left half-circle) unless expressed in a balanced fashion (BAL, gray) and vice versa. (B) Call in liver genes (left half-circle); in TTFs (right half-circle). Randomly skewed genes (RND, gold) replicate poorly across tissues. (C) Maintenance of direction for GTIE genes across tissues is statistically significant (hypergeometric test, P-values indicated in quilt plot). Colors denote enrichment (red) and depletion (blue) as seen in scale on left. (D) Variant density in skewed genes (gold, blue, red) is significantly higher than in balanced (gray) genes (P-values above, Mann-Whitney U-test) in exonic, intronic, and proximal noncoding sequence (±0.5, 2.5, and 10 kb) for liver (left) and TTFs (right). Median indicated by black horizontal mark; box width based on number of genes in each category. Thinner colored box in each comparison contains only genes skewing more than threefold. Immediate promoter sequence (−0.5 kb) is more divergent in GTIE (GT1 in blue, GT2 in red) and randomly skewed genes (gold). (E) Tissue-spanning allelic imbalance is associated with increased genetic divergence in transcribed sequence. Averaged binned density of genetic variants that distinguish parental genomes is plotted across ±10 kb of sequence centered on either the TSS (left) or TES (right). Variants across balanced genes (gray), GTIE genes in liver, or TTFs (line 1, purple) and GTIE genes in both tissues (line 2, green) were binned at approximately nucleosome resolution (150 bp) and assessed for difference in medians. Top panels show variant density; bottom panels denote significance of difference between these groups (with dotted lines representing P = 0.05, Mann-Whitney U-test). Both lines 1 and 2 are compared to balanced genes and are identified by their respective colors (purple, green) in the bottom panel. Orange lines plot the difference in median variant density comparing GTIE genes in liver or TTFs (1) and GTIE genes maintaining their preferred allele in both tissues (2).
Because most GTIE genes in TTFs are balanced in liver and vice versa (Figure 4, A and B), it is plausible that their allelic imbalance is at least in part determined by variants in tissue-specific enhancers. Conversely, genes that maintain direction of allelic imbalance across tissues may be subject to proximal variants that exert their effect on either cotranscriptional processes or the transcript itself and thus may override effects of distal enhancer-associated variants. We therefore asked how the distribution of proximal variants differed between genes that maintained GTIE across tissues (245, green in top panels of Figure 4E) and genes that showed GTIE in only a single tissue (1623, purple in top panels of Figure 4E) or genes balanced in both tissues (5129, gray in top panels of Figure 4E). Indeed, genes that reproduce the direction of allelic imbalance across tissues were more divergent (Figure 4E, bottom panels, Mann-Whitney U-test) than genes imbalanced in a single tissue (orange lines in bottom panels of Figure 4E) or genes balanced in both tissues (green lines in bottom panels of Figure 4E). However, the significance of the former comparison (orange lines above dotted line in bottom panels of Figure 4E) was restricted to sequence that is part of the transcript proper (i.e., UTRs as opposed to, e.g., the promoter). This observation suggests that variants in transcript termini have a strong influence on allelic imbalance detected across tissues, likely because they become part of the actual transcript and can affect processes downstream of transcription (e.g., splicing, polyadenylation, export, and turnover).
We next asked whether the GTIE observed by RNA-Seq, which provides a population average across millions of cells, is recapitulated at the single-cell level. In principle, overrepresentation of one allele in a population of cells can result from imbalanced expression within each cell or from a disproportionate number of cells each expressing one preferred allele. To parse out these possibilities, we performed sequential nascent RNA-DNA FISH on several candidate genes in hybrid TTF clones (Figure 5A). A known imprinted gene (Grb10) presented with one RNA spot in almost all cells, as expected. Predicted GTIE genes (Apbb1ip, Egfr, and Pon3) presented with one RNA spot in most cells but also exhibited a smaller fraction of cells with two spots. Conversely, predicted balanced genes (Spred2 and Ywhag) presented with two spots in most cells, with Spred2 also exhibiting a smaller fraction with one spot (n > 200). These data, taken together with our allele-specific RNA-Seq and qRT-PCR results, suggest that within a clonal population of cells, GTIE may stem in part from a difference in transcriptional firing rates between the two alleles, with a single allele being preferred in most cells at any given time.
RNA-DNA FISH demonstrates monoallelic expression in individual cells of hybrid and parental origins. (A) Sequential RNA-DNA FISH for Apbb1ip, Egfr, and Pon3 (GTIE); Grb10 (imprinted); and Spred2 and Ywhag (balanced) in clonal TTFs from F1 hybrid animals. Diploid nuclei with RNA signal (n > 200) were scored for each sample, with representative images shown. Inset values list percentage of cells with RNA signal presenting with the number of spots (either one or two) shown in the corresponding image. Adjacent bar graphs list total counts for cells displaying no signal or monoallelic or biallelic expression. All genes presented with two spots in subsequent DNA FISH, verifying probe specificity and 2n ploidy. RNA spots always colocalized with DNA spots. Scale bar, 5 μm. (B) As in A but repeated using allele-specific Oligopaint probe sets for DNA FISH in TTFs from F1 hybrid animals of both forward (Fx) and reverse (Rx) crosses. Adjacent bar graphs now display additional allelic resolution obtained for monoallelic population (GT1, magenta; GT2, green). Inset values list proportion of RNA signal coming from GT1 vs. GT2 alleles. (C) As in A but repeated using clonal TTFs from isogenic parental (GT1, GT2) lines.
To trace the genotypic origin of transcripts at the single-cell level, we repeated the RNA-DNA FISH using a novel allele-specific Oligopaint DNA FISH probe technology (Beliveau et al. 2012, 2015), with probe sets designed to target variants specific to either GT1 or GT2 (Figure 5B). By tiling an ∼2.5-Mb region around the genes of interest with a pair of differentially labeled Oligopaint probe sets, with one being specific for GT1 and the other for GT2, this approach allowed us to discern whether the nascent RNA signal colocalized with the GT1 or GT2 allele. As expected, imprinted gene Grb10 was expressed exclusively from the GT1 allele (pink bar, left panels, in Figure 5B) in almost all cells derived from a forward mouse cross (Fx) but from the GT2 allele (green bar, right panels, in Figure 5B) in cells derived from the reverse cross (Rx). Conversely, GTIE gene Egfr was expressed primarily from the GT1 allele (1.5- to 1.7-fold compared to GT2, in excellent agreement with the RNA-Seq estimate of 1.5- to 1.8-fold) in both crosses. However, Spred2 was expressed equally between both alleles, with expression being biallelic in most of cells, or expressed at equal frequency from either allele in the minority of cells with only one signal. Thus, the combination of RNA FISH and allele-specific DNA FISH examining single-cell dynamics corroborates the conclusions drawn from RNA-Seq analysis of cell populations.
In principle, allelic imbalance due to genotype (GTIE) observed in hybrids could be imagined to take three possible forms in isogenic cell lines of the respective genotypes: (1) biallelic expression in cells of one parental strain and much reduced expression in cells of the other, (2) biallelic expression in cells of both parental strains, or (3) stable or stochastic monoallelic expression in cells of both parental strains. In scenario 1, one of the parental genomes carries a nonfunctional allele, causing hybrids to receive only one functional allele from the other parental genome. In scenario 2, nonequivalence between alleles only manifests in the hybrid (in the presence of divergent variants). In scenario 3, genes subject to RMAE or stochastic monoallelic expression in isogenic cells gain a genotypic preference in hybrids (similar to the Xce effect on XCI choice) (Migeon 1998). As seen in Figure 5C (GT1, GT2), GTIE genes Apbb1ip, Egfr, and Pon3 remained largely monoallelic in isogenic cells of either parental strain, as did imprinted gene Grb10, whereas balanced genes Spred2 and Ywhag remained mostly biallelic. We conclude that, at least for the genes tested here, the observed GTIE in hybrid cells may reflect random or stochastic monoallelic expression in cells derived from isogenic strains. We propose that reported stochastic and uncoordinated firing of single alleles (Raj et al. 2006; Dar et al. 2012) observed in isogenic cells (Figure 5C) may be a prerequisite for the GTIE observed in F1 hybrids (Figure 5A), possibly because the presence of divergent variants in F1 cells may determine which allele fires more frequently. However, it is worth noting that the presence of these variants also appears to exacerbate the apparent “competition” between alleles, as seen by the increase in occurrence of monoallelic cells for each GTIE gene among hybrid vs. isogenic backgrounds (compare cell counts in Figure 5, A and C).
In summary, we have determined the extent of allelic imbalance in hybrids as a function of divergence between parental genomes and have provided a classification of genes imbalanced by genetic variation and epigenetic mechanisms. Although the notion that genetic variation can modify epigenetic phenomena has been appreciated in unique cases (e.g., Xce in XCI) (Rastan and Cattanach 1983), our findings extend this to include instances of genomic imprinting. These findings are mirrored in very recent work (published while this paper was under review): Crowley et al. (2015) present evidence for cis-regulatory variation affecting imprinting patterns, including many cases of incomplete imprinting. Furthermore, their work supports our assertion above that the majority of mammalian genes may exhibit allelic imbalance depending on available cell types and genotypes.
In addition, our results provide the surprising insight that genes subject to genotype-based skewing identified in hybrids can already be imbalanced in isogenic parental cells. This confluence of genetics and epigenetics may reflect a competition between two alleles for limiting trans-acting factors or nuclear niches that results in RMAE or stochastic monoallelic expression in isogenic cells but manifests as a genotype-dependent choice in the presence of divergent alleles (one strong, one weak). Alternatively, these genes may be subject to concurrent cis- and trans-acting regulatory variation. Goncalves et al. (2012) hypothesize that genes that show genotype-based allelic imbalance in the hybrid yet no differential expression between the two parental isogenic lines (as in Figure 5C) reflect an epistatic relationship between cis- and trans-acting regulatory variation that is uncovered only in hybrid cells as a result of sharing of trans-acting factors. In other words, variation that changes the activity or abundance of a given transcription factor may have resulted in selection of compensatory cis-regulatory mutations in its corresponding target genes. In the hybrid, however, both sets of trans-acting factors act on these cis-regulatory elements, leading to loss of cis-driven compensation and emergence of differential allelic expression. These two models are not mutually exclusive, and both remain untested for now.
The relevance of allelic imbalance likely extends to all outbred diploid organisms, including humans, especially in view of loss of heterozygosity and haploinsufficiency (Savova et al. 2013). Although the parental genomes of most humans are less divergent than those of the two mouse lines used in this study, the level of heterozygosity is likely predictive of the fraction of genes showing allelic imbalance, and by extrapolation, these genes may be expected to number in the hundreds. In fact, recent human single-cell analyses revealed ∼6% (35 of 568) of assessable genes to undergo nonstochastic allelic imbalance, and human epigenomes show widespread allelic bias (8–14% of genes in any given H1 embryonic stem cell lineage) (Dixon et al. 2015). Interestingly, RMAE genes in humans have recently been shown to feature a distinctive chromatin signature consistent with allelic differences in transcription (Nag et al. 2013). GTIE and RMAE may play an important role in the penetrance of recessive and dominant traits in development and may contribute to human diseases, for which a large number of genome-wide association studies incriminate noncoding genetic variants (Zhang et al. 2014).
Acknowledgments
We thank A. Saltzman for critical reading of the manuscript, M. Tolstorukov for valuable discussions, and all laboratory members for helpful suggestions. We acknowledge H. Sunwoo for FISH protocols and W. Press for assistance with handling of mice. This work was supported by grants from the German Research Foundation (DFG) and the MGH Fund for Medical Discovery to S.F.P., NIH training grant 2T32GM007226-37A1 to D.C., NIH grant R01-GM61936 and the NIGMS Director’s Pioneer Award to C-t.W., and NIH grant R01-GM090278 to J.T.L. J.T.L. is an investigator at the Howard Hughes Medical Institute. The authors declare no conflicts of interest.
Author contributions: D.C. generated cell lines and performed allele-specific qRT-PCR and FISH experiments. S.F.P. performed bioinformatics and statistical analyses of allele-specific RNA-Seq and DNA-Seq data. R.I.S. performed preliminary bioinformatics analyses of published data sets (Pinter et al. 2012; Yildirim et al. 2011). B.P. and E.Y. contributed cell lines. B.J.B. and C-t.W. contributed allele-specific FISH methodology and probes. S.F.P., D.C., and J.T.L conceived of and designed the study and wrote the manuscript. J.T.L. obtained funding and supervised the research.
Footnotes
Communicating editor: G. A. Churchill
Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.176263/-/DC1
Sequencing data have been submitted to the GEO database at NCBI as series GSE58524.
- Received December 8, 2014.
- Accepted March 27, 2015.
- Copyright © 2015 by the Genetics Society of America