The Utility of Comparative Genetics to Inform Breast Cancer Prevention Strategies
Michael N. Gould


My research seeks to aid in developing approaches to prevent breast cancer. This research evolved from our early empirical studies for discovering natural compounds with anticancer activities, coupled with clinical evaluation, to a genetics-driven approach to prevention. This approach centers on the use of comparative genomics to discover risk-modifying alleles that could help define population and individual risk and also serve as potential druggable targets for prevention. Here, we initially fine map mammary cancer loci in a rat carcinogenesis model and then evaluate their human homologs in breast cancer case-control association studies. This approach has yielded promising results, including the finding that the human region homologous to the compound rat QTL Mcs5a was associated with breast cancer risk. These and related findings have the potential to yield advancements both in translation-prevention research and in basic molecular genetics.

Anecdotal, Historical and Critical Commentaries on Genetics

WRITING this Perspectives for Genetics allows me to examine how a cancer biologist focused on cancer prevention morphed into a practicing geneticist. In addition, it allows me to review a decade of our investigations into the complexity of the genetic risk to breast cancer development using comparative genomics.

Our comparative genomic strategy consists of genetically identifying mammary cancer risk loci using fine mapping studies in a rat mammary carcinogenesis model. Human homologs of these loci are then evaluated in human breast cancer association studies for their potential to modify risk. This genetics approach provides an integrated discovery platform to identify and mechanistically characterize novel breast cancer risk alleles. We predict that this platform will serve as a foundation for a cancer prevention drug development pipeline.

My early work focused on the etiology and prevention of breast cancer. It is work on these interrelated areas that led me to investigate breast cancer genetics. While studying the etiology of breast cancer after joining the faculty of the University of Wisconsin, my interest targeted early events in the etiology of cancer. These events range from the metabolic activation of environmental xenobiotics to metabolites capable of adducting DNA, to the destruction of clones of premalignant cells. At the time we began work in this area, cancer chemoprevention was an emerging field that was assumed to be less complex than cancer therapy. This was, in part, based on the fact that normal and premalignant tissues were genetically more stable than cancer cells and thus less likely to develop resistance to anticancer drugs.

Our chemoprevention studies focused on a novel class of nontoxic monoterpenes widely found in the essential oils of fruits. These compounds were found to have both preventive and therapeutic anticancer activities in being able to inhibit both premalignant and malignant cells. Our lead compound was limonene, found in orange peel oil, and the first monoterpene we entered into FDA-approved clinical trials was perillyl alcohol (POH), found originally in lavender oil. For expediency, our first trial was a therapeutic one. This therapeutic phase I trial showed limited promising results (Ripple et al. 2000). We later discovered that POH inhibited the antiapoptotic ability of cancers via a calcium channel interaction that led to the downregulation of NFκB (Berchtold et al. 2005). This mechanism of action could underlie the cytostatic and cytotoxic actions of POH toward both premalignant and malignant cells.

The monoterpenes and POH were found through empirical screening. Like the monoterpenes, many chemopreventive and therapeutic agents are found to be of low overall efficacy. Many also have undesirable toxicity, in part due to the lack of target specificity. As such, we felt the need to develop nonempirical methods to develop prevention strategies and drugs.

To develop chemopreventive agents for common diseases, we sought an approach that would identify both appropriate drug targets and high-risk populations. For example, we aimed to develop prevention strategies for the large number of individuals at risk for breast cancer but not those who specifically carried the rare but highly penetrant susceptibility alleles of the breast cancer genes such as Brca1 and -2; these and other highly penetrant breast cancer risk alleles collectively account for <25% of inherited breast cancer risk in humans (Pharoah et al. 2008).

We thus sought to identify moderately penetrant breast cancer susceptibility alleles that were common (high population frequency). Ten years ago it was difficult to identify such loci directly in human populations. In fact, most association studies at that time were based on a “candidate gene” approach; these studies were rarely successful (Pharoah et al. 2007). We thus adopted a comparative genomics strategy in which such loci are identified in a model organism using a nonbiased linkage approach and then evaluated in humans. We chose what we believe is the in vivo breast cancer model most closely related to the human: the rat.

The rat, in contrast to the mouse and like the human, develops a spectrum of hormonally responsive and nonresponsive breast cancers. Importantly, almost all rat and human cancers have a ductal cell origin (Gould 1995). At the time we began this research, however, the rat had far fewer genetic resources and tools than the mouse (Gould 1995). This can be illustrated by our need to use an M13 minisatellite marker to identify our first rat mammary susceptibility QTL (Hsu et al. 1994). Over the course of this research and subsequent studies, rat geneticists have substantially narrowed this technology gap (see Aitman et al. 2008). For example, in pursuing this project we developed a technology that produced the first gene inactivation (“knockout”) rat models (Zan et al. 2003).

Comparative genetics studies:

The first major results of these genomewide comparative studies were published by Shepel et al. (1998) in Genetics. In this study we crossed two rat strains with large differences in their susceptibility to the induction of mammary carcinomas by the chemical carcinogen dimethylbenzanthracene (DMBA). The susceptible strain was the Wistar-Furth (WF) rat, while the resistant strain was the Copenhagen (COP) rat. F1 hybrid rats were either backcrossed [(WF × COP) F1 × WF] or intercrossed (F1 × F1). Large groups of these rats were orally gavaged with DMBA, and the average number of mammary carcinomas per rat was quantified at necropsy. Rats were also genotyped using microsatellite markers, which had become available for the rat in the 1990s.
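As a minimal sketch of why both cross designs are informative, the expected genotype classes at any single locus can be enumerated under simple Mendelian segregation. The function name and the allele labels used as strings are illustrative choices, not part of the original study; the sketch ignores linkage, selection, and genotyping error.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Expected genotype fractions at one locus for a two-parent cross.

    Each parent is a 2-tuple of allele labels, e.g. ("WF", "COP").
    Assumes every gamete combination is equally likely.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


f1 = ("WF", "COP")  # F1 hybrid of the two inbred strains
wf = ("WF", "WF")   # susceptible parental strain

backcross = offspring_genotypes(f1, wf)   # (WF x COP) F1 x WF
intercross = offspring_genotypes(f1, f1)  # F1 x F1

for name, result in (("backcross", backcross), ("intercross", intercross)):
    print(name, {"/".join(g): str(frac) for g, frac in result.items()})
```

Under these assumptions the backcross yields only two genotype classes (WF/WF and WF/COP), whereas the intercross also yields COP/COP homozygotes, so only the intercross exposes the effect of two copies of a COP allele.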

This study genetically identified four QTL, the Mammary carcinoma susceptibility (Mcs) loci Mcs1, -2, -3, and -4, which together accounted for most of the genetic variance controlling susceptibility to mammary cancer. The COP alleles of Mcs1, -2, and -3 conferred resistance, while the COP allele of Mcs4 conferred an increased susceptibility to mammary cancer development. This study demonstrated the ability to use the rat model to identify the major COP vs. WF polymorphic loci controlling susceptibility. These loci interacted in an additive manner. Interestingly, the almost completely mammary cancer-resistant COP rat strain was thus shown to carry a polymorphic allele at the Mcs4 locus predicted to increase mammary cancer risk.

In extending this study, we asked whether other mammary cancer-resistant strains varied at polymorphic mammary cancer susceptibility loci shared with those genetically identified in the WF × COP cross. A similar analysis was performed by conducting a QTL analysis of a cross between WF and a second resistant strain Wistar-Kyoto (WKy). In this backcross analysis we genetically identified four loci that accounted for most of the genetic variance associated with the susceptibility phenotype. As with the COP cross, the WKy cross identified three loci in which the WKy allele contributed to resistance and one locus at which the WKy allele contributed to increased susceptibility (Lan et al. 2001). Of these four WKy loci, only one broadly overlapped with those identified in the COP × WF cross, i.e., Mcs2 (COP) with Mcs6 (WKy). This study also used a novel statistical approach developed by our statistical collaborator, Christina Kendziorski, to identify alleles with no main effect that modify QTL with main effects. Mcs-modifier 1 (Mcsm1) was the most strongly supported locus of this class. The WKy allele of this locus fully negated the effects of the resistance conferred by the WKy allele of the Mcs8 QTL. Thus it appeared that there could be a large number of polymorphic loci in rats that could contribute to mammary cancer risk.

It is important to keep in mind that genetically identified QTL are the product of statistical modeling and analysis of segregating populations from crosses. It is thus critical that their existence be confirmed in more homogeneous genetic material. An established method for QTL validation is to breed and phenotype congenic animals carrying only the region surrounding the QTL allele of interest on an alternative genetic background. So far we have generated and characterized six of the eight candidate WKy and COP QTL by genetically introgressing them onto the WF background. All six have the phenotype predicted by our quantitative models.

Most congenic substitutions include tens of megabases encompassing the introgressed allele. The next step is to fine map this congenic interval, first to determine whether it harbors more than one independent susceptibility locus. In addition, the fine mapping process allows for an increased genomic resolution of the locus and thus a more limited set of candidates. We have fine mapped two Mcs loci: Mcs1 (COP) and Mcs5 (WKy). Each was found to be complex, containing at least three separable subloci termed Mcs1a, -b, -c and Mcs5a, -b, -c. In the case of Mcs1, all three identified subloci contributed to the cancer resistance phenotype of Mcs1. This led us to speculate that this apparent clustering might be biologically “random”; their strong combined phenotype allowed us to readily identify Mcs1 over the experimental background. In contrast, of the three Mcs5 subloci, two (Mcs5a and -c) contribute to resistance while the third (Mcs5b) confers an increased sensitivity. Each of the three had a relative risk (RR) contribution of similar absolute magnitude. If they interacted in a purely additive manner, it might have been difficult to identify Mcs5. However, Mcs5 had the strongest LOD score of any locus identified in the WKy cross (Lan et al. 2001). When we explored the interaction of the alleles at the Mcs5 loci, we found complex epistatic interactions. The strongest was the complete neutralization of the effect of the sensitive WKy allele of Mcs5b by the resistant WKy allele of Mcs5a (Samuelson et al. 2005).
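The reasoning about Mcs5 can be illustrated with a toy calculation. This is only a sketch under loud assumptions: effects are placed on a log relative-risk scale, all three subloci are given the same hypothetical magnitude (0.5, chosen purely for illustration), and epistasis is reduced to one rule in which the Mcs5a allele fully masks the Mcs5b allele. It is not the statistical model used in the cited studies.

```python
import math

# Hypothetical per-sublocus effects of the WKy alleles on a log-RR scale.
# Negative = resistance, positive = increased sensitivity. The equal
# magnitudes echo the "similar absolute RR contributions" in the text;
# the value 0.5 itself is an assumption, not a measured effect.
EFFECT = 0.5
SUBLOCI = {"Mcs5a": -EFFECT, "Mcs5b": +EFFECT, "Mcs5c": -EFFECT}


def additive_log_rr(effects):
    """Net log-RR if the subloci simply add."""
    return sum(effects.values())


def epistatic_log_rr(effects, masker="Mcs5a", masked="Mcs5b"):
    """Net log-RR if the masker allele fully neutralizes the masked allele."""
    adjusted = dict(effects)
    if masker in adjusted and masked in adjusted:
        adjusted[masked] = 0.0
    return sum(adjusted.values())


additive = additive_log_rr(SUBLOCI)
epistatic = epistatic_log_rr(SUBLOCI)

print(f"additive : log-RR {additive:+.2f}, RR {math.exp(additive):.2f}")
print(f"epistatic: log-RR {epistatic:+.2f}, RR {math.exp(epistatic):.2f}")
```

In this toy setting the purely additive model leaves a net effect equal to a single sublocus, because Mcs5b cancels one of the two resistance effects, while masking of Mcs5b doubles the net resistance signal. That is consistent in direction with Mcs5 having been readily detected in the WKy cross.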

It is interesting to explore an alternative hypothesis: that the clustering of mammary cancer susceptibility alleles arises from evolutionary selection. Data supporting such a possibility in rodents have been published by Petkov et al. (2005). Their findings suggest that alleles controlling certain phenotypes cluster to assure joint inheritance because, acting in concert with one another, they provide an enhanced survival advantage. This could account for the clustering of risk-related genes at the Mcs1 and Mcs5 QTL.

Many of the most comprehensive published mammalian fine-mapping studies achieve mapping resolutions on the order of several megabases. Such intervals, while carrying a limited number of genes, often require choosing one or more candidate genes for intensive study. These are usually chosen on the basis of how they might functionally relate to the specific disease risk under investigation. This negates the potential of positional cloning to identify an unbiased candidate. As mentioned above, experience suggests that functional candidate selection rarely identifies disease-specific modifier genes. For example, in breast cancer, when 120 such published candidates (710 SNPs) were rigorously evaluated in a large breast cancer case-control study of women, none met minimal statistical significance (Pharoah et al. 2007).

We explored the ability of ultrafine mapping to annotate the Mcs5a locus. We mapped this locus at a resolution finer than 100 kb by phenotyping congenic rats recombinant within this locus. We found it to contain two elements. The WKy allele of each element by itself failed to elicit a mammary cancer phenotype; however, when combined, the resistance phenotype was obvious. These elements, termed Mcs5a1 and Mcs5a2, synthetically interact, making Mcs5a one of the first-identified compound QTL in mammals. Because Mcs5a acts in a semidominant manner, we could use heterozygous congenic recombinants to ask whether both elements of Mcs5a needed to lie in cis on the same chromosome or whether they could interact in trans from separate homologs. They interact only in cis (Samuelson et al. 2007). Another interesting observation arising from the fine mapping of Mcs5a is that it localizes to noncoding DNA. All four Mcs loci that we have fine mapped to high resolution are localized to noncoding DNA (in progress).

The observations that the two synthetically interacting elements of the rat compound locus Mcs5a are separated by ∼50-60 kb (based on the human sequence), interact only when on the same chromosome, and are noncoding suggest the hypothesis that they may lie in closer physical proximity than the linear genomic distance between them implies. Recent observations in our laboratory using chromosome conformation capture suggest that most of the sequences between these elements form a CTCF-mediated loop, bringing both elements into close physical proximity to each other. The ability of this compound locus to control local and interchromosomal gene expression is being studied (in progress).

To determine whether our findings in our rat model could be extended to women, we next asked whether the human ortholog of Mcs5a (-a1 and -a2) could influence breast cancer risk. In contrast to the method of searching for modifier genes using genomewide association studies (GWAS), we restricted our search to an ∼100-kb region of the human genome. Focusing on this orthologous locus defined by comparative genomics vastly reduced the number of SNP-tagged alleles that needed to be tested for association, greatly reducing the statistical penalty for multiple testing. We tested several SNPs in the orthologous MCS5A1 and -5A2 regions of the human genome in a total of ∼12,000 women in a breast cancer case-control study. We found that a tagged SNP in each of MCS5A1 and -5A2 was significantly associated with risk of breast cancer in this population of women. The minor allele of SNP rs56476643 (MCS5A1) acts in a recessive manner to increase risk. Its allele frequency is 25% and it increases risk in homozygous women by 19%. In contrast, the minor allele of MCS5A2 (rs2182317) has an allele frequency of 13% and acts in a dominant manner to reduce by 14% the risk of breast cancer in the 24% of women carrying one or two copies of this allele (Samuelson et al. 2007).
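The relationship between the quoted allele frequencies and the fraction of women affected can be checked with a short calculation. This sketch assumes Hardy-Weinberg proportions, which the text does not state explicitly; the function name and dictionary keys are illustrative.

```python
def hardy_weinberg(minor_allele_freq):
    """Genotype fractions for a biallelic SNP under Hardy-Weinberg equilibrium."""
    q = minor_allele_freq
    p = 1.0 - q
    return {
        "homozygous_major": p * p,
        "heterozygous": 2.0 * p * q,
        "homozygous_minor": q * q,
    }


# Allele frequencies as quoted in the text.
mcs5a1 = hardy_weinberg(0.25)  # recessive risk allele
mcs5a2 = hardy_weinberg(0.13)  # dominant protective allele

# Dominant action: anyone with one or two copies is affected.
carriers = mcs5a2["heterozygous"] + mcs5a2["homozygous_minor"]
# Recessive action: only minor-allele homozygotes are affected.
homozygotes = mcs5a1["homozygous_minor"]

print(f"MCS5A2 carriers (1 or 2 copies): {carriers:.1%}")
print(f"MCS5A1 minor-allele homozygotes: {homozygotes:.1%}")
```

Under this assumption a 13% allele frequency gives about 24% carriers, matching the figure quoted for MCS5A2, while a 25% allele frequency acting recessively implies that roughly 6% of women are homozygous for the MCS5A1 risk allele.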

Not only does this human study support the use of comparative genomics to identify human cancer risk modifier alleles, it also extends the resolution obtained in the rat in localizing the two genetic elements of the Mcs5a allele. The rat localizes Mcs5a1 and -a2 to 32 and 84 kb, respectively, while the human studies resolved these determinants to 5.7 kb and 16.8 kb (Samuelson et al. 2007). Thus, we have demonstrated a clear advantage in using comparative genomics to localize target regions within QTL.

Both MCS5A1 and -5A2 have allele frequencies and genetic penetrance (relative risk) similar to those of most breast cancer alleles identified by GWAS. However, unlike alleles identified by GWAS, those identified by comparative genomics also provide in vivo models to functionally characterize risk alleles. For example, it is often assumed that breast cancer modifiers are likely to act within breast tissue to modulate risk. Using the rat as a model we have been able to show that Mcs5a, a noncoding allele, acts to differentially regulate its neighboring FBXO10 gene in immune but not mammary tissues (Samuelson et al. 2007).

It is also intriguing to consider the observation that breast cancer risk-associated alleles such as Mcs5a1 and -5a2 are either conserved over millions of evolutionary years or are highly mutable and functionally neutral, suggesting that these alleles do not significantly reduce fitness. If so, one then speculates that they would make good targets for chemoprevention drugs by possessing low toxicity and as such a good therapeutic index. In particular, converting sensitive to resistant allelic function with drug therapy would mimic the conserved resistance allele that persists in the human population and should therefore show a low side-effects profile.

Our current research on these genetically identified Mcs loci focuses on molecular, cellular, and organismal mechanisms by which they modify risk. Not only will these investigations provide insight into the function of each noncoding Mcs locus, but collectively they will provide a mechanistic framework to facilitate integrative genetic studies of the plethora of polymorphic risk loci identified by GWAS in multiple diseases.


It will continue to be important to integrate GWAS and comparative genomic approaches. Such approaches, combined with detailed and global genomics and proteomic information, will allow for a systematic genetics approach to mechanistically annotate the function of complex disease modifier loci. Such analysis would go beyond risk analysis with the potential to develop drug targets guiding the search for novel agents that may be used for individualized disease prevention.

Because they rely on a nonbiased approach, these comparative genomic studies also have the potential to discover novel functional aspects of the noncoding genome. For example, our work with Mcs5a will address the function of synthetic genetic interactions in mammals. Other interesting Mcs loci range from risk alleles in gene deserts (e.g., Mcs1a) to those that are intergenic and act over a very long distance (e.g., Mcs5c; >300 kb) (in progress). These data (together with the results of GWAS of many common diseases) are consistent with the hypothesis that the majority of low-penetrant, high-frequency disease risk modifier alleles are noncoding. Many of these may modify gene expression. Testing this hypothesis and explaining its evolutionary origins and significance are important challenges in this area of genetics research.

In summary, the use of comparative genetics in the search for novel approaches to chemoprevention has been fruitful, suggesting targeting the immune system for breast cancer prevention. Our voyage into the genetic etiology of breast cancer will continue to not only fuel our efforts in breast cancer prevention but also provide us with intriguing inquiries into fundamental molecular genetics.


I thank William Dove for critical discussions. The comparative genetics work in the author's lab is supported by National Institutes of Health grants R01 CA0077494, R01 CA028954, R01 CA123274, and R01 ES017400.

