Issue 23: December 2019

Data Corner
CTD² DREAM Challenges: Develop Predictive Algorithms to Identify Effective Cancer Treatment Strategies

The CTD² Network, in partnership with Sage Bionetworks, invites the community to participate in DREAM Challenges to develop predictive bioinformatics methods. The article outlines the goals of, and the questions posed by, the CTD² Pancancer Drug Activity and CTD² BeatAML DREAM Challenges.

Featured Researchers
Translating Genomics in Cancer: Interview with Dr. Andrew (Andy) Mungall

Dr. Andy Mungall is a genome scientist at the British Columbia (BC) Cancer Genome Sciences Centre who is involved in the molecular characterization of tumors for CGCI projects. In this interview, Dr. Mungall discusses his background and perspectives on cancer genomics research.

CTD² Guest Editorial
Cracking the Cancer Code with Computational Approaches

The Califano lab at Columbia University developed the OncoMatch, OncoTarget, and OncoTreat algorithms to identify effective cancer therapies. The article compares the cells in our body to an orchestra and describes how the lab's methods allow them to listen to the symphony, sound of the conductor....

OCG Program Highlights
OCG-supported Initiatives Provide Valuable Resources to Advance and Accelerate Precision Oncology

Initiatives supported by NCI’s Office of Cancer Genomics aim to accelerate translational research efforts towards precision oncology. The article describes the databases and other resources available to the community.

HCMI Program Highlights
HCMI Model-associated Data Available at NCI’s Genomic Data Commons

HCMI’s next-generation cancer models are associated with clinical and molecular data, which are stored at NCI’s Genomic Data Commons (GDC). This article explains the types of HCMI data generated by the Cancer Model Development Centers and available at the GDC, and provides a brief guide on how to access the data.

Data Corner
CTD² DREAM Challenges: Develop Predictive Algorithms to Identify Effective Cancer Treatment Strategies

Justin Guinney, Ph.D.
Sage Bionetworks

Crowdsourcing the analysis of highly complex and massive data has emerged as one way to incentivize and match experts from around the world to scientific problems. When crowdsourcing is done in the form of scientific competitions—or Challenges—the validation of the analytical approach is automatically incorporated into the study design. Challenges foster open innovation, creating communities that collaborate directly or indirectly to solve important biomedical problems. The Dialogue on Reverse Engineering and Assessment Methods (DREAM) Challenges are special instances of biomedical Challenges that have spawned a community of solvers committed to advancing important science questions using open and reproducible methods. Since its inception in 2006, DREAM has hosted dozens of Challenges across a wide spectrum of biomedical domains, disease areas, and data modalities that include genomics, imaging, and clinical data1,2.

In the past, DREAM has partnered with NCI’s Cancer Target Discovery and Development (CTD²) Network to host the Gene Essentiality Prediction Challenge. The goal of this challenge was to evaluate and develop computational algorithms that predicted gene dependencies using gene expression and copy number features. This led to benchmarks and insights into factors influencing gene essentiality from functional genetic screens3.

DREAM and CTD² have partnered again to host two new Challenges: the CTD² Pancancer Drug Activity DREAM Challenge and the CTD² BeatAML DREAM Challenge.

CTD² Pancancer Drug Activity DREAM Challenge

CTD² members at Columbia University developed Pancancer Analysis of Chemical Entity Activity (PANACEA), a comprehensive repertoire of dose-dependent cellular responses and post-treatment molecular profiles to drug treatments. PANACEA covers a broad spectrum of cellular contexts representative of poor-outcome malignancies, including rare ones such as GastroIntestinal Stromal Tumor (GIST) sarcoma and GastroEnteroPancreatic NeuroEndocrine Tumors (GEP-NETs).

PANACEA is uniquely suited to support DREAM Challenges related to the elucidation of drug mechanism of action (MOA), drug sensitivity, and drug synergy. Specifically, this Challenge is posing three questions or sub-Challenges:

  • Inference of targets using the transcriptional data collected 24h after treatment with chemotherapeutic compounds
  • Prediction of cell line compound sensitivity using baseline transcriptional profiles
  • Identification of optimal compounds to sensitize three KRAS-mutant cell-lines to treatment with the MEK inhibitor selumetinib

Data provided to participants will include drug perturbational profiles from cell lines, as well as drug-sensitivity measurements from a panel of compounds. Participants are encouraged to utilize large public databases such as Connectivity Map4, Cancer Cell Line Encyclopedia5, and Genomics of Drug Sensitivity in Cancer6, as well as insights and models developed from previous DREAM Challenges7-9 in the development or training of algorithms. The details for the CTD² Pancancer Drug Activity DREAM Challenge can be viewed on Synapse (syn20968331/wiki/597042).
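As a rough illustration of the second sub-Challenge (predicting compound sensitivity from baseline transcriptional profiles), the sketch below uses a simple nearest-neighbour approach: a new cell line is assigned the average sensitivity of the training cell lines whose expression profiles correlate best with its own. This is only one of many possible baselines and is not a method prescribed by the Challenge. The cell line names, expression values, and sensitivity scores are synthetic and do not come from PANACEA, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def knn_predict(train_profiles, train_sensitivity, query_profile, k=2):
    """Average the sensitivity of the k training cell lines whose baseline
    expression profiles correlate best with the query profile."""
    ranked = sorted(
        train_profiles,
        key=lambda name: pearson(train_profiles[name], query_profile),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(train_sensitivity[name] for name in nearest) / len(nearest)


# Synthetic toy data (assumption): expression of 4 genes per cell line.
train_profiles = {
    "line_A": [5.0, 1.0, 3.0, 0.5],
    "line_B": [4.8, 1.2, 2.9, 0.7],
    "line_C": [0.5, 4.0, 1.0, 5.0],
}
# Synthetic sensitivity scores for one compound (higher = more sensitive).
train_sensitivity = {"line_A": 0.9, "line_B": 0.7, "line_C": 0.1}

query = [5.1, 0.9, 3.2, 0.4]
prediction = knn_predict(train_profiles, train_sensitivity, query, k=2)
print(round(prediction, 2))
```

In practice, participants would replace the toy dictionaries with the Challenge data and would likely prefer a regularized regression or another model that can be cross-validated; the nearest-neighbour rule is used here only because it fits in a few lines.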

CTD² BeatAML DREAM Challenge

Oregon Health and Science University (OHSU), in collaboration with academic medical centers and pharmaceutical and biotechnology companies, developed the BeatAML research initiative. This study integrates molecular alteration data with ex vivo drug sensitivity for a large number of clinically annotated Acute Myeloid Leukemia (AML) cases. One of the primary goals of this multi-center study is to prioritize drugs that could yield new drug target hypotheses and discover predictive biomarkers of therapeutic response. Patient samples were subjected to whole-exome sequencing (WES), transcriptomic sequencing (RNA-seq), and ex vivo functional drug sensitivity screens10. This rich resource enables the discovery of molecular correlates of drug response and putative patient populations most likely to respond to targeted agents. Indeed, analysis of these data has already revealed numerous correlations of drug sensitivity or resistance with a variety of mutational subsets of disease, as well as numerous gene expression signatures that correlate with drug sensitivity/resistance10.

The overall goal of the BeatAML DREAM Challenge is to define patient subpopulations tailored to specific treatments by discovering (genomic and transcriptomic) biomarkers of drug sensitivity. This Challenge is posing two sub-Challenges:

  • Predict quantitative ex vivo drug sensitivity to targeted and chemotherapeutic agents using genomic alterations and gene expression data
  • Stratify patients into clinical responders (i.e. those that did not have a relapse within two years of standard induction therapy) and non-responders based on ex vivo drug sensitivity data, genomic alterations, and/or gene expression data

OHSU will provide training (Beat AML waves 1 and 2) and validation (Beat AML wave 3) data. The details for the CTD² BeatAML DREAM Challenge can be viewed on Synapse (syn20940518/wiki/596265).
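For the first sub-Challenge, one natural way to compare predicted and measured ex vivo drug sensitivity is a rank correlation. The article does not say how submissions are scored, so the Spearman correlation below is an assumption chosen for illustration, not the Challenge's official metric. The sensitivity values are synthetic and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Synthetic example (assumption): measured vs. predicted sensitivity
# of five patient samples to a single agent.
observed = [0.82, 0.40, 0.15, 0.63, 0.55]
predicted = [0.60, 0.35, 0.30, 0.70, 0.50]
score = spearman(observed, predicted)
print(round(score, 3))
```

A rank-based score rewards getting the ordering of samples right even when the predicted values are on a different scale from the measurements, which is often the case when models are trained on one data wave and evaluated on another.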


  1. Saez-Rodriguez J, Costello JC, Friend SH, et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat Rev Genet. 2016 Jul 15;17(8):470-86. (PMID: 27418159)
  2. Ellrott K, Buchanan A, Creason A, et al. Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges. Genome Biol. 2019 Sep 10;20(1):195. (PMID: 31506093)
  3. Gönen M, Weir BA, Cowley GS, et al. A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines. Cell Syst. 2017 Nov 22;5(5):485-497.e3. (PMID: 28988802)
  4. Subramanian A, Narayan R, Corsello SM, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017 Nov 30;171(6):1437-1452.e17. (PMID: 29195078)
  5. Barretina J, Caponigro G, Stransky N, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012 Mar 28;483(7391):603-7. (PMID: 22460905)
  6. Iorio F, Knijnenburg TA, Vis DJ, et al. A landscape of pharmacogenomic interactions in cancer. Cell. 2016 Jul 28;166(3):740-754. (PMID: 27397505)
  7. Bansal M, Yang J, Karan C, et al. A community computational challenge to predict the activity of pairs of compounds. Nat Biotechnol. 2014 Dec;32(12):1213-22. (PMID: 25419740)
  8. Costello JC, Heiser LM, Georgii E, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014 Dec;32(12):1202-12. (PMID: 24880487)
  9. Menden MP, Wang D, Mason MJ, et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat Commun. 2019 Jun 17;10(1):2674. (PMID: 31209238)
  10. Tyner JW, Tognon CE, Bottomly D, et al. Functional genomic landscape of acute myeloid leukaemia. Nature. 2018 Oct;562(7728):526-531. (PMID: 30333627)

Featured Researchers
Translating Genomics in Cancer: Interview with Dr. Andrew (Andy) Mungall

Interviewed by Cindy Kyi, Ph.D.
Office of Cancer Genomics, NCI

Dr. Andy Mungall

Genome science and advances in sequencing technologies are leading the way towards precision oncology for cancer patients. OCG interviewed Dr. Andrew (Andy) Mungall, Senior Staff Scientist and Group Leader of Biospecimen and Library Cores at Canada’s Michael Smith Genome Sciences Centre of BC Cancer in Vancouver, Canada. Dr. Mungall has been involved with molecular characterization projects for OCG’s Cancer Genome Characterization Initiative (CGCI). Dr. Mungall provides his background and perspectives on genome science in cancer research.

When did your interest in genetics and genome science begin? 

My undergraduate degree was in Applied Biological Sciences at a time when genetics was making significant progress in mapping genes such as those responsible for Huntington’s disease and cystic fibrosis. I spent a gap-year as a waiter in a small family-run restaurant in the French ski resort of Meribel. My downtime was spent skiing and reading French magazines and books describing the emergence of genomics in France and elsewhere. I was particularly taken by Bertrand Jordan’s book ‘Travelling around the human genome: A world tour of 80 laboratories’ and was certain I wanted to be part of the genomics revolution. In the summer of 1993, I began my career at the Sanger Centre (now Wellcome Trust Sanger Institute), near Cambridge, UK. I was first recruited as a sequence finisher working on the Huntington’s disease gene region (at 4p16.3) I’d studied a year earlier, as well as the red-green colour blind locus at Xq28. I subsequently led the team to map and sequence chromosome 6 as part of the International Human Genome Project. After publication of human chromosome 6 in 2003 (PMID: 14574404), I began my PhD studying the evolution and gene regulation of the genomic imprinting mechanism.

How did you get into cancer genomics research?

Numerous collaborations forged during the human chromosome 6 project stimulated my interest in applying genomics in healthcare and particularly cancer research.

What influenced your decision to move from Britain to Canada?

In 2007, as I was completing my PhD thesis, I began to look at genome centres closely associated with hospitals. One such environment was Canada’s Michael Smith Genome Sciences Centre (GSC), a department within BC Cancer. Under the directorship of Dr. Marco Marra, the GSC was one of the first worldwide research centres to develop and apply cutting-edge genomic technologies to cancer research. After visiting Marco and colleagues, some of whom I’d known from early Sanger days (including Drs. Steve Jones and Karen Novik), and spending a family holiday in beautiful Vancouver with its proximity to mountains and ocean, it was clear that my family would enjoy the professional and lifestyle opportunities.

How did you first get involved with NCI’s CGCI and what is your role in it?

My Staff Scientist role at the GSC first involved molecular characterizations of Non-Hodgkin lymphomas (NHL), using fingerprint profiling to identify structural rearrangements in follicular lymphomas (FL), then subsequently, massively parallel sequencing of FL and Diffuse Large B-Cell Lymphomas. Through this work, supported in part by CGCI, we published our findings of frequent EZH2 mutations in NHL in 2010 (PMID: 20081860). We then extended this work to identify that mutations in histone modifier genes are frequent in NHL (PMID: 21796119). These studies have launched new avenues of investigations into NHL biology and therapeutics. In 2011, I became the Group Leader for Library Core at the GSC, leading the team that generates diverse library types for CGCI and other projects. In 2019, our team completed the molecular characterization of pediatric Burkitt lymphomas through CGCI's Burkitt Lymphoma Genome Sequencing Project (PMID: 30617194). Finally, working with CGCI's HIV+ Tumor Molecular Characterization Project - cervical cancer working group, we have recently completed the genomic, transcriptomic and epigenomic characterization of Ugandan cervical tumours, surprisingly revealing human papillomavirus (HPV) clade-specific tumour differences. 

Before you started the project, what was your expectation of HIV-associated mutations in cervical cancer?

HIV infection is well known to be associated with increased incidence of cancers, including cervical cancer (see CGCI’s HIV+ Tumor Molecular Characterization Project (HTMCP)). However, in cervical cancers, the interplay between HIV and HPV, a virus which is necessary but not sufficient for the development of cervical cancer, is unknown. One of the aims of the HTMCP cervical cancer project was to reveal whether molecular differences existed between HIV+ and HIV- patients with cervical cancer. Somewhat surprisingly, we did not identify significant molecular differences in the tumours of HIV+ and HIV- patients, although there was a trend towards PIK3CA mutations being more prevalent in HIV- cases (45%) than HIV+ cases (29%). HIV+ tumours may therefore be less reliant on dysregulation of the PI3K-MAPK pathway for cervical tumourigenesis. We observed that HIV+ cervical cancer patients were on average 10 years younger than HIV- patients, and therefore hypothesized that HIV infection may contribute to the early development of cervical cancer, potentially by suppressing immune recognition and clearance of HPV from the cervical epithelium.

Most notably, the cervical cancer study revealed the importance of epigenetic modifiers, with 87% of patients carrying mutations in one or more epigenetic modifiers such as KMT2D. Genome DNA methylation and histone modification data separated tumours according to HPV clade integrations in the tumour genome, with clade A9 HPV associating close to promoter regions and A7 in intergenic regions resulting in altered gene and endogenous retroviral expression programs.

What are some of the challenges in characterizing genomes from patient tumors?

Short-read sequencing technology for re-sequencing applications is advanced and enables accurate calling of somatic single nucleotide and insertion/deletion variants, assuming constitutional DNA is available. A key challenge in tumour characterization is the uniform processing and timely collection of patient tumour and constitutional (often from blood) DNA samples. This can be difficult for rare tumour types sourced from around the world, but can be overcome with detailed SOPs. Obtaining large fresh or flash-frozen sample cohorts is not always possible for either prospective or retrospective cancer studies. It has, therefore, been important to develop the capability to characterize formalin-fixed and paraffin-embedded (FFPE) tissues typically collected for diagnostic purposes. At the GSC, we have developed methods for reducing FFPE artifacts in genome sequencing data (PMID: 30418619) while maintaining automated high-throughput nucleic acid extraction (PMID: 28570594).

What are some of the “unknowns” or “gaps” that still need to be discovered in cancer genomics in order to fully understand pathogenesis in diverse tumor types?

The more we molecularly characterize tumour types, the more we appreciate the role of the epigenome in pathogenesis. This has been true for many of the CGCI projects mentioned above, including lymphomas and cervical cancer, and opens up possibilities for epigenetic therapies. Another promising area of cancer research is the relationship between tumour cells and their microenvironment, which is of particular importance in cancer immunotherapy as it has had spectacular success in some patients but not others. In the coming months and years, it will be critical to identify patients most likely to benefit from immunotherapy. I believe that tools including the organoid culture of tumour cells followed by single cell genome, transcriptome and epigenome analyses will play a significant role to further our understanding of tumour pathogenesis.

How can one apply results from ‘omics’ research in clinical research?

Omics technologies are powerful tools with which to perform discovery, and in time, with reducing costs, I expect genomes and transcriptomes (& more) to be featured prominently in clinical research. At the GSC, we have worked closely with multidisciplinary teams, using whole genome and transcriptome sequencing to inform on treatment decision-making in advanced stage cancers through BC Cancer’s Personalized OncoGenomics (POG) program (NCT02155621). More than 1100 patients have been enrolled thus far and the genomic data together with recorded treatment outcomes will continue to be mined by the scientific community for years to come. Genomic discoveries are already being used in the clinic through the use of targeted gene panels. At the GSC, we are accredited by the College of American Pathologists and Diagnostic Accreditation Program of British Columbia to provide clinical sequencing of genes in our Hereditary Cancer Program (HCP), Oncopanel and Myeloid panels. As omics research identifies new clinically actionable gene targets, these gene probes can be readily added to existing probe sets for custom capture sequencing.    

What challenges lie in the way of achieving precision treatments for patients with diverse cancer subtypes?

Whole genome and transcriptome sequencing of tumours through programs such as CGCI and POG continue to provide invaluable information that can influence clinical patient management. Examples demonstrating personalized treatment options include gene expression signatures used to reclassify a tumour’s diagnosis and thus treatment, and drugs targeting the products of novel gene fusions identified across diverse tumour types, that are likely to benefit from immunotherapy. The challenge may therefore not lie solely in the detection of clinically actionable targets but rather how to prioritise the treatments. Drug availability, cost and off-label use are additional challenges that must be met before truly achieving precision medicine for cancer patients.

CTD² Guest Editorial
Cracking the Cancer Code with Computational Approaches

Aaron T. Griffin, M.D. Ph.D. program, Prabhjot S. Mundi, M.D., and Andrea Califano, Ph.D.
Columbia University CTD² Center

Think of the genes that make up the DNA in our cells as the musicians in an orchestra, each playing a perfectly constructed and uniquely tuned instrument, yet all silent and unheard until allowed to play their individual melodies. Think of the RNA molecules—which are ultimately responsible for producing the proteins that make up the cell—as the individual melodies that will be produced, note by note, by each musician to blend and combine into the complex symphony of the cell’s inner workings. Now think of this orchestra, much like our cells, as having more than 20,000 musicians. Clearly, no coherent piece of music could emerge spontaneously, without many synchronized conductors, each one coordinating the work of a subset of musicians, telling them when to play, when to be silent, and setting the tempo for their scores.

From this simple analogy, it should be quite obvious that, even though two orchestras may have the same musicians playing the same exact instruments, the music they produce may be vastly different, depending on their conductors and on the score and tempo they will set. Not dissimilarly, cells with virtually identical genomes in our body can act in dramatically different ways, for instance operating as a neuron, a liver cell, or a white blood cell.

To complete the picture, now imagine a few of the key instruments, such as the first violin or the piano, being badly damaged and suddenly playing dissonant notes. Chaos would ensue and a few of the conductors, disoriented by this cacophony, may even make matters worse, by over-compensating to keep the concert going at all costs, thus resulting in an even more discordant symphony. In our cells, such dystonic and uncoordinated music is the music of cancer, where mutations in several genes may end up affecting the behavior of key conductors of the cellular harmony—proteins that we have called "master regulators"—causing utter mayhem and disease1.

Until recently, the foundational assumption of precision medicine was that if you could stop the one instrument playing the most discordant music (in other words, target the most important mutated oncogene with a selective drug), then the malignant orchestra would come to a grinding halt. However, given the large number of independent mutations that are necessary to trigger and maintain the cancer state of a human cell, targeting a single oncogene is rarely sufficient to accomplish this task and can at best slow down the “tempo” of the cell, only to see it dramatically increase again after the remaining mutations find a way to compensate for the effect of the drug. Unfortunately, this is what is being observed in the clinic with targeted therapeutics, i.e. drugs designed to inhibit specific oncogenes activated by a mutation. Indeed, on average, only 20% to 30% of patients have oncogene mutations that can be targeted with drugs, and fewer than half respond to the therapy. More importantly, of those that initially respond, almost all will eventually relapse with a more aggressive version of the tumor that no longer responds to treatment. As a result, only 5% to 11% of patients treated with targeted therapy, on average, improve their progression-free survival (PFS), and even fewer are cured.
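Read at face value, the first two figures in the paragraph above bound how many patients can respond at all. The short sketch below multiplies them out. It assumes that "half" refers to the patients who carry a targetable mutation, which is an interpretation rather than something the text spells out, and the function name is purely illustrative.

```python
def max_responder_fraction(targetable_fraction, response_rate_cap=0.5):
    """Upper bound on the fraction of all patients who respond, given the
    fraction with a targetable mutation and a cap on their response rate."""
    return targetable_fraction * response_rate_cap


# 20% to 30% of patients have a targetable oncogene mutation (from the text);
# at most half of them respond (assumed reading of the text).
low = max_responder_fraction(0.20)
high = max_responder_fraction(0.30)
print(f"at most {low:.0%} to {high:.0%} of all patients respond initially")
```

Under that reading, no more than roughly 10% to 15% of all patients would respond initially, before relapse is taken into account. The 5% to 11% PFS figure quoted in the text may use a different denominator, so the two ranges should not be compared directly.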

In hindsight, considering that each cell, out of the billion that comprise a tumor mass on average, comes with its own unique set of mutations, this is not overly surprising. Although targeting one mutated oncogene may kill some cells, it will not kill them all, resulting in the emergence of cells that are immune to the effects of the drug and will thus cause relapse. In addition, much like how some of the orchestra conductors may try to adapt the music to the sudden disappearance of one of their key instruments, using a different yet still discordant tune, tumors can dynamically reprogram themselves into a different kind of malignant state which can escape therapy by activating a new set of master regulators without requiring additional DNA mutations. For instance, prostate tumors can reprogram themselves to a very aggressive and drug-resistant neuroendocrine tumor state. Indeed, one of the reasons why some tumors are so hard to treat is that, even when effective drugs are used (e.g. against mutated oncogenes), individual cancer cells will almost inevitably find multiple ways to escape their effect.

Dr. Andrea Califano, Principal Investigator and founding chair of the Department of Systems Biology at the Columbia University Irving Medical Center, has developed computational approaches that focus on targeting the master regulator proteins responsible for directing the dissonant genetic symphony of cancer rather than the individual mutated oncogenes. The key hypothesis pursued by his lab is that, while there are billions if not quadrillions of potentially tumor-inducing DNA mutation patterns, the number of distinct cancer-related states they can induce (i.e., those controlled by a distinct set of master regulators) is actually very small, typically 2 to 5 per tumor type. Thus, if further validated in the clinic, this would lead to the development of a much more universal set of drugs, which, rather than targeting the almost infinite number of mutated gene patterns that can start a tumor, target instead a small set of master regulator proteins.
Figure: The Califano Lab employs systems biology algorithms (ARACNe and VIPER) to identify tumor master regulators on an individual patient basis. The OncoTarget and OncoTreat algorithms are then used to predict drugs which will invert the activity of these master regulators. Unless a patient derived xenograft (PDX) model exists, testing the predicted drugs requires an appropriate mouse or organoid model that recapitulates the patient’s master regulator dependencies. This is accomplished with the OncoMatch algorithm. Finally drug predictions are tested in the most appropriate model, including PDXs, genetically engineered mouse models (GEMMs), or organoids.

Indeed, the lab has shown that either genetic or pharmacologic inhibition of master regulator proteins discovered by this approach can stop cancer cells dead in their tracks. To do this, the lab had to develop a variety of algorithms, which are now used in several clinical studies and trials. For instance, the Algorithm for the Reconstruction of Accurate Cellular Networks2 (ARACNe) finds the strings that connect the individual genes to master regulators that control them. Then, the Virtual Proteomics by Enriched Regulon analysis3 (VIPER) algorithm follows all of these strings to pinpoint the key master regulators of a specific tumor or even of a specific cell within a tumor. The OncoMatch algorithm then searches all available cancer models (for instance, more than a thousand cell lines that can grow in a test tube) to identify the optimal avatar for the patient’s tumor in which to evaluate the patient-relevant targets of all available FDA-approved and experimental drugs. This is achieved using large-scale perturbational assays where the RNA of the cell lines is profiled before and after perturbation with each drug. Finally, the OncoTarget and OncoTreat4 algorithms match the master regulators of a tumor to the targets of each drug to identify one or more that can reverse the activity of the master regulators and thus kill the tumor. Once considered very hard to implement, if not impossible, this strategy of targeting tumor master regulators has already been validated in numerous human malignancies4-12. Returning to the orchestra metaphor, OncoTarget and OncoTreat identify drugs that can target the handful of out-of-control conductors of the dystonic symphony rather than the individual musicians.

Over three hundred FDA-approved and late-stage experimental anticancer drugs are used to perturb the optimal avatar of the patient’s tumor, as selected by OncoMatch13,14. By using VIPER to assess the activity of the master regulators before and after the drug has been introduced in these cells, one can easily identify the drug that can target the vast majority of them. Most drugs do almost nothing in terms of inverting the activity of the master regulators and some may take down a few of them but not all. However, almost invariably, there are a few drugs that can take down the vast majority of them. In studies on 39 drugs predicted by OncoTarget and OncoTreat for patients with completely different tumors that had failed 3 to 7 lines of therapy, 60% induced a relevant response in the patient tumor transplanted into a mouse—including arresting the tumor growth or causing the tumor to shrink—and 20% caused tumor shrinkage4. In contrast, none of the classical targeted inhibitors had any effect on these very aggressive tumors.
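The idea of ranking drugs by how many master regulators they invert can be made concrete with a toy example. The sketch below is not the OncoTreat algorithm, whose actual statistics are described in the cited publications; it is a deliberately simplified stand-in that counts, for each drug, the fraction of master regulators whose activity moves in the direction opposite to their aberrant state in the tumor. The protein names (MR1 to MR4), drug names, and activity values are all hypothetical.

```python
def inverted_fraction(tumor_activity, drug_change):
    """Fraction of master regulators whose activity the drug pushes in the
    direction opposite to their aberrant state in the tumor."""
    inverted = sum(
        1
        for protein, activity in tumor_activity.items()
        # Opposite signs mean the drug counteracts the aberrant activity.
        if activity * drug_change.get(protein, 0.0) < 0
    )
    return inverted / len(tumor_activity)


def rank_drugs(tumor_activity, drug_changes):
    """Sort drugs from most to least master regulators inverted."""
    scores = {
        drug: inverted_fraction(tumor_activity, change)
        for drug, change in drug_changes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical tumor profile: positive = aberrantly activated,
# negative = aberrantly repressed.
tumor_activity = {"MR1": 3.2, "MR2": 2.5, "MR3": -2.8, "MR4": 1.9}

# Hypothetical change in each protein's activity after drug treatment
# (activity after perturbation minus activity before).
drug_changes = {
    "drug_X": {"MR1": -2.0, "MR2": -1.5, "MR3": 1.8, "MR4": -0.9},
    "drug_Y": {"MR1": -1.0, "MR2": 0.4, "MR3": 0.0, "MR4": 0.2},
    "drug_Z": {"MR1": 0.3, "MR2": 0.1, "MR3": -0.2, "MR4": 0.0},
}

ranking = rank_drugs(tumor_activity, drug_changes)
for drug, score in ranking:
    print(f"{drug}: {score:.0%} of master regulators inverted")
```

The toy data mirrors the pattern described above: one drug inverts all of the master regulators, one takes down only a single regulator, and one does almost nothing. A real analysis would weigh the magnitude and statistical significance of each change rather than only its sign.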

Interestingly, these algorithms also work on the individual cells of a tumor, thus allowing the identification of drugs capable of targeting different tumor cell subpopulations within the same tumor that would have completely different drug sensitivities. This could pave the road to a cell-by-cell approach to eliminate this terrible disease.


  1. Califano A, Alvarez MJ. The recurrent architecture of tumour initiation, progression and drug sensitivity. Nat Rev Cancer. 2017 Feb;17(2):116-130. (PMID: 27977008)
  2. Basso K, Margolin AA, Stolovitzky G, et al. Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005 Apr;37(4):382-90. (PMID: 15778709)
  3. Alvarez MJ, Shen Y, Giorgi FM, et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet. 2016 Aug;48(8):838-47. (PMID: 27322546)
  4. Alvarez MJ, Subramaniam PS, Tang LH, et al. A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat Genet. 2018 Jul;50(7):979-989. (PMID: 29915428)
  5. Piovan E, Yu J, Tosello V, Herranz D, et al. Direct reversal of glucocorticoid resistance by AKT inhibition in acute lymphoblastic leukemia. Cancer Cell. 2013 Dec 9;24(6):766-76. (PMID: 24291004)
  6. Compagno M, Lim WK, Grunn A, et al. Mutations of multiple genes cause deregulation of NF-kappaB in diffuse large B-cell lymphoma. Nature. 2009 Jun 4;459(7247):717-21. (PMID: 19412164)
  7. Bisikirska B, Bansal M, Shen Y, et al. Elucidation and Pharmacological Targeting of Novel Molecular Drivers of Follicular Lymphoma Progression. Cancer Res. 2016 Feb 1;76(3):664-74. (PMID: 26589882)
  8. Carro MS, Lim WK, Alvarez MJ, et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature. 2010 Jan 21;463(7279):318-25. (PMID: 20032975)
  9. Aytes A, Mitrofanova A, Lefebvre C, et al. Cross-species regulatory network analysis identifies a synergistic interaction between FOXM1 and CENPF that drives prostate cancer malignancy. Cancer Cell. 2014 May 12;25(5):638-651. (PMID: 24823640)
  10. Mitrofanova A, Aytes A, Zou M, et al. Predicting Drug Response in Human Prostate Cancer from Preclinical Analysis of In Vivo Mouse Models. Cell Rep. 2015 Sep 29;12(12):2060-71. (PMID: 26387954)
  11. Rajbhandari P, Lopez G, Capdevila C, et al. Cross-Cohort Analysis Identifies a TEAD4-MYCN Positive Feedback Loop as the Core Regulatory Element of High-Risk Neuroblastoma. Cancer Discov. 2018 May;8(5):582-599. (PMID: 29510988)
  12. Rodriguez-Barrueco R, Yu J, Saucedo-Cuevas LP, et al. Inhibition of the autocrine IL-6-JAK2-STAT3-calprotectin axis as targeted therapy for HR-/HER2+ breast cancers. Genes Dev. 2015 Aug 1;29(15):1631-48. (PMID: 26227964)
  13. Alvarez MJ, Yan P, Alpaugh ML, et al. Reply to ‘H-STS, L-STS and KRJ-I are not authentic GEPNET cell lines’. Nat Genet. 2019 Oct;51(10):1427-1428. (PMID: 31548719)
  14. Alvarez MJ, Yan P, Alpaugh ML, et al. Unbiased Assessment of H-STS cells as high-fidelity models for gastro-enteropancreatic neuroendocrine tumor drug mechanism of action analysis. bioRxiv. 2019 Jun 23. (doi: 10.1101/677435)

OCG Program Highlights
OCG-supported Initiatives Provide Valuable Resources to Advance and Accelerate Precision Oncology

Subhashini Jagu, Ph.D. and Cindy Kyi, Ph.D.
Office of Cancer Genomics, NCI

Cancer is a global disease that calls for scientists and clinicians from around the world to join forces to advance their collective efforts. Advances in genomic technologies such as high-throughput sequencing have made it possible to characterize genomes from many cancer types and learn about the genetic alterations seen in common cancers. However, the diversity of cancer subtypes and variations in individual patients’ genomes make it challenging to develop effective therapies with minimal side effects. To alleviate some of these challenges, the Office of Cancer Genomics (OCG) supports four precision oncology programs: Cancer Genome Characterization Initiative (CGCI), Cancer Target Discovery and Development (CTD²) Network, Human Cancer Models Initiative (HCMI), and Therapeutically Applicable Research To Generate Effective Treatments (TARGET). OCG initiatives share resources such as data, reagents, tools, experimental methods, and Standard Operating Procedures (SOPs) (Figure). The availability of these resources to the public fosters collaboration and encourages growth of the genomics and precision oncology fields.

OCG strongly believes that sharing data and resources will help accelerate translational efforts towards precision oncology. The National Institutes of Health Data Sharing Policy encourages sharing of data from projects supported by public funds in a broadly accessible fashion while protecting patient privacy and the confidentiality of proprietary data. The NCI’s Data Coordinating Center (DCC) and Genomic Data Commons (GDC) provide bioinformatics support for the OCG-supported research programs. DCC team members play a critical role in making the data available and enhancing the usability of data files by establishing guidelines for data file formats that facilitate data mining across all projects. In doing so, they implement the four foundational principles of good data management: Findability, Accessibility, Interoperability, and Reusability (FAIR).

Figure: Resources from OCG’s multidisciplinary genomic programs.

Accessing OCG Data

OCG initiatives share data generated through their programs with the research community in accordance with the NIH Data Sharing Policy. OCG-supported data are available in open- and controlled-access tiers. The controlled-access tier protects patient privacy and confidentiality. Obtaining access to controlled data and metadata files requires authorization through NCBI’s database of Genotypes and Phenotypes (dbGaP). The Data Access Guide provides detailed instructions.

Accessing CGCI and TARGET Data

CGCI and TARGET are tumor characterization initiatives. CGCI projects use molecular characterization methods to uncover distinct molecular features of HIV+ associated cancers as well as rare adult and pediatric cancers. The TARGET program uses a comprehensive genomic characterization approach to determine the molecular changes that drive childhood cancers and to develop effective, less toxic therapies. Genomic profiles (molecular characterization and sequence data) and clinical data for a variety of tumor types are accessible through a user-friendly Data Matrix specific to each program.

Quality-controlled, standardized, raw and analyzed data from CGCI projects can be accessed through the CGCI Data Matrix; data from TARGET projects can be accessed through the TARGET Data Matrix and NCI’s GDC. To promote reproducible research practices, detailed experimental approaches are also provided.

Accessing CTD² Data

The CTD² Network is a collaborative initiative that aims to bridge the knowledge gap between genomics and the development of effective therapeutic strategies. Raw and analyzed primary data are available through the CTD² Data Portal. This unique resource includes several data types, such as small-molecule high-throughput screens, RNAi and CRISPR loss-of-function and gain-of-function screens, protein-protein interaction screens, and reverse phase protein arrays. Researchers who do not frequently work with large-scale, high-content data generated through high-throughput methods should refer to “caveat emptor”.

The Network also developed the open-access web interface, the “CTD² Dashboard,” to address the community’s need to find data generated by CTD² Centers through multiple types of biological and analytical approaches. This resource assembles Network Center-generated conclusions, or “observations,” with associated supporting evidence. The Dashboard allows easy navigation for biologists and data scientists.

Other Resources

Searchable Catalog of Next-Generation Cancer Models (NGCMs): HCMI is an international consortium that is generating patient-derived NGCMs from rare cancers and cancers from ethnic and racial minority populations. The HCMI Searchable Catalog provides a list of available NGCMs to the research community. The HCMI Searchable Catalog User Guide provides instructions on how to navigate the Searchable Catalog. Harmonized genomic and clinical data of the models stored at NCI’s GDC and links to the model distributor are provided in the Catalog.

Pediatric Genomic Data Inventory (PGDI): PGDI is a catalog of known pediatric cancer projects and provides a summary of the molecular and clinical annotations.

Analytical Tools: The CTD² Centers have developed Analytical Tools including new computational methods, algorithms, databases, and Data Portals to facilitate the processes of data mining, visualization, and analysis.

Reagents: Resources such as plasmids, cDNA clones, siRNA libraries, CRISPR-Cas9 gRNA libraries, and protein-protein interaction reagents are available to the scientific community via distribution venues or by contacting the investigators who generated them.

SOPs: SOPs for current CGCI projects are provided as a reference for clinical practitioners, institutional officials, and laboratory or research personnel.

Case Report Forms (CRFs): Cancer type-specific enrollment and follow-up CRFs developed through collaborations with international clinical experts are available for use to enable uniform clinical data collection across all the tissue source sites. These forms can be used as a guide in collecting clinical data associated with cancer types.

Publications: OCG-supported initiatives have been very productive, with many manuscripts being published in high-visibility journals. In addition, most of the publications have a strong influence on the cancer genomics field and scientific community.

In summary, OCG shares valuable resources, data, and technology with the cancer research community to aid in expanding our understanding of cancer and to accelerate development of clinically useful markers, targets, and therapeutics for precision oncology. These resources are being widely used by the research community. OCG requests that users acknowledge data usage according to the guidance on relevant program-specific webpages.

HCMI Program Highlights
HCMI Model-associated Data Available at NCI’s Genomic Data Commons

Lauren Hurd, Ph.D. and Eva Tonsing Carter, Ph.D.
Office of Cancer Genomics, NCI

NCI’s Human Cancer Models Initiative (HCMI)

The Human Cancer Models Initiative (HCMI) is an international consortium whose goal is to provide a community resource of ~1000 clinically and molecularly characterized next-generation (next-gen) cancer models. The next-gen models are generated from patient tumors that range from common and aggressive to rare adult and pediatric cancer subtypes. OCG has partnered with the NCI Center to Reduce Cancer Health Disparities to support model development from racially and ethnically diverse populations. The aim of this initiative is to provide the scientific community with diverse, fully annotated (clinical and genomic data) models that more accurately recapitulate the biology of their parent tumors.

Successful characterization of NCI HCMI models requires integration of data from multiple institutions within the NCI cancer model development pipeline. These data include clinical data generated by the NCI-supported Cancer Model Development Centers (CMDCs) and collected by the Clinical Data Center (CDC), biospecimen data from the Biospecimen Processing Center (BPC), and sequencing data from the Genomic Characterization Centers (GCCs). NCI’s Genomic Data Commons (GDC) serves as a repository for NCI HCMI data, ensuring that the data are in a standardized, interoperable format. The GDC analyzes the sequencing data through specific alignment and analysis pipelines. As a result, data collection, analysis, and sharing for use by the community are as uniform as possible.

HCMI-CMDC data at NCI’s Genomic Data Commons (GDC)

HCMI’s model-associated molecular characterization, biospecimen and clinical datasets can be accessed through the GDC Data Portal. The GDC Data Portal, which also houses datasets from large-scale genomic projects such as The Cancer Genome Atlas Program (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and the Cancer Genome Characterization Initiative (CGCI), provides users with web-based access to HCMI-CMDC data and some of the data from HCMI collaborators. HCMI-CMDC data are updated as they become available. Currently, model-associated data are available from cancer types originating in the brain, colon, and rectum. Each model’s page details the available associated data and includes high-level clinical, biospecimen and genomic information (Figure).


Figure: Snapshot of an HCMI-CMDC model data page at NCI’s GDC.

The model-associated clinical data (Figure (a)) available at the GDC are submitted by the CMDCs using cancer type-specific Case Report Forms (CRFs) so that relevant diagnostic, therapeutic and prognostic information is collected. The CRFs function to standardize the clinical data and utilize clinical Common Data Elements (CDEs). The CDEs use a controlled vocabulary and are registered within NCI’s Cancer Data Standards Registry and Repository (caDSR) to ensure semantic interoperability. The clinical data are quality-controlled (QC-ed) and formatted by the CDC before the de-identified clinical data are transferred to the GDC. The GDC then harmonizes the clinical data according to GDC’s clinical data harmonization process.

Model-associated biospecimen data (Figure (b)) are also deposited at the GDC, which allows for collection of metadata associated with the physical sample and establishes relationships between case (e.g. model) and sample (e.g. tissue type). Prior to submission to the GDC, normal, tumor and model samples are processed by the BPC under standard protocols to ensure uniformity of nucleic acid isolation for molecular characterization. Biospecimen metadata at the GDC capture QC values for histopathology of model-associated normal and tumor tissues, such as percent tumor nuclei and necrosis, as well as nucleic acid metrics such as A260/A280 and RIN values for DNA and RNA aliquots, respectively.

Raw sequencing datasets for each model and its associated normal and tumor samples are submitted to the GDC by the GCCs. Nucleic acids from the BPC are sent to the GCCs, where the DNA sequencing center performs 150x whole exome sequencing (WXS) and 15x whole genome sequencing (WGS) on normal, tumor and model samples, and the RNA sequencing center performs RNA sequencing (RNA-Seq) on model and tumor samples. At the GDC, the sequencing data (Figure (c)) are analyzed and harmonized using the curated GDC genomic data analysis and harmonization pipeline, which ensures that molecular characterization datasets are analyzed using the same bioinformatics methodology.

HCMI-CMDC datasets are either open-access or controlled-access. Open-access data present minimal risk that a participant can be identified and include de-identified clinical data, biospecimen data, and tumor- and model-associated somatic mutations. Open-access data do not require data use certification. Access to controlled datasets requires dbGaP authorization. For more information on access to controlled datasets such as raw sequencing data or harmonized datasets that contain germline variants, see the “Accessing HCMI Data” page. For accessing and downloading large sets of data, see GDC’s Data Transfer Tool.
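For readers who prefer programmatic access, the open- and controlled-access tiers described above can also be selected through the GDC's public REST API. The following Python sketch is illustrative only and is not an official HCMI or GDC recipe: the helper names (build_files_query, build_files_url, fetch_files) are hypothetical, and the filter fields "cases.project.project_id" and "access" are assumptions that should be checked against the current GDC API documentation before use.

```python
import json
from urllib.parse import urlencode

# Public GDC files endpoint (assumed current; verify in the GDC API docs).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id="HCMI-CMDC", access="open", size=10):
    """Build query parameters selecting files by project and access tier.

    Field names are assumptions based on the GDC API's filter syntax.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "access", "value": [access]}},
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_category,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


def build_files_url(**kwargs):
    """Return the full request URL for the query (no network access)."""
    return GDC_FILES_ENDPOINT + "?" + urlencode(build_files_query(**kwargs))


def fetch_files(**kwargs):
    """Run the query against the live API and return the list of file hits.

    Requires network access, so it is not called when this module is imported.
    """
    from urllib.request import urlopen
    with urlopen(build_files_url(**kwargs)) as response:
        return json.load(response)["data"]["hits"]


if __name__ == "__main__":
    # Print the URL that would list open-access HCMI-CMDC files.
    print(build_files_url())
```

Listing file metadata in this way should not require authorization for either tier; downloading controlled-access files is a separate step that requires dbGaP authorization and a GDC authentication token.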

Standardized HCMI model-associated data at NCI’s GDC available for research community

Harmonizing data through the GDC bioinformatics pipelines allows the comparison of information from multiple models or across GDC projects. The GDC uses its DNA-seq analysis pipeline to identify somatic variants within the model and associated tumor from WXS and WGS data. The GDC RNA-seq pipeline is used to generate raw and normalized gene expression profiles for tumor and model data. More analytical results (e.g. alternative splicing) will be added in the future. Harmonized raw sequencing data, variant calling datasets (VCFs and MAFs) and gene expression data for HCMI-CMDC models and associated normal and tumor samples are available at the GDC.

The results of the GDC analyses can be explored using the Data Analysis, Visualization, and Exploration (DAVE) tools, which allow users to explore all GDC datasets, including the HCMI-CMDC datasets. The GDC develops seminars and extensive documentation to orient users to the DAVE tools. Exploration pages at the GDC Data Portal allow for filtering of HCMI-CMDC datasets based on clinical parameters such as primary site, gender, age at diagnosis and ethnicity. Once HCMI-CMDC somatic MAF files are generated, genomic data can be filtered by genes or mutations of interest. Users can select multiple model datasets and use GDC Data Portal tools to formulate and analyze scientific questions. Detailed information on DAVE tools and the full breadth of analysis capabilities can be found at the GDC.
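Filtering by genes of interest can also be done offline once a somatic MAF file has been downloaded. The Python sketch below is a minimal illustration rather than part of the DAVE tools: the function name filter_maf_by_genes is hypothetical, the sample rows and model barcodes are made up for demonstration, and the code assumes a tab-delimited MAF with the standard Hugo_Symbol column and leading "#" comment lines.

```python
import csv


def filter_maf_by_genes(maf_text, genes):
    """Return the MAF rows whose Hugo_Symbol is in the given gene set.

    Assumes a tab-delimited MAF whose header row follows any '#' comment lines.
    """
    wanted = set(genes)
    # Drop blank lines and '#' comment lines that precede the column header.
    lines = [line for line in maf_text.splitlines()
             if line and not line.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if row["Hugo_Symbol"] in wanted]


# Made-up example rows; not real HCMI data.
SAMPLE_MAF = "\n".join([
    "#version hypothetical-example",
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode",
    "TP53\tMissense_Mutation\tMODEL-A",
    "KRAS\tMissense_Mutation\tMODEL-A",
    "APC\tNonsense_Mutation\tMODEL-B",
    "TP53\tSilent\tMODEL-B",
])

hits = filter_maf_by_genes(SAMPLE_MAF, ["TP53"])
for row in hits:
    print(row["Tumor_Sample_Barcode"], row["Variant_Classification"])
```

The same function accepts several genes at once, for example filter_maf_by_genes(SAMPLE_MAF, ["KRAS", "APC"]), which makes it straightforward to compare a panel of genes across multiple model datasets.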

HCMI’s objective is to provide the research community with a resource of next-generation cancer models that are characterized with clinical and molecular data. Providing datasets in a standardized and unified manner, in alignment with four foundational principles—Findability, Accessibility, Interoperability, and Reusability (FAIR), is essential for efficient data sharing and enables users to compare multiple datasets. NCI’s GDC plays an integral role in helping HCMI meet this objective through its systematic and harmonized data processing, organization, storage and analysis tools. Users may join the GDC User Mailing List to receive updates on data releases.