cBioPortal: A Web Platform for Gene-Based Data Exploration

The OCG e-News “Exploring Cancer Genomes” series highlights visualization and analysis tools used by the cancer research community to extract actionable hypotheses from large-scale genomic datasets. In the latest edition, we spotlight the cBioPortal for Cancer Genomics, a tool developed at Memorial Sloan Kettering Cancer Center’s Computational Biology Center (cBio).
Large-scale genomic studies, such as Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and The Cancer Genome Atlas (TCGA), are generating an unprecedented amount of data on a large number of tumors. Much of this raw data is made available to the research community, but the sheer size of these datasets poses an enormous barrier to analysis. To lower this barrier, Memorial Sloan Kettering Cancer Center’s Computational Biology Center (cBio) has developed the cBioPortal for Cancer Genomics. The cBioPortal is a web application platform that facilitates data exploration and analysis through a variety of visualization and analytical tools. Some of the tools in cBioPortal, such as the network mapper Cytoscape, were developed by external groups, and some were developed by the cBio engineers themselves. The tools are integrated through cBioPortal’s user-friendly interface, which was designed to be accessible to all researchers, regardless of their level of informatics expertise.
The cBioPortal allows users to interrogate datasets across genes, samples, and data types, giving them the opportunity to examine a number of different biologically and/or clinically relevant hypotheses. The cBioPortal currently hosts more than 40 datasets from TCGA and other large-scale genomic studies, and makes them available for bulk download. Data from OCG’s TARGET Initiative will be added to its database in the next year. The data types from the 13,000+ tumor samples include mutations, copy number alterations, mRNA expression changes, and DNA methylation values, as well as clinical parameters, such as disease-free survival.
cBioPortal provides “gene-based” visualizations and analyses, so that users can find altered genes and/or networks within a study of interest or across all studies. Initiating a search is simple with the cBioPortal’s four-step “Query” web interface (Figure 1). Users select 1) a cancer study, 2) one or more genomic profiles, such as mutations and copy number alterations, 3) a patient case set, and 4) a gene set of interest. Gene sets can be entered manually or selected from pre-loaded cancer pathways derived from Pathway Commons. After a query is submitted, the cBioPortal generates condensed, easy-to-navigate data summary readouts that are organized in a series of clickable tabs (Figure 2). A few of these readouts are highlighted below. Visit the cBioPortal website to explore all of them.
Figure 1: The cBioPortal query Web interface
OncoPrint and Mutual Exclusivity
The OncoPrint is a graphical representation of alterations in genes across multiple tumor samples (Figure 2). Individual patient cases are represented by columns, and individual genes are represented by rows. To simplify analysis, each gene is classified as either “altered” or “unaltered” in each patient case. Different colors, shapes, and symbols (e.g., arrowheads) distinguish the type of alteration(s), such as homozygous deletions, amplifications, differential expression, and mutations.
Figure 2: OncoPrint of genomic alterations in samples from a TCGA lung squamous cell carcinoma study
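The row-and-column layout described above can be sketched in a few lines of Python. This is an illustrative text rendering only, not the cBioPortal's own code: the case IDs, the alteration calls, the symbols, and the `render_oncoprint` helper are all assumptions made up for this example.

```python
# Hypothetical alteration calls: gene -> {case ID -> alteration type}.
# Case IDs and calls are invented for illustration.
CALLS = {
    "TP53": {"case-01": "MUT", "case-03": "MUT"},
    "CDKN2A": {"case-02": "HOMDEL"},
    "MDM2": {"case-04": "AMP"},
}
CASES = ["case-01", "case-02", "case-03", "case-04"]

# One symbol per alteration type; "." marks an unaltered gene.
SYMBOLS = {"MUT": "m", "AMP": "A", "HOMDEL": "D"}


def render_oncoprint(calls, cases):
    """Return one text row per gene, with one column per patient case."""
    rows = []
    for gene, by_case in calls.items():
        cells = "".join(SYMBOLS.get(by_case.get(case), ".") for case in cases)
        altered = sum(1 for case in cases if case in by_case)
        percent = 100 * altered / len(cases)
        rows.append(f"{gene:<8}{cells}  {percent:.0f}%")
    return rows


for row in render_oncoprint(CALLS, CASES):
    print(row)
```

Each printed row is a gene and each character column is a patient case, so a staggered pattern of symbols across rows hints at mutual exclusivity, while symbols stacked in the same column hint at co-occurrence.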
The OncoPrint is especially useful for visualizing alterations across a set of cases and identifying trends, such as co-occurrence and mutual exclusivity. Genes whose alterations rarely co-occur in the same patients may act in the same pathway. Identifying these genes exposes opportunities for synthetic lethal therapies. The significance level of co-occurrence and mutual exclusivity, calculated by the statistical method Mutual Exclusivity Modules (MEMo), is found under the Mutual Exclusivity tab.
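To give a feel for how a significance level for mutual exclusivity can be computed, here is a minimal sketch for a single pair of genes. It uses a one-sided Fisher's exact test, which is a simpler pairwise stand-in and not the MEMo method named above. The function name and the example case sets are assumptions.

```python
from math import comb


def exclusivity_p_value(altered_a, altered_b, n_cases):
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    altered_a, altered_b: sets of case IDs in which each gene is altered.
    Returns the probability of seeing this few (or fewer) co-altered cases
    if the two genes were altered independently of each other.
    """
    both = len(altered_a & altered_b)
    n_a, n_b = len(altered_a), len(altered_b)
    total = comb(n_cases, n_b)
    # Smallest overlap that is possible given the two marginal counts.
    lowest = max(0, n_a + n_b - n_cases)
    # Hypergeometric lower tail: P(overlap <= observed overlap).
    tail = sum(
        comb(n_a, k) * comb(n_cases - n_a, n_b - k)
        for k in range(lowest, both + 1)
    )
    return tail / total


# Made-up example: 10 cases, two genes altered in disjoint halves.
gene_a = {f"case-{i}" for i in range(5)}
gene_b = {f"case-{i}" for i in range(5, 10)}
print(exclusivity_p_value(gene_a, gene_b, n_cases=10))
```

In this toy example the two genes never overlap, and the test returns 1/252 (roughly 0.004). Swapping the tail to count overlaps at least as large as the observed one would test for co-occurrence instead.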
When gene queries are performed across all studies, the Portal returns a graphical summary of alterations, as well as OncoPrints for every individual study. The summary is a histogram that shows the percentage or total number of samples per study in which at least one of the query genes is altered (Figure 3).
Figure 3: Histogram summarizing percent of tumor samples with at least one gene altered in selected gene network for all cancer studies in the cBioPortal
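The cross-study summary amounts to a simple count per study. The sketch below assumes a made-up in-memory layout (study name, then sample ID, then the set of altered genes) and a hypothetical `percent_altered` helper. It shows only the arithmetic behind the histogram, not how the Portal stores or queries its data.

```python
def percent_altered(studies, query_genes):
    """Percent of samples per study with at least one query gene altered.

    studies: study name -> {sample ID -> set of altered genes}.
    """
    query = set(query_genes)
    summary = {}
    for study, samples in studies.items():
        hits = sum(1 for genes in samples.values() if genes & query)
        summary[study] = 100 * hits / len(samples) if samples else 0.0
    return summary


# Invented data for illustration.
STUDIES = {
    "study_A": {
        "s1": {"TP53"},
        "s2": set(),
        "s3": {"MDM2", "KRAS"},
        "s4": {"KRAS"},
    },
    "study_B": {"s5": {"TP53"}, "s6": {"CDKN2A"}},
}

summary = percent_altered(STUDIES, ["TP53", "MDM2", "CDKN2A"])
for study, pct in summary.items():
    # Crude text histogram: one "#" per ten percentage points.
    print(f"{study:<8} {'#' * round(pct / 10)} {pct:.0f}%")
```

A sample counts once no matter how many of the query genes are altered in it, which is why the measure is "at least one gene altered" rather than a sum over genes.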
Plots
The OncoPrint displays discrete gene values (i.e., whether a gene is altered or not based on a defined threshold) for all data types, even for data with continuous values such as mRNA expression. The Plots tab offers a way to visualize continuous and/or discrete data for one or two genes across samples in pairwise comparison plots. For instance, mRNA expression of one gene (e.g., TP53) can be plotted against copy number alterations at the same locus (Figure 4). Alternatively, mRNA expression of one gene can be plotted against the mRNA expression of another.
Figure 4: A plot comparing TP53 mRNA levels to putative copy alterations from GISTIC
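The difference between the discrete OncoPrint view and the continuous Plots view can be illustrated with a short sketch. The z-score threshold of 2.0, the integer copy-number codes (-2 to 2), the sample values, and both function names are assumptions chosen for this example. The first function collapses expression values into altered or unaltered calls, and the second groups the same values by copy-number call, as a box plot would.

```python
from collections import defaultdict
from statistics import median


def discretize(z_scores, threshold=2.0):
    """Map each sample to 'UP', 'DOWN', or None (unaltered)."""
    calls = {}
    for sample, z in z_scores.items():
        if z >= threshold:
            calls[sample] = "UP"
        elif z <= -threshold:
            calls[sample] = "DOWN"
        else:
            calls[sample] = None
    return calls


def expression_by_copy_number(z_scores, copy_number):
    """Median expression per discrete copy-number call."""
    groups = defaultdict(list)
    for sample, z in z_scores.items():
        groups[copy_number[sample]].append(z)
    return {level: median(values) for level, values in sorted(groups.items())}


# Invented expression z-scores and copy-number codes for one gene.
Z = {"s1": -2.4, "s2": -0.3, "s3": 0.1, "s4": 2.6, "s5": 0.5}
CN = {"s1": -2, "s2": 0, "s3": 0, "s4": 2, "s5": 0}

print(discretize(Z))
print(expression_by_copy_number(Z, CN))
```

The discrete calls discard how far each sample sits from the threshold, which is exactly the information a pairwise plot keeps.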
Network
Under the Network tab, the Portal constructs a map consisting of the query genes and up to 50 of the most highly altered neighboring genes in the selected cancer study (Figure 5). This map uses network and interaction data derived from the Pathway Commons Project. The 50-gene limit helps minimize the complexity of each map, but the full network can be viewed using the independent application Cytoscape.
Figure 5: An interactive network map of query genes (p53 signaling pathway) and highly altered neighboring genes from a TCGA lung squamous cell carcinoma study
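One plausible way to pick the neighboring genes is to collect every gene that interacts directly with a query gene and rank those neighbors by alteration frequency. The sketch below is an assumption about the general approach, not the Portal's actual algorithm. The interaction list, the frequencies, and the `select_neighbors` helper are invented.

```python
def select_neighbors(query_genes, interactions, alteration_freq, limit=50):
    """Return up to `limit` direct neighbors, most frequently altered first."""
    query = set(query_genes)
    neighbors = set()
    for gene_a, gene_b in interactions:
        if gene_a in query and gene_b not in query:
            neighbors.add(gene_b)
        elif gene_b in query and gene_a not in query:
            neighbors.add(gene_a)
    # Sort by descending frequency; break ties alphabetically.
    ranked = sorted(neighbors, key=lambda g: (-alteration_freq.get(g, 0.0), g))
    return ranked[:limit]


# Invented interaction pairs and alteration frequencies.
INTERACTIONS = [
    ("TP53", "MDM2"),
    ("TP53", "ATM"),
    ("MDM2", "MDM4"),
    ("ATM", "CHEK2"),
    ("TP53", "CDKN1A"),
]
FREQ = {"ATM": 0.08, "MDM4": 0.12, "CDKN1A": 0.03, "CHEK2": 0.20}

print(select_neighbors(["TP53", "MDM2"], INTERACTIONS, FREQ, limit=2))
```

Here CHEK2 is excluded despite its high frequency because it touches only ATM, which is not a query gene. A cap such as `limit` trades completeness for a map that stays readable.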
The cBioPortal network map contains multiple layers of genomic information that can be filtered using its numerous features. Each node represents a single gene. The color-coded discs that appear when hovering over a node display the various genomic data, including copy number alterations, mutations, and mRNA expression differences. The edges connecting network nodes are color-coded based on the type of interaction they represent (e.g., a brown edge indicates that Genes A and B are in the same complex). From the Genes & Drugs menu, the user may also choose to display drug-target interactions in the network. This information is curated from other sources, including DrugBank, KEGG Drug, NCI Cancer Drugs, and Rask-Andersen et al. 2011.
Mutations
Not all mutations have functional consequences at the protein level. The Mutations tab helps researchers predict how specific mutations may affect the function of individual genes before initiating functional studies. It contains a summary graphic and table of all nonsynonymous mutations identified in each gene. The graphic display (Figure 6) shows the position and frequency of the mutations overlaid on protein domains from Pfam (the protein families database from the Wellcome Trust Sanger Institute), so that users can visually inspect the potential impact a mutation has on a functional domain. The table (not shown) provides a host of information about each mutation, such as amino acid change, type of mutation, predicted functional impact as calculated by the Mutation Assessor application, and a hyperlink to the 3D protein structure with the mutation highlighted (in Mutation Assessor).
Figure 6: Graphical summary of TP53 nonsynonymous mutations from TCGA lung squamous cell carcinoma study mapped across the gene
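The overlay of mutations on protein domains comes down to counting mutations per amino acid position and checking which domain interval, if any, contains each position. The sketch below uses placeholder domain names, invented coordinates, and invented mutation positions. It does not use real Pfam annotations, and the helper names are hypothetical.

```python
from collections import Counter


def domain_for(position, domains):
    """Return the name of the domain containing `position`, or None."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    return None


def summarize_mutations(positions, domains):
    """Count mutations per amino acid position and per domain."""
    per_position = Counter(positions)
    per_domain = Counter(
        domain_for(pos, domains) or "outside domains" for pos in positions
    )
    return per_position, per_domain


# Placeholder domains as (name, first residue, last residue), inclusive.
DOMAINS = [("domain_A", 10, 50), ("domain_B", 100, 290)]
# Invented amino acid positions of observed mutations.
POSITIONS = [175, 175, 248, 30, 320]

per_position, per_domain = summarize_mutations(POSITIONS, DOMAINS)
print(per_position.most_common(1))
print(dict(per_domain))
```

Positions with high counts correspond to the tall markers in a graphic like Figure 6, and the per-domain totals show where mutations cluster.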
Patient View
The Patient View page condenses all relevant molecular and clinical data available from one patient’s tumor into multiple visualizations. This tool provides users the unique opportunity to examine patient-specific questions, such as which altered genes might influence response to treatment. Access this page by clicking on the case ID hyperlinks found on the OncoPrint or the Mutations table. The links on the OncoPrint appear when hovering over a patient-specific genomic alteration.
The cBioPortal has many options for saving, downloading, and sharing results from a query. Each of the readouts, a few of which are highlighted above, can be saved at unique URLs and/or e-mailed for convenient sharing. In addition to the visualization and analysis readouts, the Portal also generates simple tab-delimited files of each query for use in other applications. Furthermore, cBioPortal facilitates data sharing and transfer of Portal data into other programs, such as the Integrative Genomics Viewer, which may be more suitable for certain types of analyses.
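Tab-delimited output is easy to load into other tools. The sketch below reads such a file with Python's standard `csv` module. The column layout (a gene column followed by one column per case) is an assumption made for illustration, so check the header of a real download before relying on it.

```python
import csv
import io

# Invented export: one row per gene, one column per case, blank = unaltered.
EXPORT = "GENE\tcase-01\tcase-02\nTP53\tMUT\t\nMDM2\t\tAMP\n"


def read_export(text):
    """Parse tab-delimited text into gene -> {case ID -> alteration}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    cases = header[1:]
    table = {}
    for row in reader:
        gene, values = row[0], row[1:]
        # Keep only the cases that have a non-empty alteration call.
        table[gene] = {case: v for case, v in zip(cases, values) if v}
    return table


print(read_export(EXPORT))
```

To read a file saved on disk, pass the result of `open(path, newline="")` to `csv.reader` in place of the `io.StringIO` wrapper.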
cBioPortal offers a variety of easy-to-understand readouts for researchers who may not have a background in bioinformatics or computational science. It facilitates data integration and interpretation, which are critical to developing novel treatments that will improve the lives of cancer patients.
Researchers may download and install cBioPortal locally. Please contact cbioportal@googlegroups.com for information. To learn more about this and many of the other features of this tool, visit the cBioPortal website (http://www.cbioportal.org).