Predicting Cancer Cell Line Dependencies From the Protein Expression Data of Reverse-Phase Protein Arrays

Snapshot of the protein-dependency analytic module in The Cancer Proteome Atlas. The newly added module is highlighted in red boxes.

Chen et al. (2020) JCO Clin Cancer Inform. CC BY 4.0

Chen MM, Li J, Mills GB, Liang H.

JCO Clin Cancer Inform.

April 01, 2020

Purpose: Predicting cancer dependencies from molecular data can help stratify patients and identify novel therapeutic targets. Recently available large-scale data on cancer cell line dependencies allow a systematic assessment of the predictive power of diverse molecular features; however, protein expression data have not been rigorously evaluated. Using protein expression data generated by reverse-phase protein arrays, we aimed to assess their power to predict cancer dependencies and to develop a related analytic tool for community use.

Materials and methods: Using a machine learning approach, we analyzed feature importance based on cancer dependency and multiomic data from the DepMap and Cancer Cell Line Encyclopedia projects. We assessed the consistency of cancer dependency data between the CRISPR/Cas9 and short hairpin RNA-mediated perturbation platforms. For a fair comparison, we focused on a set of genes with robust dependency data and four available expression-related features (copy number alteration, DNA methylation, messenger RNA expression, and protein expression), and we performed same-gene predictions of cancer dependency, in which the dependency on a gene is predicted from each molecular feature of that same gene.
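To make the same-gene comparison concrete, the following Python sketch scores each of the four expression-related features of one gene by how well it predicts that gene's dependency across cell lines. It is only an illustration under stated assumptions: the abstract does not name the model or the evaluation metric, so the one-variable least-squares fit, the leave-one-out evaluation, and the Pearson correlation used here are stand-ins, the function names (`pearson`, `loo_predictions`, `rank_features`) are hypothetical, and all values are synthetic rather than taken from DepMap or the Cancer Cell Line Encyclopedia.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def loo_predictions(x, y):
    """Predict each y[i] from x[i] with a least-squares line fitted on all other points."""
    preds = []
    for i in range(len(x)):
        xt, yt = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
        sxx = sum((a - mx) ** 2 for a in xt)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xt, yt))
        slope = sxy / sxx if sxx else 0.0
        preds.append(my + slope * (x[i] - mx))
    return preds


def rank_features(features, dependency):
    """Rank features by correlation between held-out predictions and observed dependency."""
    scores = {
        name: pearson(loo_predictions(values, dependency), dependency)
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data for one hypothetical gene across 60 cell lines (an assumption,
# constructed so that protein level carries the dependency signal).
rng = random.Random(0)
n_lines = 60
protein = [rng.gauss(0, 1) for _ in range(n_lines)]
dependency = [-0.8 * p + rng.gauss(0, 0.5) for p in protein]
features = {
    "copy_number": [rng.gauss(0, 1) for _ in range(n_lines)],
    "methylation": [rng.gauss(0, 1) for _ in range(n_lines)],
    "mrna": [0.5 * p + rng.gauss(0, 1) for p in protein],
    "protein": protein,
}

ranking = rank_features(features, dependency)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring held-out predictions rather than in-sample fits keeps the comparison between features fair, since every feature is evaluated on cell lines the fit has not seen. The same `pearson` helper could also be applied to paired CRISPR/Cas9 and short hairpin RNA dependency scores for a gene as a rough consistency check between platforms.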

Results: For the genes surveyed, protein expression data contained substantial predictive power for cancer dependencies and were the most predictive feature for the CRISPR/Cas9-based dependency data. We also developed a user-friendly protein-dependency analytic module and integrated it into The Cancer Proteome Atlas; this module allows researchers to explore and analyze our results intuitively.

Conclusion: This study provides a systematic assessment of how well cancer dependencies of cell lines can be predicted from different expression-related features of a gene. Our results suggest that protein expression data are a highly valuable resource for understanding tumor vulnerabilities and identifying therapeutic opportunities.

Last updated: July 29, 2020