Open versus Controlled-Access Data

OCG employs stringent human subjects protection and data access policies to protect the privacy and confidentiality of research participants. Depending on the risk of patient identification, OCG program data are available to the scientific community in two tiers: open-access or controlled-access. Both types of data can be accessed through the corresponding OCG program-specific data matrix or portal.

Open-access Data

Data within this category present minimal risk of participant identification. Much of the OCG program data, excluding patient identifiers, is open-access. OCG provides the scientific community with the maximum amount of open-access data allowable under HIPAA guidelines. Access to these data does not require user certification, and researchers may explore data content without restriction.

Controlled-access Data

Data within this category present a higher risk of patient identification. While stripped of direct patient identifiers as defined by HIPAA, controlled-access data contain specific demographic, clinical, and genotypic information that is excluded from open-access data. Controlled-access data are unique and valuable to research projects for which open-access data are insufficient. Access to controlled-access data requires user certification, which can be obtained through NCBI’s dbGaP (National Center for Biotechnology Information’s database of Genotypes and Phenotypes).

To learn more and to understand which data each OCG program provides, visit How to Access Multiple Datasets.

The Cancer Genome Atlas (TCGA) Data Portal

The Cancer Genome Atlas Data Portal contains clinical information, genomic characterization data, and high-throughput sequencing analysis of over twenty different cancers. Search, download, and analyze datasets generated by TCGA.

What is Cancer?

A brief explanation of how cancer forms, basic statistics, and links to additional resources.