CTD² is a “community resource project [1],” meaning members of the Network are required to release data to the broader research community. All data generated by this initiative are released in accordance with the data release policy [2] developed by its members, consistent with NIH data release policy. The release of CTD² data to the scientific community is intended to maximize the translational impact of these findings. In addition to using the raw data for their own investigations, researchers outside the Network are encouraged to use CTD² datasets to develop novel methods and tools.
- What is the CTD² Data Portal?
- What Data are Available in the CTD² Data Portal?
- How Do I Navigate the CTD² Data Portal?
- How Are Data Files Formatted?
- How Can I Analyze the Data?
- How Do I Acknowledge CTD² Data?
What is the CTD² Data Portal?
The Data Portal is an open-access portal that serves as the single access point for downloading CTD² data. It is managed by the National Cancer Institute’s Data Coordinating Center (DCC). Along with each project dataset in the Data Portal, users will find links to a summary of the corresponding Center’s overall research goals, a description of the technologies used to generate the data, and project contact information.
What Data are Available in the CTD² Data Portal?
The Network employs a variety of high-throughput and bioinformatics/computational methods to validate cancer targets identified in large-scale genomics data. While each Network Center has its own set of specialties, open collaboration across the groups is designed to maximize translational impact. For example, several Network Centers specialize in the identification of small molecules that modulate validated cancer targets (e.g., for use as probes or therapeutics), while other groups specialize in testing these small molecules in animal models. Raw datasets resulting from Network experiments are made freely available to download and use at a researcher’s discretion [3].
As the Network continues to innovate, its experimental approaches evolve, along with the types of data made available through the Data Portal. Below are examples of approaches Network members have applied in their research; this list is not comprehensive.
- Small molecule screening
- Protein-protein interaction identification
- RNA interference (RNAi) and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 screening
- Genome-wide loss-of-function and gain-of-function screening
- Targeted candidate gene validation
- Judiciously applied mouse-based screening
How Do I Navigate the CTD² Data Portal?
The CTD² Data Portal [4] contains all available CTD² raw (and some analyzed) data. Each row in the Portal corresponds to a specific project, and each column corresponds to detailed information associated with each project. Entries in the Portal are clickable, and the links take users to a page with the following details about each project dataset:
- Project Title – displays project description
- Institute – links to a broad description of each Center’s goals and aims
- Experimental Approaches – links to a summary of the experimental approach or directs users to a publication with the corresponding methodology
- Data Files – directs users to a page where they can download the data
- Some projects also link to associated CTD² Dashboard [5] submission(s)
- Contact – opens an email addressed to the project representative, so users can send specific inquiries
How Are Data Files Formatted?
In order for all data to be usable and uniform, the CTD² Network follows common data format guidelines defined by Network members.
- Data files are in .GCT format or a neutral format (e.g., CSV, tab-delimited).
- Metadata are documented either in file headers (e.g., GEO SOFT format) or in separate documentation (e.g., README files).
- When a submission includes many data types, files are deposited as a compressed archive (.zip, .tar, .tgz, etc.) so that the whole package can be downloaded at once.
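As an illustration of working with the .GCT files described above, the sketch below reads one with only the Python standard library. It assumes GCT version 1.2: a `#1.2` version line, a line giving the row and column counts, a header line of `Name`, `Description`, and the sample names, then one tab-delimited row per feature. Newer GCT versions (e.g., 1.3) add row and column annotation fields and would need a different parser. The names `read_gct` and `GctData`, and the treatment of empty or `NA` cells as missing values, are hypothetical choices for this example, not part of any CTD² tooling; the README shipped with each dataset is the authority on its conventions.

```python
import csv
import io
import math
from typing import List, NamedTuple, TextIO


class GctData(NamedTuple):
    names: List[str]           # feature identifiers (first column)
    descriptions: List[str]    # free-text descriptions (second column)
    samples: List[str]         # sample (column) names from the header line
    values: List[List[float]]  # one list of floats per feature row


def _to_float(token: str) -> float:
    # Assumption: empty cells and "NA" mark missing values; check each
    # project's README for how missing data are actually encoded.
    token = token.strip()
    if token == "" or token.upper() == "NA":
        return math.nan
    return float(token)


def read_gct(handle: TextIO) -> GctData:
    """Parse a tab-delimited GCT version 1.2 file (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    version = next(reader)[0].strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version: {version!r}")
    # Second line declares the matrix dimensions: rows, then columns.
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    header = next(reader)
    samples = header[2:]  # skip the Name and Description columns
    if len(samples) != n_cols:
        raise ValueError("header does not match the declared column count")
    names, descriptions, values = [], [], []
    for row in reader:
        if not row:
            continue  # skip blank lines, e.g. a trailing newline
        if len(row) != n_cols + 2:
            raise ValueError(f"row {row[0]!r} has {len(row) - 2} values, expected {n_cols}")
        names.append(row[0])
        descriptions.append(row[1])
        values.append([_to_float(v) for v in row[2:]])
    if len(names) != n_rows:
        raise ValueError("row count does not match the declared dimensions")
    return GctData(names, descriptions, samples, values)


if __name__ == "__main__":
    example = (
        "#1.2\n"
        "2\t3\n"
        "Name\tDescription\tS1\tS2\tS3\n"
        "geneA\tna\t1.0\t2.5\t3\n"
        "geneB\tna\t0.5\tNA\t4\n"
    )
    gct = read_gct(io.StringIO(example))
    print(gct.samples)    # ['S1', 'S2', 'S3']
    print(gct.values[0])  # [1.0, 2.5, 3.0]
```

The parser checks the declared dimensions against the file contents so that a truncated download fails loudly rather than silently yielding a partial matrix. For files inside a .zip or .tar archive, the standard `zipfile` and `tarfile` modules can open the member first, and the text stream can then be passed to `read_gct`.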
How Can I Analyze the Data?
CTD² generates massive datasets that cannot be analyzed manually and may be of limited use to researchers with little bioinformatics support. Automated analytical tools allow deeper mining of the data. While OCG/CTD² does not endorse any specific data mining tool, Network members have curated a list of tools they found useful for analyzing and/or visualizing the datasets. Visit the CTD² Analytical Tools [6] page to learn more.
How Do I Acknowledge CTD² Data?
The CTD² Network requests that researchers who use CTD² data acknowledge it as follows: “The results published here are in whole or part based upon data generated by Cancer Target Discovery and Development (CTD²) Network (https://ocg.cancer.gov/programs/ctd2/data-portal [4]) established by the National Cancer Institute’s Office of Cancer Genomics.”
The Network also requests that researchers who use the CTD² Dashboard as a resource for their studies acknowledge and cite the manuscript by Aksoy, Dančík, Smith et al. [7] (Database 2017;1-10) and provide the URL https://ctd2-dashboard.nci.nih.gov/dashboard/ [8].
For a more detailed explanation of the publication guidelines, visit the CTD² Publication Guidelines [9] page.