- ANNOUNCEMENT -
The TARGET data matrix will not function properly in Internet Explorer unless the Compatibility View is completely turned off.
The TARGET Initiative produces large-scale genomic data sets for some of the most prevalent pediatric cancers and further provides the research community access to those findings. The goal for broadly sharing TARGET data is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.
Read the following user guide to learn how to search and download data generated by TARGET.
TARGET project teams take an integrated approach to identify genetic alterations within tumors from pediatric patients enrolled in clinical or biology trials, primarily through the Children's Oncology Group. TARGET uses complementary genomics methods, such as gene expression and next generation sequencing platforms, to analyze tumor and matched normal samples, as well as relapse samples when available. The resulting data is correlated with clinical outcome and other metadata to extract novel, biologically relevant information about pediatric cancers and to reveal potentially targetable clinical markers.
Researchers use array-based techniques to analyze tumor and matched normal samples for gross changes to genome structure and expression. Data from these methods can be analyzed individually by platform, as well as integrated with other array or sequence data, to construct a more comprehensive genomic profile.
- Gene expression profiling
- Copy number analysis
- Epigenetics profiling
- miRNA profiling
Researchers use 2nd and 3rd generation sequencing to analyze tumor and matched normal samples for mutations, gene fusions, and other alterations present in childhood cancers. The ALL and neuroblastoma pilot studies additionally employed targeted sequencing for certain case cohorts.
- Targeted Sequencing
- Whole Genome Sequencing
- Whole Exome Sequencing
- Transcriptome Sequencing
- Kinome Sequencing
TARGET employs stringent human subjects’ protection and data access policies to protect the privacy and confidentiality of the research participants. Therefore, TARGET data is available to the scientific community in two tiers: open or controlled access. Both types of data can be accessed through the TARGET Data Matrix.
Data within this category presents minimal risk of participant identification. Much of the TARGET data, excluding patient identifiers, is open-access. TARGET provides the scientific community the maximum amount of open-access data allowable under HIPAA guidelines. Access to this data does not require user certification, and researchers may explore data content without restriction.
Examples of open-access data
- Clinical information that could not be used to identify the patient
- Tissue pathology data
- Gene expression data (other than 1º exon array data or mRNA-seq)
- Tumor-specific copy number alterations and loss of heterozygosity
- Sequence data of single amplicons (matched tumor and normal when available; cannot be assembled to link to an individual)
- Tumor-associated (somatic) mutations
Data within this category presents a higher risk of patient identification. While stripped of direct patient identifiers as defined by HIPAA, controlled-access data contains specific demographic, clinical, and genotypic information that is excluded from open-access data. Controlled-access data is unique and valuable for research projects for which the open-access data is insufficient. Access to this data requires user certification, which can be obtained through NCBI’s dbGaP (National Center for Biotechnology Information’s database of Genotypes and Phenotypes). Researchers apply for access by submitting an electronic Data Access Request. Read “How to Access Protected Data” below for more information.
Examples of controlled-access data
- Specific demographic and clinical data
- Genome-wide genotypes for each case
- Information linking all sequence traces to an individual
- Whole genome, exome or transcriptome sequences for an individual case
The following flowchart details how to access the various forms of protected data. Refer to this flowchart as needed while reading both the General and Detailed Instructions.
General Outline of Instructions (detailed instructions immediately below):
- Obtain Data Use Certification through dbGaP
- Maintain User Accounts for Data Access
- Access data via the TARGET Data Matrix
  - Use HHS credentials (intramural investigators) or NCI-issued user account to directly access all data stored in NCI databases
  - Use HHS credentials (intramural investigators) or eRA Commons account to access data stored in NCBI databases
- Get Help If You Have Trouble Accessing Data
1. Obtain Data Use Certification through dbGaP
All users requesting access to controlled data must:
- Have an eRA Commons account or HHS credentials (for intramural investigators) to submit requests for access. Further information can be found on the NCBI dbGaP homepage.
- Complete the electronic dbGaP Data Access Request (SF 424 (R&R)) form, which outlines the investigator’s intended use of the data. To be approved for a Data Use Certification (DUC), requestors must:
  - Agree to restrict their use of the information to biomedical research purposes only.
  - Agree not to try to identify or contact the patients.
- Submit requests that agree with the Data Use Limitations within the TARGET Initiative Data Use Certification (DUC):
  Requests for controlled-access data will be considered for research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Moreover, TARGET data can be used for research relevant to the biology, causes, treatment and late complications of treatment of pediatric cancers. Applications proposing methods, software, or other tool development are not considered acceptable uses of the data.
- Submit the completed SF 424 (R&R) form electronically to dbGaP for consideration of data access approval.
  - Upon SF 424 (R&R) form submission, the signing official of the Principal Investigator’s institution will be notified of the submission and asked to certify agreement with the Data Use Limitations stated within the Data Access Request form.
  - After the signing official has certified agreement, the SF 424 (R&R) application will be sent to the NCI Data Access Committee (DAC) to review for approval. The approval review process can take 2-4 weeks.
  - Currently, approval in the form of a DUC allows the investigator access to TARGET data for one calendar year.
- Submit a progress report to the DAC no later than one year after obtaining the DUC. A progress report is currently a condition of data access. Approved users may also apply to renew their access to protected data when they submit the report. DAC staff will send a reminder to submit the annual progress report and renew approval status, if needed, approximately one month before the access termination deadline. If the requestor does not submit the progress report or request a renewal, access to the data will cease.
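The timeline described above reduces to simple date arithmetic. The following is a minimal sketch, not part of any dbGaP tool: the function name and dictionary keys are hypothetical, and it assumes that access ends on the same calendar day one year after approval and that the reminder arrives 30 days earlier (the text says only "approximately one month").

```python
from datetime import date, timedelta


def duc_deadlines(approved_on: date, reminder_lead_days: int = 30) -> dict:
    """Return key dates for a Data Use Certification approved on `approved_on`.

    Assumptions (not from dbGaP): access ends on the same calendar day one
    year later, and the DAC reminder arrives `reminder_lead_days` earlier.
    """
    try:
        expires_on = approved_on.replace(year=approved_on.year + 1)
    except ValueError:
        # Approved on 29 February: fall back to 28 February of the next year.
        expires_on = approved_on.replace(year=approved_on.year + 1, day=28)
    return {
        "progress_report_due": expires_on,
        "access_ends": expires_on,
        "expected_reminder": expires_on - timedelta(days=reminder_lead_days),
    }


dates = duc_deadlines(date(2015, 3, 10))
print(dates["access_ends"])        # 2016-03-10
print(dates["expected_reminder"])  # 2016-02-09
```

The actual deadline is whatever the DAC communicates; treat the computed dates as a planning aid only.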
2. Maintain User Accounts for Data Access
Intramural investigators with an approved DUC may access protected TARGET data using their HHS credentials.
Investigators outside of HHS with an approved DUC require two separate user accounts to access protected TARGET data:
- For TARGET data stored and maintained at NCBI – approved users sign in with the eRA Commons account associated with the original Data Access Request. TARGET data stored at NCBI includes Sanger sequencing files and aligned reads from 2nd generation sequencing (BAM files).
- For data stored and maintained at the OCG Data Coordinating Center (DCC) at the National Cancer Institute (NCI) – approved users outside of HHS will be issued a user account from NCI immediately after obtaining a DUC. This NCI-issued account is used to access data at the OCG DCC, which includes most of the genomic data generated for the TARGET initiative (clinical information, all levels of chip-based molecular characterization, and higher level sequencing data). ***The password on this account must be updated every 60 days; instructions for doing so are distributed when the account is created.***
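To avoid losing access to the NCI-issued account, a user could track the 60-day password window locally. This is a rough sketch only: the helper names are hypothetical, and it assumes the 60 days are counted from the date of the most recent password change, which the text does not state explicitly.

```python
from datetime import date, timedelta

PASSWORD_LIFETIME_DAYS = 60  # from the text: update every 60 days


def password_due(last_changed: date) -> date:
    """Date by which the NCI-issued account password must be changed again."""
    return last_changed + timedelta(days=PASSWORD_LIFETIME_DAYS)


def days_left(last_changed: date, today: date) -> int:
    """Days remaining before the password is due; negative once overdue."""
    return (password_due(last_changed) - today).days


print(password_due(date(2015, 1, 1)))                  # 2015-03-02
print(days_left(date(2015, 1, 1), date(2015, 2, 20)))  # 10
```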
3. Access Protected Data via the TARGET Data Matrix
Approved users may access protected TARGET data through the TARGET Data Matrix with either HHS credentials or the appropriate external account (as outlined in #2).
- Access data stored at the OCG DCC directly through the TARGET Data Matrix (requires NCI-issued account for extramural investigators):
  - Protected clinical information
  - All levels of chip-based molecular characterization data
  - Processed sequencing data (upper level files, excluding BAM files)
- Access low-level sequence files stored at NCBI indirectly through hyperlinks on the TARGET Data Matrix (requires eRA Commons account for extramural investigators):
  - Trace sequences stored in the NCBI TRACE Archives: Sanger targeted sequencing
  - BAM files stored in the Sequence Read Archives accessible through NCBI dbGaP: 2nd/3rd generation whole genome, exome, mRNA-seq, miRNA-seq
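The access rules above amount to a lookup on two things: where a data type is stored, and whether the user is an intramural (HHS) or extramural investigator. The sketch below illustrates that lookup under stated assumptions; the category strings, `Store` enum, and `required_credential` function are hypothetical names chosen for this example and do not correspond to any NCI or NCBI interface.

```python
from enum import Enum


class Store(Enum):
    OCG_DCC = "OCG DCC (NCI)"
    NCBI = "NCBI"


# Data categories named in the text, mapped to where they are stored.
DATA_LOCATION = {
    "protected clinical information": Store.OCG_DCC,
    "chip-based molecular characterization": Store.OCG_DCC,
    "processed sequencing data": Store.OCG_DCC,
    "sanger trace sequences": Store.NCBI,
    "bam files": Store.NCBI,
}


def required_credential(data_type: str, intramural: bool) -> str:
    """Name the credential an approved user needs for `data_type`."""
    store = DATA_LOCATION[data_type.lower()]
    if intramural:
        # Intramural investigators use HHS credentials for both stores.
        return "HHS credentials"
    if store is Store.OCG_DCC:
        return "NCI-issued account"
    return "eRA Commons account"


print(required_credential("BAM files", intramural=False))                  # eRA Commons account
print(required_credential("Processed sequencing data", intramural=False))  # NCI-issued account
print(required_credential("BAM files", intramural=True))                   # HHS credentials
```

In every case an approved DUC is still a precondition; the lookup only identifies which account to sign in with.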
4. Get Help If You Have Trouble Accessing Data
For NCI-stored data – OCG@mail.nih.gov
For NCBI-stored data – https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=email&from=login
TARGET data is accessible through a tabular, easy-to-use Data Matrix. New data from ongoing projects is incorporated into the Data Matrix as it becomes available, along with an update of the matrix version history. Users should note the version of the TARGET Data Matrix when accessing information.
The TARGET Data Matrix evolves over time to meet the needs of the research community. We encourage users to send comments, questions, and suggestions for improvement to firstname.lastname@example.org.
The Data Matrix links to both open and controlled access TARGET data. To obtain specific datasets or metadata, including descriptions of each project, users can hover over the text within the table and click to access the appropriate files. The pilot TARGET projects, acute lymphoblastic leukemia (ALL) and neuroblastoma (NBL), are separated by phase: Phase I, the pilot portion of the initiative; and Phase II, expansion through ARRA funding. Note: NBL data are color-coded by phase.
- Raw or low-level data files (level 1)
- Normalized and integrated data (levels 2 and 3)
- Summarized findings (level 4)
Data Access Code
- Blue = open access
- Red = controlled access (NCI & NCBI)
- Black = unavailable
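For anyone recording Data Matrix entries in their own notes or scripts, the level and color conventions listed above can be held in two small lookup tables. This is an illustrative sketch only: the dictionary and function names are hypothetical, and the Data Matrix itself is a web page, not a programmatic interface.

```python
# Data levels as listed in the text.
DATA_LEVELS = {
    1: "Raw or low-level data files",
    2: "Normalized and integrated data",
    3: "Normalized and integrated data",
    4: "Summarized findings",
}

# Data Access Code: link color to access tier.
ACCESS_BY_COLOR = {
    "blue": "open access",
    "red": "controlled access (NCI & NCBI)",
    "black": "unavailable",
}


def describe_cell(level: int, color: str) -> str:
    """Describe a Data Matrix cell from its data level and link color."""
    return f"Level {level}: {DATA_LEVELS[level]} ({ACCESS_BY_COLOR[color.lower()]})"


def needs_duc(color: str) -> bool:
    """True when the link color marks controlled-access data."""
    return color.lower() == "red"


print(describe_cell(4, "Blue"))  # Level 4: Summarized findings (open access)
print(needs_duc("red"))          # True
```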
Types of Data Found in the Matrix
- Names of diseases studied
- Clinical information, including outcomes
- Types of molecular data generated and platforms used
- Metadata descriptions about each individual project
- Multi-level chip-based and sequencing data links