Using TARGET Data
- ANNOUNCEMENT -
Newly Harmonized TARGET Data Released in the NCI GDC Data Portal
September 17, 2019
- WXS alignments and somatic variant calls* for ALL P2, AML, NBL, and WT, and new Pindel variant calls for ALL P2, P3, WT, AML, and NBL
- WGS alignments for AML, RT, and WT
- RNA-Seq alignments and gene expression quantifications for ALL P1, P2, NBL, RT, and WT
- miRNA-Seq alignments and expression quantifications for ALL P2, P3, and AML
*The GDC applied multiple variant-calling pipelines to produce the somatic variant calls in Variant Call Format (VCF) files. The GDC plans to release new aggregated mutation calls in Mutation Annotation Format (MAF) for each sample in the near future; the currently available MAFs do not reflect the VCF updates. Additional details are available in the GDC Data Release Notes for Data Release 19.
The TARGET Data Matrix will not function properly in Internet Explorer unless Compatibility View is completely turned off. See "How to use Compatibility View in Internet Explorer 9" on the Microsoft Support website for more information.
The TARGET Initiative produces large-scale genomic data for a selected set of pediatric cancers and provides the research community access to those data. The goal for broadly sharing TARGET data is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.
Learn how to search for and download TARGET data by reading the following sections of this user manual:
- About the Data - Explains the types of data provided by TARGET
- Guide to Accessing TARGET Data - A visual and interactive guide on how to access TARGET data
- Open vs. Controlled Access - Explains which types of data are openly available and which require Data Use Certification (i.e., approval)
- Using TARGET Data for Publication - Guidelines for publishing manuscripts with TARGET data
- How to Access Protected Data - Step-by-step instructions for accessing controlled TARGET data
- About the TARGET Data Matrix - Describes the Data Matrix through which TARGET data can be accessed
- How to Navigate the TARGET Data Matrix - Describes how the Data Matrix is organized
TARGET project teams take an integrated approach to identifying genetic alterations within tumors from children enrolled primarily in Children's Oncology Group clinical or biology studies. TARGET uses various complementary genomic methods, such as gene expression profiling and next-generation sequencing, to analyze tumor and matched normal samples, as well as relapse samples when available. The resulting data are correlated with clinical outcome to extract biological insights and reveal potentially targetable clinical markers in pediatric cancers. Visit the TARGET Project Experimental Methods page for detailed information, organized by genomic platform, on how TARGET data were generated, including protocols for establishing high-quality nucleic acid samples.
Researchers use array-based techniques to analyze tumor and matched normal samples for gross changes to genome structure and expression. Data from these methods can be analyzed individually by platform, as well as integrated with other array or sequence data, to construct a more comprehensive genomic profile.
- Gene expression profiling
- Chromosome-specific copy number analysis
- Methylation profiling (including some sequencing)
- miRNA profiling (including some sequencing)
Researchers use 2nd and 3rd generation sequencing to analyze tumor and matched normal samples for mutations, gene fusions, and other alterations present in childhood cancers. The acute lymphoblastic leukemia and neuroblastoma pilot studies additionally employed targeted sequencing for certain case cohorts.
- Whole Genome Sequencing
- Whole Exome Sequencing
- Transcriptome Sequencing (mRNA-seq and/or miRNA-seq)
- Targeted Capture Sequencing (primarily for verification and validation)
- Targeted Sanger Sequencing (including kinome)
Visit the Guide to Accessing Data page for a visual and interactive guide on how to access all TARGET data. Please refer to this guide as you read the two sections below: Open vs. Controlled Access and How to Access Protected Data.
TARGET employs stringent human subject protection and data access policies to protect the privacy and confidentiality of research participants. Therefore, TARGET data are available to the scientific community in two tiers: open access and controlled access. Both types of data can be accessed through the TARGET Data Matrix.
Open Access Data
Open access data are verified and interpreted data that cannot be used to identify individual patients. For example, these data can be analyzed to correlate the expression of genomic variants in molecular subtypes with clinical outcomes. Most researchers may find open access data sufficient for their research needs. TARGET provides the scientific community with the maximum amount of open access data allowed by informed consent.
Researchers can access these data by clicking any link labeled "Open" in the TARGET Data Matrix. Data Use Certification (i.e., approval) is not required, and researchers may explore the data without restriction.
Examples of open access data
- Clinical information that could not be used to identify patients
- Tissue pathology data
- Chromosome-specific copy number alterations and loss of heterozygosity
- Sequence data of single amplicons (matched tumor and normal when available)
Controlled Access Data
Data within this category present a small but significant risk of patient re-identification. Although stripped of direct patient identifiers as defined by HIPAA, controlled access data contain specific patient/tumor information and unverified or raw molecular data (e.g., array-based and sequencing files). These data can be used to perform sophisticated bioinformatics analyses.
Researchers must obtain approval in the form of a Data Use Certification (DUC) to access and download controlled data. They apply for a DUC by submitting a request through NCBI’s dbGaP (the National Center for Biotechnology Information’s database of Genotypes and Phenotypes). Requestors must agree to the Data Use Limitations specific to this TARGET study. Refer to the "Guide to Accessing TARGET Data" and "How to Access Protected Data" for detailed information.
Examples of controlled access data
- Specific genotype or phenotype data for each case
- Information linking all sequence traces to an individual
- Raw sequence files for an individual case
If you are interested in using TARGET data for publication or other research purposes, you must follow the TARGET Publication Guidelines. Visit the guidelines page to learn more.
Below are step-by-step instructions for how to access protected TARGET data. The "Guide to Accessing TARGET Data" provides a visual and interactive overview of these steps.
- Obtain Data Use Certification through dbGaP
- Maintain User Accounts for Data Access
- Access Data via the TARGET Data Matrix
- Get Help If You Have Trouble Accessing Data
dbGaP video tutorial: https://www.youtube.com/watch?v=-3tUBeKbP5c
- Log in to dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to apply for access to controlled TARGET data.
- All users must have an eRA Commons account or HHS credentials (for intramural investigators) to submit requests for access. Further information can be found on the NCBI dbGaP homepage.
- Complete the electronic dbGaP Data Access Request (SF 424 (R&R)) form, which specifies the investigator’s intended use of the data. To be approved for a Data Use Certification (DUC), requestors must:
- Agree to restrict their use of the information to biomedical research purposes only.
- Agree not to try to identify and/or contact the patients.
- Agree with the Data Use Limitations of the TARGET Initiative:
Requests for controlled-access data will be considered for research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Moreover, TARGET data can be used for research relevant to the biology, causes, treatment, and late complications of treatment of pediatric cancers. Applications proposing the development of methods, software, or other tools are not considered acceptable uses of the data.
- Submit the completed SF 424 (R&R) form electronically to dbGaP.
- After the SF 424 (R&R) form is submitted, the signing official at the Principal Investigator’s institution reviews the submission and certifies it if relevant institutional policies and applicable laws and regulations (if any) have been followed.
- After the signing official has certified the submission, the SF 424 (R&R) application is sent to the NCI Data Access Committee (DAC) for review. The approval review process can take 2 to 4 weeks.
- Currently, a DUC grants the investigator access to TARGET data for one calendar year.
- Submit a progress report to the DAC no later than one year after obtaining the DUC. Submitting a progress report is currently a condition of continued data access. Approved users may also apply to renew their access to protected data when they submit the report. Approximately one month before the access termination deadline, DAC staff will send a reminder to submit the annual progress report and, if needed, renew approval status. If the requestor does not submit the progress report or request a renewal, access to the data will cease.
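Because the DUC lasts one calendar year and the DAC reminder arrives roughly a month before access ends, it can help to put the key dates on a calendar as soon as approval comes through. The following is a minimal sketch, assuming that the progress report is due on the one-year anniversary of approval and that "approximately one month" means 30 days; the function names are hypothetical and not part of any dbGaP or NCI tool.

```python
from datetime import date, timedelta


def add_one_year(d: date) -> date:
    """Return the same calendar date one year later; Feb 29 maps to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only Feb 29 can fail here, when the following year is not a leap year.
        return d.replace(year=d.year + 1, day=28)


def duc_timeline(approval: date, reminder_lead_days: int = 30) -> dict:
    """Hypothetical planner for a one-year DUC.

    Assumes access ends one calendar year after approval, the progress report
    (and any renewal request) is due by that date, and the DAC reminder arrives
    about `reminder_lead_days` days earlier ("approximately one month").
    """
    expires = add_one_year(approval)
    return {
        "approved": approval,
        "reminder_expected": expires - timedelta(days=reminder_lead_days),
        "report_due": expires,
    }


if __name__ == "__main__":
    for label, day in duc_timeline(date(2019, 9, 17)).items():
        print(f"{label:>18}: {day.isoformat()}")
```

For a DUC approved on September 17, 2019, this sketch places the report deadline on September 17, 2020 and the expected reminder around August 18, 2020. The actual dates are set by the DAC, so treat the output as a planning aid only.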
Intramural investigators with an approved DUC may access protected TARGET data using their HHS credentials.
Investigators outside of HHS with an approved DUC require two separate user accounts to access protected TARGET data:
- Access to TARGET data stored and maintained at NCBI and the NCI Genomic Data Commons (GDC) – approved users can access these data using the eRA Commons account associated with the original Data Access Request. TARGET data stored at NCBI include Sanger sequencing files (Trace Archive) and raw and aligned reads from 2nd and 3rd generation sequencing (FASTQ and BAM files in the Sequence Read Archive). Legacy TARGET data stored at the GDC include raw and aligned reads from 2nd and 3rd generation sequencing (FASTQ and BAM files), as well as some aggregate data (including mutation calls and other associated molecular data).
- Access to data stored and maintained at the OCG Data Coordinating Center (DCC) at the National Cancer Institute (NCI) – immediately after obtaining a DUC, approved users outside of HHS will be issued an NIH External (NIHEXT) user account from NIH if they do not already have one. This NIH-issued account is used to access data at the OCG DCC, which holds most of the genomic data generated for the TARGET initiative (clinical information, all levels of chip-based molecular characterization, and higher-level sequencing data). Note: the password on this account must be updated every 120 days; instructions are distributed when the account is created.
For more information on maintaining or troubleshooting these data access accounts, click here.
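Since the NIHEXT password must be changed every 120 days, a lapsed password is a common reason for losing access to OCG DCC data. The following is a minimal sketch of a personal reminder check, assuming the 120 days are counted from the date of the last change; the 14-day warning window and the function name are hypothetical choices, not NIH policy.

```python
from datetime import date, timedelta

# Rotation interval stated for the NIHEXT account; the warning window is an assumption.
PASSWORD_MAX_AGE = timedelta(days=120)


def password_status(last_changed: date, today: date, warn_days: int = 14) -> str:
    """Hypothetical check: classify a password's age against the 120-day rule."""
    due = last_changed + PASSWORD_MAX_AGE
    if today >= due:
        return "change required"
    if (due - today).days <= warn_days:
        return "change soon"
    return "ok"


if __name__ == "__main__":
    # A password last changed on Jan 1, 2019 is due on May 1, 2019.
    print(password_status(date(2019, 1, 1), date(2019, 4, 20)))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test and to run against past or future dates.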
Approved users may access protected TARGET data through the TARGET Data Matrix with either HHS credentials or the appropriate external account (as outlined in Maintain User Accounts for Data Access).
- Access data stored at the OCG DCC directly through the TARGET Data Matrix (requires the NIH-issued NIHEXT account for extramural investigators):
- Protected clinical information
- All levels of chip-based molecular characterization data
- Processed sequencing data (upper-level files such as VCF or MAF files; BAM files are excluded)
- Access low-level sequence files stored at NCBI and the GDC indirectly through hyperlinks on the TARGET Data Matrix (requires eRA Commons account for extramural investigators):
- Trace sequences stored in the NCBI Trace Archive - Sanger targeted sequencing
- FASTQ/BAM files stored in the Sequence Read Archive (SRA), accessible through NCBI dbGaP - 2nd/3rd generation whole genome, exome, mRNA-seq, miRNA-seq, targeted capture, methyl-seq, ChIP-seq
- FASTQ/BAM/VCF files stored in the GDC, accessible through the NCI GDC website with an eRA Commons login - 2nd/3rd generation whole genome, exome, mRNA-seq, miRNA-seq, targeted capture
For NCI-stored data – OCG@mail.nih.gov
For NCBI-stored data – https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=email&from=login
TARGET data are available to the research community through a tabular, easy-to-use Data Matrix. The matrix version history has been updated throughout the initiative as datasets were added. Because alternative versions of the TARGET Data Matrix exist, users should note the matrix version and date whenever they download a dataset.
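One lightweight way to keep track of the matrix version and download date is to write a small record next to each downloaded file. The following is a minimal sketch, assuming a JSON "sidecar" file named `<file>.provenance.json`; this naming convention and the function names are hypothetical, and the version label is whatever the Data Matrix displays at download time. The MD5 checksum also lets you compare a local copy against the checksum published by the source, such as the `md5sum` value the GDC lists for its files.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large BAM/FASTQ files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(data_file: Path, matrix_version: str, downloaded: date) -> Path:
    """Write a JSON sidecar (hypothetical naming convention) next to a downloaded file."""
    record = {
        "file": data_file.name,
        "matrix_version": matrix_version,  # label shown on the Data Matrix at download time
        "download_date": downloaded.isoformat(),
        "md5": md5sum(data_file),
    }
    sidecar = data_file.with_name(data_file.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

For example, `write_provenance(Path("sample.maf"), "<matrix version>", date.today())` would create `sample.maf.provenance.json` beside the downloaded file.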
We want the TARGET Data Matrix to meet the needs of the research community and encourage users to send comments, questions, and suggestions for improvement to email@example.com.
The Data Matrix links to both open and controlled access TARGET data. To obtain specific datasets or metadata, including descriptions of each project, users can hover over the text within the table and click to access the appropriate files. The pilot TARGET project in acute lymphoblastic leukemia (ALL) is separated by phase: Phase I, the pilot portion of the initiative; and Phase II.
Data Levels
- Level 1: Raw or low-level data files
- Levels 2 and 3: Normalized and integrated data
- Level 4: Summarized findings
Data Access Code
- Blue = open access
- Red = controlled access (NCI & NCBI)
- Black = unavailable
Types of Data Found in the Matrix
- Names of diseases studied
- Clinical information, including outcomes
- Types of molecular data generated and platforms used
- Metadata descriptions about each individual project
- Multi-level chip-based and sequencing data links