All Resources

AIDS-Related Cancers

Find comprehensive information on HIV-associated cancers, including treatment, prevention, clinical trials, and more.

Burkitt Lymphoma Genome Sequencing Project (BLGSP) Standard Operating Procedures (SOP) Manual

Burkitt Lymphoma Genome Sequencing Project (BLGSP): The Epstein-Barr Virus (EBV) Sequences from Burkitt Lymphoma Cases Published in Grande, Gerhard et al., 2019

The EBV sequences are available for download as BAM alignments from the Public directory at the DCC:  

The 106 open-access BAM files contain the Epstein-Barr virus (EBV) sequences that were extracted from the BLGSP patient cohort genomes included in the following publication:

Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)

The following intentionally stringent criteria were used to ensure that no human reads were included in the BAMs.

  • Only reads aligned to the EBV genome (chrEBV) in the reference (GenBank accession AJ507799.2) were included. 
  • Unmapped reads were excluded. 
  • Reads whose mate did not align to the same chromosome (i.e. chrEBV) were excluded. 
  • Reads with more than 5 soft- or hard-clipped bases were excluded; such clipping indicates a split read (e.g., one spanning an EBV genome integration event). 
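The four criteria above can be expressed as a per-record filter over plain-text SAM alignments. The Python sketch below is only an illustration under stated assumptions, not the pipeline actually used for BLGSP: the function names `keep_read` and `clipped_bases` are hypothetical, the reads are assumed to be paired-end (so a read whose mate is unmapped fails the mate check), and the EBV contig is assumed to be named `chrEBV` as in the text.

```python
import re

# Assumption: the EBV reference contig is named "chrEBV" (GenBank AJ507799.2).
EBV_CONTIG = "chrEBV"
MAX_CLIPPED = 5  # reads with MORE than this many clipped bases are excluded

# SAM FLAG bits used below (SAM format specification)
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_bases(cigar: str) -> int:
    """Total soft-clipped (S) plus hard-clipped (H) bases in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "SH")


def keep_read(sam_line: str) -> bool:
    """Return True if one SAM alignment line passes all four EBV filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, cigar, rnext = fields[2], fields[5], fields[6]

    # Unmapped reads are excluded.
    if flag & FLAG_UNMAPPED:
        return False
    # Only reads aligned to the EBV contig are kept.
    if rname != EBV_CONTIG:
        return False
    # The mate must be mapped and on the same contig ("=" means same as RNAME).
    # Assumption: paired-end data, so unpaired reads are dropped too.
    if not (flag & FLAG_PAIRED) or (flag & FLAG_MATE_UNMAPPED):
        return False
    if rnext not in ("=", EBV_CONTIG):
        return False
    # Reads with more than MAX_CLIPPED soft/hard-clipped bases are excluded.
    return clipped_bases(cigar) <= MAX_CLIPPED
```

Because `keep_read` looks at one SAM line at a time, it could be applied to the output of `samtools view -h`, passing header lines (those starting with `@`) through unchanged, before converting the result back to BAM. Checking the mate's reference through the `RNEXT` field, rather than fetching the mate record, keeps the filter to a single pass over the file.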

As an additional check, the number of reads in each EBV-negative tumor was counted, with the expectation that virtually none would be found if no human reads were contaminating the BAMs. Of 35 EBV-negative genomes, 25 (71%) had exactly zero reads. Of the remaining 10 genomes, all but one had between 1 and 19 reads; the exception had 90. When a few randomly selected reads were aligned to the human genome, only short matches (20-30 bp) were found, which are expected to be spurious. These reads are therefore believed to be genuine EBV reads.

Because EBV is ubiquitous (e.g., more than 90% of adults worldwide and most African children are infected), EBV-infected normal B cells may have been present at very low levels in otherwise EBV-negative tumor biopsies. This would explain the few EBV reads found in EBV-negative BL samples. More generally, EBV reads are often found in DNA sequencing data. For more information, see . Therefore, we are confident that there are virtually no human reads in these EBV BAM files, consistent with the strict filtering criteria that were used.

Cancer in Children and Adolescents

View a fact sheet that has statistics as well as information about types, causes, and treatments of cancers in children and adolescents in the United States.

Cancer Therapy Evaluation Program

The Cancer Therapy Evaluation Program (CTEP) seeks to improve the lives of cancer patients by finding better treatments, control mechanisms, and cures for cancer. CTEP funds a national program of cancer research, sponsoring clinical trials to evaluate new anti-cancer agents.

Candidate Cancer Allele cDNA Collection

CTD2 researchers at the Broad Institute/DFCI have developed a collection of plasmids containing mutant alleles found in sequencing studies of cancer. It includes somatic variants found in lung adenocarcinoma and across other cancer types. The clones enable researchers to characterize the function of these cancer variants in high-throughput experiments. These plasmids are collectively called the “Broad Target Accelerator Plasmid Collections”. The design and construction of the plasmids are described in the manuscripts listed below, and the plasmids are available through a distributor.

Kim E, et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discovery. 2016 Jul;6(7):714-26. (PMID: 27147599)
Berger AH, Brooks AN, Wu X, et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell. 2016 Aug 8;30(2):214-28. (PMID: 27478040)

CDE (Common Data Element) Browser User Guide

cDNA Clones with Rare and Recurrent Mutations Found in Cancers

The CTD2 Center at UT-MD Anderson Cancer Center has developed a High-Throughput Mutagenesis and Molecular Barcoding (HiTMMoB) pipeline to construct open reading frame (ORF) expression clones of mutant alleles that are either recurrent or rare in cancers. These barcoded genes can be used for context-specific functional validation and for the identification of novel biomarkers (pathway activation) and targets (drug sensitivity). The list of available gene expression clones can be accessed here: MDACC ORF Clones.xlsx

Contact: Gordon B. Mills

Dogruluk T, et al. Identification of variant-specific functions of PIK3CA by rapid phenotyping of rare mutations. Cancer Research. 2015 Dec 15;75(24):5341-54. (PMID: 26627007)

Tsang YH, et al. Functional annotation of rare gene aberration drivers of pancreatic cancer. Nature Communications. 2016 Jan 25;7:10500. (PMID: 26806015)

Center for Cancer Genomics

The Office of Cancer Genomics is within the Center for Cancer Genomics (CCG). CCG was established to unify NCI’s activities in cancer genomics with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. 

ChemBank

Funded in large part by the Initiative for Chemical Genetics (ICG), ChemBank is an interactive database for small molecules. It contains data from hundreds of biomedically relevant small-molecule screens that involved hundreds of thousands of compounds. ChemBank also provides analysis tools to facilitate data mining.

Childhood Cancers

Find comprehensive information on childhood cancers: current treatments, clinical trials, prevention, genetics, testing, and more.

Childhood Cancers in Spanish (Español)

Cáncer infantil (Childhood cancer)

Clustered Regularly Interspaced Short Palindromic Repeats Interference (CRISPRi) Plasmids

CTD2 researchers at the University of California, San Francisco developed a modified CRISPR/dCas9 system. Catalytically inactive dCas9 enables modular, programmable RNA-guided genome regulation in eukaryotes. The CRISPR/dCas9 system has several advantages: i) it enables robust gene repression (CRISPRi) or activation (CRISPRa) in human cells; ii) it allows specific knockdown with minimal off-target effects in human cells; iii) it works efficiently in human and yeast cells; and iv) it does not cause double-strand breaks. The design and construction of the CRISPRi plasmids (for human and yeast cells) are described in the manuscript listed below, and the plasmids are available through a distributor.

Gilbert LA, et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013 Jul 18;154(2):442-51. (PMID: 23849981)

Experimental Methods for the Burkitt Lymphoma Genome Sequencing Project

On this page, researchers can find the data generation and data analysis protocols from the following manuscript:

Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)

Gabriella Miller Kids First (GMKF/Kids First) Pediatric Research Program

The Gabriella Miller Kids First initiative is a trans-NIH effort to increase understanding of genetic changes associated with certain devastating pediatric conditions. The initiative will develop a centralized database of well-curated clinical and genetic sequence data from childhood cancer and structural birth defects cohorts comprising thousands of patients and their families.
To learn more about the initiative and the data available, please visit

Genome-wide Association Studies from the Cancer Genetic Markers of Susceptibility (CGEMS) Initiative

CGEMS identifies common inherited genetic variations associated with a number of cancers, including breast and prostate. Data from these genome-wide association studies (GWAS) are available through the Division of Cancer Epidemiology & Genetics website.

Genomic Data Commons (GDC)

NCI's Genomic Data Commons (GDC) is a unified data sharing platform that allows users to search, browse, download, and analyze data. The GDC serves as a single knowledge base which unifies genomic and clinical data from different research programs for the cancer research community. 

Guide to Accessing Program Data

Visit the Guide to Accessing Data page for a visual and interactive guide on how to access OCG program data. 

HCMI Case Report Forms (CRFs)

The Human Cancer Models Initiative's (HCMI) cancer type-specific CRFs have been developed in collaboration with international clinical experts, and the clinical data elements have been standardized through the Cancer Data Standards Registry and Repository (caDSR). Enrollment and Follow-up CRFs are available below for download. Because the tumor types modeled through HCMI are updated regularly, check back for additional or updated CRFs. Note: In some cases, tissues from different tumor sites (e.g., primary, metastatic, and/or recurrent) can be collected from the same patient for model development. New CRFs capture information about multiple models developed from the same patient within a single form; these multiple-model CRFs have "-multi" in the file name. The CRFs are versioned to track any subsequent edits. 

HCMI Searchable Catalog

The HCMI Searchable Catalog is a continuously updated resource for querying the available next-generation models developed by HCMI. Within the catalog, users can search by patient demographic, tumor, and model elements, including age at diagnosis, sex, treatment information, clinical tumor diagnosis, primary site, clinical stage, and model type (e.g., 3D organoid, 2D conditionally reprogrammed cells). For additional help navigating the catalog, see the “HCMI Searchable Catalog User Guide”. 

HCMI Searchable Catalog User Guide

This guide provides users with a resource to effectively navigate the HCMI Searchable Catalog.

HIV+ Tumor Molecular Characterization Project (HTMCP) Standard Operating Procedures (SOP) Manual

Human cDNA Library from the ORFeome Collaboration (OC)

The goal of the OC, an informal volunteer multi-institutional collaboration, is to provide the research community with validated, full open reading frame (ORF) cDNA clones for all of the currently defined human genes.  The ORF clones do not include 5’ and 3’ UTRs and can be easily sub-cloned into any type of expression vector. These clones are available to researchers worldwide through multiple distributors.

Informed Consent Template for Tissue Accrual to Enable Model Development and Distribution

Mammalian cDNA Library from the NIH Mammalian Gene Collection (MGC)

The MGC provides the research community with full-length clones for most human and mouse genes defined as of 2006, along with selected clones of cow and rat genes. Clones were designed to allow easy transfer of the ORF sequences into nearly any type of expression vector. MGC provides protein ‘expression-ready’ clones for each of the included human genes. MGC is part of the ORFeome Collaboration (OC).

National Cancer Institute

Visit the NCI website for comprehensive cancer information.

NCI’s Lung Cancer page

Find comprehensive, current information on lung cancer.

NCI’s Non-Hodgkin Lymphoma page

Find comprehensive information on NHL, including testing, treatment, genetics, clinical trials, and more.

Office of HIV and AIDS Malignancies (OHAM)

NCI’s Office of HIV and AIDS Malignancies (OHAM) is a great resource for learning about HIV-associated cancers.

Online Bioinformatics Tutorials

Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

Open versus Controlled-Access Data

OCG employs stringent human subjects protection and data access policies to protect the privacy and confidentiality of research participants. Depending on the risk of patient identification, OCG program data are available to the scientific community in two tiers: open access or controlled access. Both types of data can be accessed through the corresponding OCG program-specific data matrix or portal.

Open-access Data

Data within this category present minimal risk of participant identification. Much of the OCG program data, excluding patient identifiers, are open-access. OCG provides the scientific community with the maximum amount of open-access data allowable under HIPAA guidelines. Access to these data does not require user certification, and researchers may explore the data without restriction.

Controlled-access Data

Data within this category present a higher risk of patient identification. Although stripped of direct patient identifiers as defined by HIPAA, controlled-access data contain specific demographic, clinical, and genotypic information that is excluded from open-access data. Controlled-access data are unique and valuable for research projects in which open-access data are insufficient. Access to controlled data requires user certification, which can be obtained through NCBI’s dbGaP (the National Center for Biotechnology Information’s database of Genotypes and Phenotypes). 

To learn more and understand which data each OCG program provides, visit How to Access Multiple Datasets

Protein-Protein Interaction (PPI) Reagents

Many gene mutations give proteins new capabilities to bind cellular proteins and create new signaling pathways that drive tumor growth. To discover and validate mutation-created protein-protein interactions (PPIs) as therapeutic targets for cancer, the CTD2 Center at Emory University has created PPI expression vector libraries. A list of available cancer-associated genes can be accessed here: Emory_CTD^2_PPI_Reagents.xlsx

Contact: Haian Fu 

Li Z, et al. The OncoPPi network of cancer-focused protein-protein interactions to inform biological insights and therapeutic strategies. Nature Communications. 2017 Feb 16;8:14356. (PMID: 28205554)

Resources for Researchers

Resources for Researchers is a directory of NCI-supported tools and services for cancer researchers. Resources are developed and maintained by NCI scientists or were created with grant funding. Most resources are free and available to anyone. Each resource is owned by an NCI division, office, or center.

Successful Standard Operating Procedures (SOPs)

Standard Operating Procedures (SOPs) are written instructions for performing a specific task in a prescribed way (Source: NCI). The purpose of an SOP is to guide a novice through a particular task so that it is carried out accurately and consistently.

Download: Characteristics_of_Successful_SOPs.pdf

TARGET Project Experimental Methods

On this page, researchers can find detailed information describing how TARGET data were generated on each genomic platform, including protocols for establishing high-quality nucleic acid samples.

What is Cancer?

A brief explanation of how cancer forms, basic statistics, and links to additional resources.