OCG-Supported Resources

Burkitt Lymphoma Genome Sequencing Project (BLGSP): The Epstein-Barr Virus (EBV) Sequences from Burkitt Lymphoma Cases Published in Grande, Gerhard et al., 2019

The EBV sequences are available for download as BAM alignments from the Public directory at the DCC: https://cgci-data.nci.nih.gov/Public/BLGSP/WGS/L2/.  

The 106 BAM files, made available through open access, contain the Epstein-Barr virus (EBV) sequences extracted from the genomes of the BLGSP patient cohort included in the following publication:

Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)

The following intentionally stringent criteria were used to ensure that no human reads were included in the BAMs.

  • Only reads aligned to the EBV genome (chrEBV) in the reference (GenBank accession AJ507799.2) were included. 
  • Unmapped reads were excluded. 
  • Reads whose mate did not align to the same chromosome (i.e. chrEBV) were excluded. 
  • Reads with more than 5 clipped bases (soft- or hard-clipped), as can occur with a split read (e.g. due to an EBV genome integration event), were excluded.
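
The criteria above can be illustrated with a minimal Python sketch that operates on the standard SAM fields of a single read (FLAG, RNAME, RNEXT and CIGAR). This is not the pipeline used to produce the released BAMs; the function names keep_read and clipped_bases are hypothetical, and the sketch assumes paired-end reads and treats a read whose mate is unmapped as failing the mate criterion.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

FLAG_UNMAPPED = 0x4       # SAM flag bit: this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # SAM flag bit: the mate is unmapped


def clipped_bases(cigar):
    """Return the total number of soft- (S) and hard-clipped (H) bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")


def keep_read(flag, rname, rnext, cigar, contig="chrEBV", max_clipped=5):
    """Apply the four exclusion criteria to one read (illustrative only)."""
    # Unmapped reads are excluded.
    if flag & FLAG_UNMAPPED:
        return False
    # Only reads aligned to the EBV contig are included.
    if rname != contig:
        return False
    # Assumption: an unmapped mate counts as "not aligned to chrEBV".
    if flag & FLAG_MATE_UNMAPPED:
        return False
    # In SAM, RNEXT "=" means the mate is on the same reference as the read.
    mate_contig = rname if rnext == "=" else rnext
    if mate_contig != contig:
        return False
    # Reads with more than max_clipped clipped bases are excluded.
    if clipped_bases(cigar) > max_clipped:
        return False
    return True
```

For example, keep_read(99, "chrEBV", "=", "100M") passes all four checks, whereas the same read with a CIGAR of "10S90M", or with its mate on chr1, would be dropped.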

As an additional check, the number of reads in EBV-negative tumors was counted, with the expectation of finding virtually none if human reads were not “sneaking” through. Out of 35 EBV-negative genomes, 25 (71%) had exactly zero reads. With one exception (which had 90 reads), the remaining genomes had at most 19 reads (range: 1-19). When a few randomly selected reads were aligned to the human genome, only short matches (20-30 bp) were found, which are expected to be spurious. Therefore, these are believed to be real EBV reads.

Given that EBV is ubiquitous (e.g. over 90% of adults globally and most African children are infected), it is possible that EBV-infected normal B cells were included at very low levels in otherwise EBV-negative tumor biopsies. This would explain the presence of a few EBV reads in EBV-negative BL samples. More generally, EBV reads are often found in DNA sequencing data (see http://www.cureffi.org/2013/02/01/the-decoy-genome/). Therefore, we are confident that there are virtually no human reads in these EBV BAM files, consistent with the strict criteria that were used.

Cancer Genome Anatomy Project (CGAP)

CGAP generated a wide range of genomics data on cancerous cells that are accessible through easy-to-use online tools. Researchers, educators, and students can find "in silico" answers to biological questions through the CGAP website. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov to learn how to navigate the website.

The Cancer Genome Atlas (TCGA) Data Portal

The Cancer Genome Atlas Data Portal contains clinical information, genomic characterization data, and high-throughput sequencing analysis of over twenty different cancers. Search, download, and analyze datasets generated by TCGA.