Issue 9: May 2013

Featured Researchers
Investigating the Complex Relationship Between HIV and Cancer: An Interview with Dr. Corey Casper

Eugene Gillespie, Ph.D. and Shannon Behrman, Ph.D.
Dr. Corey Casper

Dr. Corey Casper studies the epidemiology, biology, and treatment of infection-related cancers, primarily in Uganda, a global hot spot for such cancers. He contributes to the HIV+ Tumor Molecular Characterization Project (HTMCP), a project developed by the Office of Cancer Genomics (OCG) and the Office of HIV and AIDS Malignancies (OHAM). The goal of HTMCP is to understand genetic differences between the same tumor types in HIV-positive and HIV-negative individuals, thereby allowing specifically tailored disease treatment. In this interview, Dr. Casper talks about his research background, the latest research in HIV-associated and other infection-related cancers, and how studies like HTMCP have the potential to improve cancer diagnosis and treatment throughout the world.

How did you first become interested in infection-related cancers?

I went to the Weill Cornell Medical College in New York City in the early 1990s at the height of the HIV epidemic, before the advent of antiretroviral therapies. Weill Cornell Medical College was part of the New York Presbyterian Hospital, which probably had several hundred HIV-infected patients on any given day during that time. A tremendous number of these patients had Kaposi’s sarcoma (KS), a previously rare tumor. It was a highly visible manifestation of the HIV epidemic. At the time, we had no idea what caused it, and why it was so much more common in people with HIV.

Several years later, it was discovered that an infectious agent called human herpesvirus 8 (HHV8) causes KS. I found this interesting and wanted to understand how this specific virus causes this cancer. In 2000, I started a Master of Public Health (MPH) and an Infectious Diseases fellowship at the University of Washington and the Fred Hutchinson Cancer Research Center (FHCRC) to study this problem collaboratively with Larry Corey, the head of the Infectious Disease Division and now President and Director at FHCRC.

How did you go from studying herpes viruses in Seattle to studying infection-related cancers in Uganda?

In the late 1990s, KS was still fairly prevalent in Seattle and many other cities in the US. However, by the time I began my research fellowship, KS was becoming increasingly rare due to the widespread use of antiretroviral treatment. There were only five new cases diagnosed in Seattle in 2000 and similar reductions in burden throughout the US. Unfortunately, KS was still common in Sub-Saharan Africa, where incidence of untreated HIV infection remained very high.  Uganda, in particular, proved an ideal candidate to study the etiology of infection-related cancers. Not only does Uganda have a high infection-related cancer burden, it also has a well-established cancer research institute (Uganda Cancer Institute) and a comprehensive cancer registry. 

The Uganda Cancer Institute (Courtesy of Dr. Corey Casper)

As I learned more about the cancer trends in Uganda, I discovered that cancer cases in Uganda are more frequently associated with infection than in many other parts of the world. Six out of the 10 most common cancers in Uganda are caused by an infectious disease, as compared to one out of 10 in Seattle. The reason for this discrepancy was largely unknown, and I was very intrigued by it. Consequently, I decided to expand my focus from the study of KS etiology to the broader relationship between infections and cancer.

What is the goal of the HIV+ Tumor Molecular Characterization Project (HTMCP), and what types of tumors will you study?

HTMCP aims to identify mutations and altered pathways in HIV-associated cancers that may respond to therapy. To do so, researchers will compare the molecular profiles of the same tumor types from HIV-positive and HIV-negative cancer patients. The three cancers being studied in HTMCP are cervical cancer, lymphoma, and lung cancer.

With HTMCP, there is a unique opportunity to compare the genomics of the three tumor types in Uganda with those in the US to determine if the underlying biology is different. Based on differences in clinical presentation of tumors between these locations, I anticipate that the biology will not be identical. Some of the tumors seen in Uganda are more aggressive and more phenotypically extreme than their counterparts in the US. Studying extreme manifestations of HIV-associated cancers may be useful when trying to identify pathways that cause cancer and/or may be therapeutically targeted. The factor causing the extreme presentation could be more common or more dramatically up-regulated than in milder forms of the disease, making it easier for researchers to find the molecular causes of that cancer.

Interestingly, this study will also give us an opportunity to investigate the genetics of HIV-associated lung cancer in non-smokers. Such studies are difficult to carry out in the US because, unlike in Uganda, a very high percentage of HIV-infected individuals in the US are smokers.

Can you make comparisons between Uganda and the US in terms of differences in HIV-associated cancer burden?

The prevalence of HIV is probably 20 to 30 times higher in Uganda and other Sub-Saharan African countries than in the US, making HIV-associated malignancies a greater burden in Uganda in general. Additionally, the spectrum of cancers varies between these two countries. AIDS-defining cancers (e.g., KS, certain types of lymphoma, and cervical cancer) remain the overwhelming majority of cancers among HIV-positive patients in Uganda. This trend mirrors the US epidemic circa 1995, right before antiretroviral therapies were introduced. At that time, almost all of the cancers in HIV-positive patients in the US were the “AIDS-defining malignancies.” Now, with greater availability of antiretroviral therapies, “non-AIDS defining cancers,” such as liver, anal, and lung cancer, have become much more common in HIV-positive patients in the US.

This difference in the proportions of HIV-associated malignancies between these two countries may be partially explained by the fact that HIV is diagnosed at a much later stage and treatment is less accessible in Uganda than in the US. However, it is still unclear if more widespread use of antiretroviral therapies in Uganda will produce a change in the prevalence of the cancer types seen in HIV-positive patients, similar to that which occurred in the US. Interestingly, we haven’t seen any such change yet, even though over 40% of Ugandans living with HIV have access to treatment.

What factors contribute to the increased incidence of cancer in HIV-positive individuals? Also, how do antiretroviral therapies affect incidence?

A weakened immune system caused by HIV infection is one factor. Highly active antiretroviral therapy (HAART), which helps maintain the function of the immune system by decreasing HIV replication, has reduced the incidence of certain cancers caused by oncogenic viruses. For example, HAART has been very effective at reducing rates of lymphoma and KS, two “AIDS-defining cancers.”

There is also mounting evidence suggesting that a small amount of HIV replication, independent of HIV-mediated immunosuppression, increases the risk of cervical, anal, lung, and certain other cancers. Antiretroviral therapy in HIV-positive patients has not reduced the incidence of these cancers. With lung cancer, where CD4+ T-cell count and functional immunity are not clearly associated with a greater risk, it makes sense that HAART would be less effective. For cervical and anal cancers, the relationship is less clear.

To complicate matters, some cancers, like Hodgkin lymphoma and inflammatory breast cancer, have actually increased in the HIV-positive population with HAART. So, the impact of HIV infection and antiretroviral treatment on cancer incidence may partially depend on the relative contribution of inflammation and infection to a given cancer. In this way, HIV infection may be very useful for informing us about the different pathogenesis of each malignancy.   

Going beyond the study of HIV-associated cancers, what else can genomics reveal about infection-associated malignancies?

Ninety percent of the people in the world are infected with Epstein-Barr virus (EBV), which is the cause of a number of different cancers. However, less than 1% of the global population actually gets one of those malignancies. Moreover, where a person lives seems to dictate the type of malignancy that develops. When EBV does lead to cancer in an adolescent in the US, the malignancy is most likely to be Hodgkin lymphoma, while that same virus predisposes children in Southeast Asia and Uganda to nasopharyngeal carcinoma and Burkitt lymphoma, respectively. It is likely that the underlying genomics and immunogenetics of the host play an important role in why these cancers have different incidences and clinical manifestations in different places. We can use genomics to compare different tumor types caused by the same virus to investigate this possibility.

What progress has been made and what is the promise of research projects like HTMCP moving forward?

There have been and still are significant challenges to studying HIV-associated malignancies in Uganda, but there are also real advantages. Uganda has an incredible variety and number of HIV-associated cancers, which provides a great opportunity for learning more about the genomics of these tumors and how they may differ from those in HIV-negative individuals. We have spent the last eight years building an infrastructure in Uganda for studying these malignancies, and I think we are at the point where our efforts are starting to make meaningful improvements in patient care.  


Preview of the New OCG Website

Shannon Behrman, Ph.D., and Gene Gillespie, Ph.D.

The Office of Cancer Genomics (OCG) will soon introduce a new website to meet the evolving needs of the cancer research and broader communities. The modern website merges all of OCG’s active websites, which include the OCG flagship site and the sites for its three current programs, TARGET, CTD2, and CGCI, into one domain (Figure 1). It is a user-friendly “one-stop-shop” that provides important up-to-date information on all OCG programs, research, data/data access, news, resources, and more.

The new website features an enhanced search function for quick access to specific items, such as resources or standard operating procedures (SOPs). It maintains several of the features from OCG’s former websites, including a comprehensive listing of publications, OCG program-related news and announcements, and the OCG e-Newsletter. The goal of the consolidation is to encourage visitors to explore across OCG programs and datasets, as well as to more effectively educate researchers and the public about OCG’s mission and its contributions to cancer genomics research.

Figure 1: New homepage


Exploring Programs: Start from the Top

Information pertaining to each of the three current OCG programs, including instructions for how to access program-specific data, project descriptions, resources, news, and publications, can be accessed directly through the dropdown mega-menu under the Programs tab in the main navigation bar (Figure 2).

Figure 2: Mega menu



Each program has its own welcome page with a left navigation sidebar for accessing all information pertaining to that program (Figure 3). Visitors can reach these pages in two ways: 1) from the homepage by clicking on the program logos and titles, and 2) from any page by selecting the program from the menu under the main navigation bar’s “Programs” tab.

Figure 3: TARGET program welcome page. Purple box indicates access point for TARGET data matrix.


Exploring Programs: Jump to the Data

As before, OCG datasets may be accessed via program-specific data matrices. The new site features several new matrix access points to allow instant navigation to data from any page on the site:

  • The View [Program] Data Matrix screenshot icon located at the top right of each current program welcome page is also a clickable icon that links directly to that program’s data matrix (Figure 3, purple rectangle).
  • The red ACCESS PROGRAM DATASETS tab to the right of the top navigation bar is a dropdown menu with links to each current program (CGCI, CTD2, and TARGET) data matrix (Figure 4, red arrow). To allow quick navigation, this dropdown menu appears on every page of the site.
  • The red Access Data Matrix arrow in the corner of each program box on the homepage is a clickable icon that links directly to the appropriate data matrix (Figure 4, blue rectangles).

Figure 4: Many ways to access data. Blue boxes indicate access points for each of the three current programs’ data matrices. Dropdown menu for accessing these matrices appears on every page of the site and is indicated here with a red arrow.

Although the majority of OCG program data is open-access, some data is protected and requires authorization. Investigators interested in learning how to access data from a particular program can visit that program’s “Using [Program] Data” page for detailed instructions (Figure 5).

Figure 5: Using [Program] Data 


For investigators interested in accessing data from multiple OCG programs, the “How to Access Multiple Datasets” page (Figure 6) has data access policies for all programs. The link to this page is immediately below the “ACCESS PROGRAM DATASETS” tab and appears throughout the site (Figure 6, purple oval). 

Figure 6: How to Access Multiple Datasets. Persistent link to this page is circled in purple.


OCG News & Resources

Important updates will now appear in an alert banner that spans the top of the home page. All OCG news, announcements, and publications can be found on the News and Publications page. In addition to the milestones and news cataloged on the News and Publications page, the new site has an expanded resources section that caters to multiple audiences, including cancer researchers, the broader research community, and the general public. Scientific resources that OCG programs have generated are found on the OCG-Supported Resources page, and general tutorials and primers on cancer genomics are on the Educational Resources page. Investigators participating in CGCI will find the OCG Standard Operating Procedure (SOP) manual on both the main Templates & Protocols and CGCI Resources pages.

A New Kind of Website

The new OCG website is supported by Drupal. Drupal is an open-source “content management system” that gives content editors the ability to control and customize content.  Because it is open-source and widely popular with Web developers, Drupal supports the latest Web technologies and provides an abundance of customizable features.  With such a modern platform, OCG can deliver critical information to the research community in real-time.

Help Us Better Meet Your Needs

OCG aims to meet the needs of the cancer research community and general public. Once the website goes live, we strongly encourage you to peruse the site and provide feedback on any aspects of the site that are challenging or especially useful. OCG will use this feedback to develop a “Frequently Asked Questions” page and a process for efficiently resolving user problems.

Stay Connected

Keep up with the latest OCG news and updates, including the release of the new website this summer, by signing up for email updates or setting up RSS feeds.  


Exploring Cancer Genomes
Integrative Genomics Viewer: Visualizing Big Data

Shannon Behrman, Ph.D.
Exploring the Cancer Genome

Improvements in whole genome sequencing technologies and other genomic characterization approaches over the last two decades have propelled our understanding of the molecular mechanisms of cancer. Large-scale genomic studies, such as Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and The Cancer Genome Atlas (TCGA), have generated rich and diverse genomic data sets for a variety of tumor types. These data sets, which may include copy number, gene expression, mutation, and clinical data, are often diverse and large in volume. This creates a challenge for researchers in their analysis of the data. Visualization tools that allow expert human review of large datasets, as well as computational algorithms, are critical for navigating this data analysis bottleneck and reaching critical insights into cancer biology.

In response to the growing need for integration of complex genomic datasets, a group of scientists and software engineers at the Broad Institute developed a data visualization tool called the Integrative Genomics Viewer (IGV) in 2007 (Figure 1). IGV allows users to visually inspect multiple genomic datasets together and make side-by-side comparisons. Researchers can download IGV, run it locally, and use it to explore their own datasets, making it a valuable tool for the biomedical research community.

The IGV User Interface

New users may find the IGV user interface quite familiar, as it is similar to the UCSC Human Genome Browser interface (Figure 1). The top panel displays the genomic coordinates horizontally across the screen. Corresponding genomic data for each sample are listed in the middle panel, and genome features from a user-selected reference genome are shown in the bottom panel. Users have the option of loading a reference genome hosted by IGV, such as Human hg19, or loading an annotated genome of their choice. Each row is a “data track” that corresponds to one sample or experiment. The left-hand column displays sample annotations, such as tissue type, age, and data type, for each data track. Users may group and sort the data tracks based on these sample annotations (Figure 2).  Users can also view multiple types of data, such as copy number and expression data, so that observations across more than one experiment can be made. Mutation data, represented as black boxes, are overlaid on top of all the data types viewed.   

The IGV user interface is dynamic, much like the user interface of web mapping applications (e.g., Google Maps). Users can zoom and pan across the genome at different resolutions, each of which shows a different degree of detail about the data. The “views” can be separated broadly into three levels of resolution, each with its own visualization advantage. Next-generation sequencing data provides a good example of IGV’s dynamic interface at these views:

  • Whole genome - displays bar graphs of read coverage where the resolution is too low to discern individual aligned reads.  This view allows users to assess overall data quality and detect trends indicative of experimental error.
  • < 30 kb (by default) - displays individual aligned reads beneath coverage bar graphs starting at this threshold. In the aligned reads, color is used to indicate potential insertions or reads that align to a different chromosome. In the coverage graphs, color is used to indicate loci where greater than 20% of the quality reads deviate from the reference genome (Figure 3a). These features allow users to easily identify structural alterations and putative SNPs, respectively.  
  • Base pair - displays individual base pairs with aligned reads and coverage bar graphs (Figure 3b). This view uses color coding and transparency to highlight base mismatches and the quality of the “base call,” respectively. It also displays the corresponding amino acid sequence below the base pairs, so that users can identify the potential impact of base mismatches on the protein sequence.
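The coverage coloring rule in the second view can be illustrated with a short sketch. It assumes a simplified pileup (a list of base calls per position) and counts every read equally, whereas IGV takes read quality into account; the function name, data layout, and toy values are hypothetical and are not part of IGV itself.

```python
from collections import Counter

# Threshold cited in the article: loci are colored when more than 20% of reads
# deviate from the reference. Treated here as a plain, unweighted fraction.
MISMATCH_THRESHOLD = 0.20


def flag_variant_loci(pileup, reference, threshold=MISMATCH_THRESHOLD):
    """Return positions where the non-reference fraction exceeds the threshold.

    pileup    -- dict mapping position -> list of base calls from aligned reads
    reference -- dict mapping position -> reference base
    """
    flagged = []
    for pos in sorted(pileup):
        bases = pileup[pos]
        if not bases:
            continue  # no coverage, nothing to evaluate
        counts = Counter(bases)
        mismatches = len(bases) - counts[reference[pos]]
        if mismatches / len(bases) > threshold:
            flagged.append(pos)
    return flagged


# Toy data: hypothetical positions and base calls, not real sequencing output.
reference = {100: "A", 101: "C", 102: "G"}
pileup = {
    100: list("AAAAAAAAAA"),  # 0% mismatch
    101: list("CCCCCCCCTT"),  # exactly 20% mismatch: not greater than 20%
    102: list("GGGGGGGAAA"),  # 30% mismatch: flagged
}
print(flag_variant_loci(pileup, reference))  # [102]
```

Positions flagged this way correspond to the putative SNPs a user would then inspect at base-pair resolution.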

The default view displays one genomic region at a time. However, multiple genomic regions may be viewed side-by-side to explore relationships independent of proximity on a chromosome. The “gene-list view” (Figure 4) shows data corresponding to multiple loci of user-defined or pre-defined gene sets. The “split-screen view” shows mate pairs from paired-end sequencing reads that map to two distant regions of the genome, suggesting a chromosomal rearrangement (Figure 5). Because each chromosome is assigned a distinct color, a read whose mate aligns to a different chromosome is colored according to its mate’s chromosome. These color-coded mate pairs are easily discernible against a background of gray paired-end read alignments. Access the split-screen view simply by selecting a color-coded read and choosing “Show Mate Region” from the options menu.
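The idea behind the color-coded mate pairs can be sketched in a few lines: reads whose mates align to a different chromosome are tallied per chromosome pair, and pairs with many supporting reads hint at a rearrangement. The record layout and read names below are hypothetical simplifications, not IGV's internal data model.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    name: str
    chrom: str       # chromosome where this read aligned
    mate_chrom: str  # chromosome where its mate aligned


def interchromosomal_support(pairs):
    """Count read pairs whose two ends align to different chromosomes."""
    support = Counter()
    for pair in pairs:
        if pair.chrom != pair.mate_chrom:
            # Sort so (chr1, chr8) and (chr8, chr1) count as the same junction.
            key = tuple(sorted((pair.chrom, pair.mate_chrom)))
            support[key] += 1
    return support


# Toy alignments for illustration only.
pairs = [
    ReadPair("r1", "chr1", "chr1"),
    ReadPair("r2", "chr1", "chr8"),
    ReadPair("r3", "chr8", "chr1"),
    ReadPair("r4", "chr2", "chr2"),
]
print(dict(interchromosomal_support(pairs)))  # {('chr1', 'chr8'): 2}
```

In practice a handful of discordant pairs can arise from mapping artifacts, so visual review in the split-screen view remains the deciding step.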

Supported Data Formats

IGV (v2.2) supports a large number of data types in more than 30 different file formats, though specific formats offer optimal performance. IGV provides a package of tools, called igvtools, which converts large source data files into the preferred binary TDF (Tiled Data File) format.
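As a rough illustration of the TDF conversion step, the sketch below assembles an igvtools command line from Python. It assumes the `toTDF` subcommand with its optional `-z` (maximum zoom) flag; the file names and genome ID are placeholders, and the exact options should be checked against the help output of the installed igvtools version.

```python
import shlex


def build_totdf_command(source, output, genome, max_zoom=None):
    """Assemble an argument list for converting a data file to TDF.

    The argument order (input, output, genome) follows the igvtools
    command-line convention as assumed here; verify it locally.
    """
    cmd = ["igvtools", "toTDF"]
    if max_zoom is not None:
        cmd += ["-z", str(max_zoom)]
    cmd += [source, output, genome]
    return cmd


# Placeholder file names; nothing is executed here.
cmd = build_totdf_command("sample.wig", "sample.tdf", "hg19")
print(shlex.join(cmd))  # igvtools toTDF sample.wig sample.tdf hg19
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where igvtools is installed and on the PATH.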

Some of the common data types that researchers can load into IGV are:

  • Sample metadata (e.g., age at diagnosis, tumor stage, gender)
  • Copy number
  • Loss of heterozygosity (LOH) data
  • Gene expression data
  • Genome annotations
  • Sequence alignment data
  • ChIP-Seq, RNA-Seq
  • Mutation data
  • Variant calls
  • RNAi data

Data Analysis and Data Set Comparison

Aside from visualizing putative SNPs, mutations, indels, and inversions/duplications, researchers may use IGV to perform a variety of advanced genomic analyses, such as identifying alternatively spliced variants, detecting somatic inter-chromosomal rearrangements, calling de novo mutations, and analyzing gene sets or pathways. Additionally, IGV is easily integrated into analytical pipelines that make use of complementary analytical tools, such as the Broad Institute’s GenePattern and Memorial Sloan Kettering Cancer Center’s cBio Cancer Genomics Portal. Users can take results from GenePattern (e.g., SNP or gene expression analysis) and plug them immediately into IGV for visualization. Users visualizing genomic data of specific gene sets in IGV may analyze the same data using the cBio Cancer Genomics Portal to construct gene network maps: from the main menu, go to “Regions” -> “Gene Lists” -> “Select Genes” -> “Retrieve Network.”

After creating integrated visual representations of their data in IGV, researchers may save and share them with colleagues or the greater research community via IGV servers. They may also compare their data to public datasets, such as those from TCGA, 1000 Genomes Project, and ENCODE, by clicking on “File” and then, “Load files from servers,” in IGV’s main menu.

A Powerful Application

IGV is a robust data visualization tool that facilitates genomic data analysis and integration. It has an interactive interface and a large number of customizable features that allow researchers to explore their own datasets and gain meaningful insights about their data. Support for additional data types and file formats is forthcoming, as are improved display features and more reference genomes.

Visit the IGV website to learn how to launch, navigate, and customize this user-friendly program through easy-to-follow user guides, FAQs, and tutorials.


  1. IGV website 
  2. Helga Thorvaldsdóttir, James T. Robinson, Jill P. Mesirov. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics (2012).
  3. James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011).

Guest Editorial
Functionally Prioritizing Somatic Driver Aberrations in Cancer

Kenneth L. Scott, Ph.D., Gordon B. Mills, M.D., Ph.D., Timothy P. Heffernan, Ph.D. and Lynda Chin, M.D.
Cancer Target Discovery and Development

Kenneth L. Scott, Gordon B. Mills, Timothy P. Heffernan and Lynda Chin are principal investigators participating in the Cancer Target Discovery and Development (CTD2) program administered by NCI’s Office of Cancer Genomics. Here, they describe their work to identify and characterize potential therapeutic targets from among the volumes of data being produced by large-scale genomics projects.

The Cancer Genome Atlas (TCGA), the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, the Cancer Genome Characterization Initiative (CGCI), and the International Cancer Genome Consortium, along with other molecular characterization projects, are cataloging genomic aberrations across major cancer lineages with the goal of identifying the most promising therapeutic targets and diagnostic biomarkers. The game-changing output from these projects promises to radically transform the way cancer science is conducted. At the same time, these efforts have revealed an extraordinary level of genome complexity made up of not only key “driver” events critical to pathogenesis, but also numerous biologically neutral “passengers” that accompany unstable tumor genomes. The challenge now is to find ways to identify functional driver aberrations, as targeting such events or their activated pathways has the greatest hope of improving patient outcome.

This is a significant challenge, as the collective experience in target discovery and pharmaceutics has taught the community that computational analyses of genomics data alone are insufficient to identify new drug targets.  Rather, successful drug development requires a thorough mechanistic understanding of a target’s cancer activity and the specific biological and genetic context in which it operates. Robust pipelines that allow prioritization of the thousands of potential targets are therefore critical for directing the research community’s limited resources toward the most promising driver candidates.

Unfortunately, the underdevelopment of functional genomics technologies has created a significant roadblock hindering our ability to rapidly assess the biological consequence of somatic aberrations in cancer. While RNAi-based screening platforms have been successfully used to validate new tumor suppressor genes, little progress has been made toward developing gain-of-function screening systems for validating over-expressed or hyper-activated oncogenes. These gain-of-function alterations are especially attractive given the proven efficacy of antibody and small molecule inhibitor therapies that target them.

To complement the power of RNAi-based approaches, our center within the Cancer Target Discovery and Development (CTD2) Network is implementing a scalable gain-of-function screening infrastructure aimed at accelerating functional validation of oncogenic driver events identified by TCGA and other high throughput sequencing efforts (Figure 1). Because each aberration within a given gene might result in a different functional impact or response to clinical therapeutics, we reasoned that it would be necessary to categorize every somatic event within each candidate gene through a combination of literature mining, predictive algorithms and functional characterization.

Figure 1: The authors are principal investigators at one of the Centers of the Cancer Target Discovery and Development (CTD2) Network, whose mission is to bridge the gap between cancer genomics research and precision therapies to improve patient care.






To circumvent current technical bottlenecks limiting construction of numerous gene expression clones, Dr. Ken Scott’s research group (Baylor College of Medicine) devised a high-throughput mutagenesis and molecular barcoding (HiTMMoB) platform. The platform allows introduction of individual DNA mutations and small DNA insertions/deletions into a collection of over 32,000 sequence-verified human cDNA (or “gene”) clones, which were generated by the ORFeome Collaboration, Scott’s lab, and others. In addition to permitting efficient modeling of gene aberrations, HiTMMoB allows simultaneous “barcoding” of wild-type or mutant genes with unique 24-nucleotide fragments of DNA. Each barcode serves as a surrogate identifier for its associated gene, thus permitting detection and quantitation of driver genes in pooled gene screening strategies. Wild-type or mutant gene expression clones originating from HiTMMoB are subsequently entered into a series of screen-based assays designed to annotate their functionality as cancer drivers. Indeed, an important feature of this approach is that target cells, once made to express a wild-type or mutant gene, can be entered into parallel screens to maximize discovery potential (Figure 2).

Figure 2: A stepwise prioritization pipeline integrates computational and functional genomic approaches to identify alterations in cancer that are more likely to serve as therapeutic targets or biomarkers.


To annotate driver function in cell-based assays, Dr. Gordon Mills’ laboratory (University of Texas M.D. Anderson Cancer Center) has implemented a sensor cell screening platform based on the concept of “Transfer of Driver Addiction.” This platform quantifies the ability of the HiTMMoB expression clones to induce cell survival and proliferation, two key therapeutically targetable phenotypes. Specifically, the group employs the Ba/F3 myeloid cell system, which is dependent on Interleukin-3 (IL3) for survival and proliferation. The goal of this approach is to assess whether addiction to IL3 can be transferred to a putative driver, an event detected by the ability of that driver to promote Ba/F3 proliferation in the absence of IL3.

An important feature of this strategy involves the ability to perform counter screens using a drug library containing “informer” therapeutic compounds targeting critical signaling pathways and cellular functions. IL3-independent growth drivers can then be assessed for their sensitivities to these agents to immediately provide clinically actionable information. This approach is linked to a high-throughput functional proteomics reverse phase protein array (RPPA) platform that can demonstrate functional consequences of each candidate gene. There is remarkable concordance between the therapeutic liabilities predicted by the informer drug screen and the functional consequences of the aberrant candidate gene as measured by RPPA.

This generalized Ba/F3 sensor system is used in parallel with other sensor cells, such as non-tumorigenic breast epithelial cells (MCF10A) and growth factor-dependent cancer cell lines, to identify drivers that alter viability and proliferation in different cellular contexts.  Together, these in vitro approaches permit broad evaluation of driver candidates across multiple cancer lineages. Our initial studies demonstrate that the mutational context under which a candidate is evaluated has a marked impact on functional outcome.

While cell-based screening systems are tractable, in vitro models do not fully recapitulate all hallmarks of tumorigenesis and metastasis. To address this challenge, Dr. Lynda Chin’s team (University of Texas M.D. Anderson Cancer Center) developed a Context-Specific Screen (CSS) platform that interrogates the tumorigenic potency of candidate driver genes under the appropriate in vivo genetic and microenvironment contexts. These in vivo screens utilize genetically defined target cells (e.g., non-transformed human primary cells engineered with known signature genetic alterations) and are performed at orthotopic sites in the mouse (e.g., mammary fat pad sites for breast cancer gene candidates) to ensure the correct microenvironment context. Following orthotopic implantation of target cells virally infected to express a pool of Dr. Scott’s barcoded candidate genes, genes driving tumor progression are identified from tissues by barcode amplification and sequencing. A candidate is considered enriched as a driver when its barcode is significantly more abundant in the output (tumor or metastases) than in the input (injected cells), following the notion that tumors and metastases positively select driver genes and lose those with no role in tumor progression (i.e., passengers). Importantly, this pooled virus and barcoding approach permits discovery of cooperating driver events that are co-selected in output tumors.
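The input-versus-output comparison of barcode abundance can be sketched as follows. The article does not describe the statistical test the authors use, so this example substitutes a simple log2 ratio of read fractions with a pseudocount; the barcode labels, read counts, and the cutoff of 1.0 are all hypothetical.

```python
import math


def barcode_enrichment(input_counts, output_counts, pseudocount=1.0):
    """Return log2(output fraction / input fraction) for each barcode.

    A pseudocount is added to every barcode so that barcodes absent from one
    sample do not cause division by zero.
    """
    barcodes = sorted(set(input_counts) | set(output_counts))
    n = len(barcodes)
    in_total = sum(input_counts.values()) + pseudocount * n
    out_total = sum(output_counts.values()) + pseudocount * n
    scores = {}
    for bc in barcodes:
        in_frac = (input_counts.get(bc, 0) + pseudocount) / in_total
        out_frac = (output_counts.get(bc, 0) + pseudocount) / out_total
        scores[bc] = math.log2(out_frac / in_frac)
    return scores


# Hypothetical sequencing read counts per barcode.
input_counts = {"GENE_A_mut": 100, "GENE_B_wt": 100, "GENE_C_mut": 100}
output_counts = {"GENE_A_mut": 900, "GENE_B_wt": 50, "GENE_C_mut": 50}

scores = barcode_enrichment(input_counts, output_counts)
# Arbitrary illustrative cutoff: more than a two-fold gain in read share.
enriched = [bc for bc, score in scores.items() if score > 1.0]
print(enriched)  # ['GENE_A_mut']
```

A real analysis would need replicate tumors and a proper significance test rather than a fixed fold-change cutoff; the sketch only shows the direction of the comparison.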

In summary, our work will provide the greater research community multi-level functional assessments of oncogenomics data collected by TCGA and other large scale genomic studies. This level of technology development and biological annotation will create unique opportunities for transformative cancer research by directing the community’s efforts toward likely driver aberrations, thereby accelerating drug development and implementation.