The International Cancer Genome Consortium

The completion of the human genome sequence in 2003 sparked a global revolution in human disease research. For cancer research, the complete sequence meant that scientists could compare the DNA and RNA sequences of cancer tissues with those of case-matched normal tissues to determine the genetic features of these diseases. Large-scale cancer genome characterization projects were initiated at Johns Hopkins University (US)1, the Wellcome Trust Sanger Institute (UK)2, and the National Cancer Institute/National Human Genome Research Institute (US)3. By leveraging advances in biomedical research, genomics technologies, and bioinformatics, each of these initiatives cataloged an array of genetic alterations in different tumor types. While the results were enlightening and exciting for the cancer research community, they also highlighted the need for more open dialogue and data sharing among researchers participating in disparate cancer genome studies around the world.
To address this pressing need, cancer researchers and representatives of government institutions from 22 countries held a meeting in 2007 in Toronto, Canada. They discussed creating an international consortium that would serve as a hub for communication and the exchange of “lessons learned,” as well as a venue for sharing results and fostering collaborations. The idea was well received by the meeting’s participants, and the International Cancer Genome Consortium (ICGC)4 was established. The ICGC formed a number of working groups to develop and refine policies for global cancer genome analysis and data sharing. These groups drew on insights gleaned from The Cancer Genome Atlas in the US, the Cancer Genome Project in the UK, and other large-scale initiatives.
The ICGC Mission
The ICGC serves two main purposes. First, it is a centralized communications forum for the international scientific community, where researchers regularly share information and discuss cancer genome research. Second, it maintains a bioinformatics database and portal (described below) with the ultimate goal of warehousing the “genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe”5. It is an extraordinarily ambitious effort, with 17 countries currently participating (Figure 1) and 42 cancer genome projects included to date.
To encourage submission of high-quality, reliable, and comprehensive data for each of these projects, the ICGC provides guidelines for participation. These include strong recommendations for many aspects of genomic analysis, from tumor number and sample quality to data collection and generation. For example, data must be of high quality and resolution (sequence-level data are preferred but not required) with a minimum level of coverage. To allow comprehensive analyses that yield clinically relevant discoveries, the ICGC encourages researchers to report a range of somatic mutations and alterations. These might include single nucleotide variants, indels, and chromosomal rearrangements, along with clinical, histopathological, and, where available, environmental data for each tumor type or subtype.
Introducing the ICGC Data Portal
Compiling vast amounts of data from ICGC projects across the globe and disseminating them to the international research community make data storage and management a herculean challenge. To meet it, the ICGC developed the ICGC Data Portal, a single location where all analyzed data reside in standardized formats. Member-country projects, which house their own raw data in local databases in disparate formats, send synthesized data in standardized formats to the ICGC Data Portal. The Portal, launched in April 2010, is an easy-to-use web platform where users can visualize, query, and download “open-access” ICGC project data.
To balance reliability with the need for timeliness, the ICGC releases approved, standardized data on a quarterly basis. The most recent data release (#15) occurred on February 3, 2014. In future releases, the Portal will be restructured to accommodate summary and other data. Previous releases are archived in the ICGC Data Repository. The ICGC Data Coordination Center (DCC), housed at the Ontario Institute for Cancer Research, manages the Data Portal and its releases.
Figure 1: A list of countries participating in ICGC as of February 2014
Protecting Patients: A Major Challenge in Global Data Sharing
Another enormous challenge in genome research is generating and sharing data that lead to impactful discoveries without compromising the confidentiality and rights of patients. For the ICGC, this issue is further complicated because each country has its own laws for patient protection, informed consent, and institutional review board (IRB) approval processes. The ICGC has taken these legal and regulatory differences into account and developed suggested guidelines for informed consent, data access, and ethical oversight that minimize the risk of individual patient identification without impeding important research opportunities. These guidelines evolve over time to keep pace with the changing laws of participating countries; the latest version was issued in 2013. Continued dissemination of such guidelines to the worldwide research community is a major objective of the ICGC.
To safeguard patient identities, only data that cannot be directly linked to an individual are freely available through the ICGC Data Portal. Such data are stripped of direct identifiers (e.g., names and social security numbers) and may include patient and tumor information, such as gender, age range, and histologic type/subtype, as well as interpreted data, such as normalized gene expression, computed copy number, and somatic variants. Nevertheless, patient data that are individually unique but not directly identifying, such as the genotypes found in primary sequence data, pose a theoretical risk of patient re-identification. Such data are designated “controlled access,” meaning that their use requires approval from the data access committees of the corresponding funding institutions. Two ICGC bodies oversee controlled access: the Data Access Compliance Office (DACO), which handles controlled-access requests, and the International Data Access Committee (IDAC), which helps establish data access guidelines and supervises DACO activities.
It is important to note that regulations and policies vary among member countries, and some (including the US) may not submit controlled-access data to the ICGC Portal. In such cases, researchers must apply to that country’s agencies to obtain the data directly from the project’s repository. Examples of such repositories include the National Center for Biotechnology Information’s Database of Genotypes and Phenotypes, the National Cancer Institute’s Data Coordinating Centers, and the Cancer Genomics Hub in the US, or the European Bioinformatics Institute’s European Genome-phenome Archive in the UK.
OCG’s Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Cancer Genome Characterization Initiative (CGCI) programs are contributing interpreted data to the ICGC as the data are published and verified. TARGET acute lymphoblastic leukemia phase I data, including clinical data, somatic mutations from whole-genome sequencing and RNA-seq, and gene expression data from 229 donors, were released on September 26, 2013, in release #14. Copy number and gene expression data generated by TARGET’s neuroblastoma project were released on February 3, 2014, in the most recent data release (#15).
Inspiring a Global Shift Toward Open Communication
The ICGC has a ten-year goal of giving the international research community access to data that molecularly characterize over 50 types of cancer. Already ahead of schedule, the ICGC continues to expand the number of available datasets by adding or initiating new projects, such as the Singapore biliary tract cancer project6, announced in late 2013. The ICGC is structured to respond dynamically to, and evolve with, changes in technology development, patient consent, and data sharing. As the number of tumor types and the amount of data grow, the ICGC will continue to provide a unique global resource for cancer genome data.
The forward thinking that the ICGC applies to international communication is also having a wider-reaching influence on genomic data sharing across diseases. In June 2013, representatives from many countries met and signed a “global alliance letter of intent” committing to share genomic and clinical data responsibly worldwide7 while protecting patients. Representatives from both the National Cancer Institute and the National Human Genome Research Institute, along with those of over 60 other institutions, signed this document. These efforts to communicate and share data internationally are steps toward working as a global community to advance the genomics field and translate discoveries into a better understanding and treatment of disease.
References
- Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, et al. (2007). The genomic landscapes of human breast and colorectal cancers. Science 318:1108-1113 (PMID: 17932254)
- Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, et al. (2007). Patterns of somatic mutation in human cancer genomes. Nature 446:153-158 (PMID: 17344846)
- Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455:1061-1068 (PMID: 18772890)
- Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, et al. (2010). International network of cancer genome projects. Nature 464:993-998 (PMID: 20393554)
- http://icgc.org
- http://icgc.org/files/icgc/ICGC%20News%20Release%203Nov2013.pdf
- http://www.phgfoundation.org/news/14050/