Ryan Morin is a bioinformatics graduate student in Dr. Marco Marra's lab at the British Columbia Cancer Agency's Genome Sciences Centre in Vancouver. Ryan develops tools and assembles pipelines to reconstruct the genomic events that contribute to cancer pathogenesis. He takes an integrated approach, employing a variety of next-generation sequencing technologies such as RNA-seq, genome, and exome sequencing. Although juggling several projects, his main focus has been uncovering genes and pathways implicated in non-Hodgkin Lymphoma (NHL) and, more recently, Acute Lymphoblastic Leukemia (ALL). Both of these projects are sponsored by the NCI's Office of Cancer Genomics (OCG) through the auspices of the Cancer Genome Characterization Initiative (CGCI) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). After sequencing over 100 NHL tumors, Ryan and colleagues discovered a novel role for chromatin modification, a global form of gene regulation, in the development of this disease. More specifically, they observed recurrent mutations in several histone-modifying genes, including the two methyltransferases EZH2 and MLL2. These findings are documented in two recent publications in Nature Genetics (2010) and Nature (2011).
Last year, Ryan contributed an article to the July 2010 issue of the OCG e-newsletter. In the article, he discussed his overall experience, both the toil and the reward, as a bioinformatician dealing with the deluge of data from genomic sequencing. One year later, the flow of genomic information remains unrelenting for researchers like Ryan. Take, for example, the $1,000 genome project, which encourages researchers to bring the cost of genome sequencing down to $1,000 or less by rapidly improving the technologies. This project, funded by the National Human Genome Research Institute (NHGRI), has incredible potential for accelerating our understanding and treatment of cancer as well as other diseases. NHGRI Director Eric Green said, "As genome sequencing costs continue to decline, researchers and clinicians can increase the scale and scope of their studies." This increase in the scale and scope of genomic data, however, brings certain challenges.
To gain more insight into this issue, we decided to bring Ryan back for an interview to get an up-to-date and more in-depth perspective. We also asked him to look toward the future and project where he thinks the field of cancer genomics is going. And, finally, we asked him to share what he is doing currently to follow up on his NHL story.
What are the specific challenges you face as a bioinformatician studying the cancer genome in this era of rapidly advancing sequencing technologies?
I think the big issue right now is that you can sequence a genome fairly cheaply and quickly, but once you get the results, the accuracy, sensitivity, and specificity of picking up mutations are still questionable. We have a sense of how many of the mutations are real, and it's not 100%. This is a problem, especially in a clinical setting. You have to sequence more deeply to capture the mutations, which can help increase the sensitivity. It's known that there are spots in the genome we can't sequence efficiently, due to nucleotide sequences that are very rich in guanines (G) and cytosines (C) or, alternatively, adenines (A) and thymines (T). This results in reduced sequence coverage for the first exons of many genes, because they are often rich in Gs and Cs. Solutions have been proposed to ameliorate this problem, for instance, capturing just the GC-rich exon on a separate array and then sequencing that exon. Additionally, the sensitivity is worse for exome sequencing. Certain genes were omitted from the original exome design, MLL2 for example, possibly due to technical issues and restrictions of the design process itself.
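The coverage problem Ryan describes is easy to illustrate: regions whose GC content is extreme tend to sequence poorly. A minimal sketch, scanning a sequence in sliding windows and flagging windows outside an assumed "well-behaved" GC range (the 0.25-0.65 bounds and window sizes here are illustrative, not values from the interview):

```python
# Flag windows of a sequence whose GC fraction is extreme enough that
# capture/sequencing coverage is likely to suffer. Thresholds are
# illustrative assumptions, not values from the interview.

def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def flag_extreme_windows(seq, window=100, step=50, lo=0.25, hi=0.65):
    """Yield (start, gc) for windows likely to get poor coverage."""
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start : start + window])
        if gc < lo or gc > hi:
            yield start, gc

# Example: a GC-rich stretch (like a first exon) followed by a more
# balanced region; only the GC-rich windows get flagged.
exon = "GCGCGGCCGC" * 15 + "ATGCATGCAT" * 15
for start, gc in flag_extreme_windows(exon):
    print(start, round(gc, 2))
```

In a real pipeline this kind of scan would run against the reference sequence of each capture target, so that GC-extreme exons can be routed to a separate capture design, as Ryan suggests.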
Finally, you can get a genome sequenced in less than a month, but you then have to go through a second round of verification to improve the confidence in your results. The downside to this second round of verification is that it can take much longer than the original sequencing experiment.
What does the second round of verification entail?
Either targeted capture or PCR (polymerase chain reaction), depending on the scale of your study. Targeted capture uses a set of "baits" to pull down regions of the genome that are then sequenced for validation. It is very useful for looking at hundreds of mutations. For smaller studies, where the number of mutations is much smaller, PCR allows you to amplify and sequence the specific regions of the genome where the mutations are present. Unfortunately, PCR doesn't always work on the first attempt and may require multiple rounds of optimization.
Both capture and PCR can become an iterative process. I don't see it improving much in the near future. Hopefully, some of these rapid turnaround sequencers, such as MiSeq and Ion Torrent, along with streamlined library construction will accelerate the verification stage.
How would MiSeq and Ion Torrent speed up verification?
MiSeq is designed to perform sequencing faster and at a smaller scale. With less surface area and imaging time, fewer reagents are needed, so the cost is lower. Essentially, MiSeq compresses the entire sequencing schedule, so you can run a sample in a matter of days. Ion Torrent is even faster because it uses different chemistry and doesn't rely on fluorescence or imaging; its sequencing runs are on the order of two hours. As these tools emerge, this second round of verification will hopefully become a quick final step.
You've discussed the technical challenges, but what about the analytical challenges like distinguishing between a driver mutation (a mutation that 'drives' the cancer event) and a passenger mutation (a mutation that doesn't play a role in cancer)? What do you do to overcome these issues?
Analytical challenges are a big problem. When you sequence a genome, you don't necessarily know how to distinguish the importance of the somatic events. I think the cancers with very high mutation rates are going to be difficult. Diffuse Large B-cell Lymphoma (a type of NHL) doesn't have a high mutation rate compared to the spectrum of other cancer genomes, but there are still passenger mutations present. Just because you see a mutation in EZH2 doesn't mean that it's a driver. At the end of the day, it will come down to in vitro experiments, where individual mutations will have to be explored.
We are at the discovery stage now, trying to create a working parts list of cancer. We have statistical tools intended to model the mutation pattern and identify the genes mutated more often than chance would predict. There are commonly mutated drivers seen in 10%-50% or more of patients, and other mutations seen in a very small percentage of patients, which are still important. Will these genes mutated at lower incidence be important as drug targets from a clinical standpoint? Probably not. But if they lie in a common pathway, then perhaps we can target that pathway, and that is important.
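The core idea behind the statistical tools Ryan mentions can be sketched with a one-sided binomial tail: given an assumed background chance of a gene being hit in any one patient, is the observed mutation count across the cohort unlikely under that background? The cohort size, background rate, and significance threshold below are hypothetical; real driver-detection tools model context- and patient-specific mutation rates rather than a single flat rate.

```python
# Sketch: is a gene mutated more often than a flat background rate
# predicts? Uses a one-sided binomial tail probability. All numbers
# here are illustrative assumptions, not results from the interview.
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def looks_selected(mutated, cohort, background_rate, alpha=0.001):
    """Flag a gene whose mutation count is unlikely under the background."""
    return binom_sf(mutated, cohort, background_rate) < alpha

# Hypothetical cohort of 100 tumors with a ~1% background chance of any
# one tumor carrying a mutation in a given gene: 12/100 mutated patients
# is far beyond chance; 2/100 is not.
print(looks_selected(12, 100, 0.01))  # True
print(looks_selected(2, 100, 0.01))   # False
```

This is also why the low-incidence genes Ryan describes are hard: at 2-3% incidence a gene may never clear a genome-wide significance threshold on its own, which is one motivation for aggregating mutations at the pathway level instead.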
Any surprise advances that have facilitated analysis in the field of cancer genomics?
I'm amazed by how big the community around next-generation sequencing has become. There is a lot more sharing of software these days. People have written numerous short-read aligners, SNP callers, and tools for finding structural alterations, made them open source, and put them online. Some examples of these open-source tools are SAMtools, BWA, Picard, GATK, and Galaxy. For bioinformaticians, these tools facilitate data analysis, leaving more room to address the biological questions in our data. I'm impressed by how much the community tackles these problems.
Where do you see the field going? Where would you like it to go?
It's really hard to say. I hope that people create tool kits so you are not reinventing the wheel again and again. People have generated enough of these tools that they could be placed into an analytical pipeline, making the analysis fairly automated. Such automated analysis would allow one to ask specific questions of the cancer genome, such as 'What is common or different?', without having to fully understand the analysis itself. This is the model we hope to see from the genome sequencing and analysis company Complete Genomics, which now offers structural rearrangement, somatic mutation, and SNP calls as part of its services. Ideally, that is what people are going to want.
What are you doing to follow up on your NHL story? Is there a drug in the pipeline?
Two papers from last year, one from our group and one from a company, showed that mutant EZH2 (Tyr641) might actually have enhanced enzymatic activity in the presence of the wild-type enzyme, making it a possible gain-of-function mutation. These two studies demonstrated that mutant EZH2 has reduced function in catalyzing the first methylation step but enhanced function in catalyzing the subsequent two methylation steps. The reason we didn't detect that in our original paper is that we only tested EZH2 (Tyr641) alone, without the wild-type protein present. Now that it's been shown to be a gain-of-function mutation, EZH2 (Tyr641) is being pursued as a drug target in NHL here at the Cancer Agency, by Epizyme, and potentially by other companies. The Cancer Agency is looking at mouse models with the EZH2 (Tyr641) mutation and asking whether they develop lymphoma. If and when they develop lymphomas, we plan to inject them with small molecules predicted to inhibit this protein and look for a response. We also now have a much larger list of genes predicted to be under selection in NHL that we are currently following up on.