Dana-Farber Cancer Institute: Computational Correction of Copy-number Effect in CRISPR-Cas9 Screens of Cancer Cells

Genome-wide CRISPR-Cas9 screens were performed in 625 cell lines. The results were processed with the CERES algorithm to produce gene-knockout effect estimates corrected for copy-number effects and sgRNA (guide) efficacy.

Experimental Approaches

Cancer cell lines were transduced with a lentiviral vector expressing the Cas9 nuclease under blasticidin selection (pXPR-311Cas9). Each Cas9-expressing cell line was subjected to a Cas9 activity assay [1] to characterize the efficacy of CRISPR/Cas9 in that line; cell lines with less than 45% measured Cas9 activity were considered ineligible for screening. Stable polyclonal Cas9+ cell lines were then infected at a low multiplicity of infection (MOI < 1) with the Avana library of 76,106 unique sgRNAs. After remapping, the library comprised 72,753 sgRNAs targeting 18,566 genes annotated in the Consensus Coding Sequence (CCDS) database (~4 sgRNAs per gene), 3,353 sgRNAs targeting either non-coding sequences or sequences previously annotated as coding, and 995 non-targeting control sgRNAs. Cells were split into at least two replicates, selected in puromycin and blasticidin for 7 days, and then passaged without selection, maintaining a representation of 500 cells per sgRNA, until 21 days after infection. Genomic DNA (gDNA) was purified from endpoint cell pellets, sgRNA barcodes were PCR-amplified from enough gDNA to maintain representation, and the PCR products were sequenced on Illumina instruments using standard protocols.
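The scale implied by these parameters can be checked with simple arithmetic. The Python sketch below computes how many cells are needed to keep 500 cells per sgRNA for the 76,106-sgRNA library and applies the 45% Cas9 activity cutoff. It also estimates, under the common assumption (not stated in this protocol) that lentiviral integrations per cell follow a Poisson distribution, what fraction of infected cells would carry more than one sgRNA at an illustrative MOI of 0.3. The function names and the MOI value are hypothetical.

```python
import math

# Parameters stated in the protocol text.
CELLS_PER_SGRNA = 500          # representation maintained during passaging
CAS9_ACTIVITY_CUTOFF = 0.45    # lines below 45% activity are ineligible


def cells_required(n_sgrnas: int, cells_per_sgrna: int = CELLS_PER_SGRNA) -> int:
    """Cells to carry so that, on average, each sgRNA is present in
    `cells_per_sgrna` cells (one integration per cell assumed)."""
    return n_sgrnas * cells_per_sgrna


def multiply_infected_fraction(moi: float) -> float:
    """Fraction of infected cells with two or more integrations, ASSUMING
    integrations per cell follow a Poisson(moi) distribution."""
    p0 = math.exp(-moi)            # P(no integration)
    p1 = moi * math.exp(-moi)      # P(exactly one integration)
    return (1.0 - p0 - p1) / (1.0 - p0)


def is_eligible(cas9_activity: float) -> bool:
    """Screening eligibility from the measured Cas9 activity (as a fraction)."""
    return cas9_activity >= CAS9_ACTIVITY_CUTOFF


print(cells_required(76_106))                       # 38053000
print(round(multiply_infected_fraction(0.3), 3))    # 0.143 (hypothetical MOI)
print(is_eligible(0.44), is_eligible(0.80))         # False True
```

Under the Poisson assumption, roughly 14% of infected cells at MOI 0.3 would carry more than one sgRNA, and the share shrinks as MOI decreases, which illustrates why screens are run at a low MOI.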

Cell lines that failed single nucleotide polymorphism (SNP) fingerprinting were removed. Raw sgRNA barcode counts were deconvoluted from the sequence data using PoolQ software (https://portals.broadinstitute.org/gpp/public/software/poolq) and summed across sequencing lanes. Samples with fewer than 15 million reads were removed. Normalized read counts for each sample were calculated according to the procedure described in Cowley et al. [2]. Pairwise Pearson correlation coefficients between replicate samples from the same cell line were calculated to identify and remove poor-quality replicates, using a threshold of 0.7. Each sample's read counts were then divided by the corresponding sgRNA's representation in the starting plasmid DNA (pDNA) library to compute a fold change (FC). For each replicate, the strictly standardized mean difference (SSMD) [3] was computed to measure the separation between the FCs of non-targeting control sgRNAs and those of sgRNAs targeting spliceosomal, ribosomal, or proteasomal genes in KEGG gene sets (https://www.kegg.jp/kegg/). Replicates whose SSMD did not reach -0.5 were removed. The log fold-change (logFC) values were then normalized within each cell line replicate by subtracting the median logFC and dividing by the median absolute deviation (MAD) before input to CERES. After QC, 18 cell lines had one replicate, 206 had two, 93 had three, and 24 had four.
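The QC and normalization steps above can be sketched in Python with NumPy as follows. This is an illustration, not the pipeline used to produce the dataset. The log2(reads per million + 1) normalization is our assumption about the Cowley et al. [2] procedure; the replicate-correlation rule (keep a replicate whose best Pearson correlation with another replicate of the same line is at least 0.7) is a hypothetical reading of the threshold; SSMD is computed as essential-gene minus control, so depletion gives negative values. SSMD uses the two-independent-group form from Zhang [3]. All function names and the synthetic data are invented.

```python
import numpy as np

# Thresholds quoted in the text.
MIN_READS = 15_000_000   # samples with fewer total reads are dropped
MIN_REP_CORR = 0.7       # replicate Pearson correlation threshold
MAX_SSMD = -0.5          # replicates must reach an SSMD of -0.5 or lower


def log2_rpm(counts: np.ndarray) -> np.ndarray:
    """ASSUMED Cowley-style normalization: log2(reads per million + 1)."""
    return np.log2(counts / counts.sum() * 1e6 + 1.0)


def log_fold_change(sample_counts: np.ndarray, pdna_counts: np.ndarray) -> np.ndarray:
    """logFC relative to the pDNA library; dividing normalized counts
    becomes a subtraction in log2 space."""
    return log2_rpm(sample_counts) - log2_rpm(pdna_counts)


def well_correlated(replicates: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL rule: keep a replicate if its Pearson r with at least one
    other replicate of the same line is >= MIN_REP_CORR.
    `replicates` has shape (n_replicates >= 2, n_sgrnas)."""
    r = np.corrcoef(replicates)
    np.fill_diagonal(r, np.nan)      # ignore self-correlation
    return np.nanmax(r, axis=1) >= MIN_REP_CORR


def ssmd(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """SSMD for two independent groups (Zhang 2007):
    (mean_a - mean_b) / sqrt(var_a + var_b)."""
    diff = group_a.mean() - group_b.mean()
    return float(diff / np.sqrt(group_a.var(ddof=1) + group_b.var(ddof=1)))


def passes_qc(sample_counts, pdna_counts, is_essential, is_control) -> bool:
    """Read-depth and SSMD filters for a single replicate."""
    if sample_counts.sum() < MIN_READS:
        return False
    lfc = log_fold_change(sample_counts, pdna_counts)
    # Essential-gene sgRNAs deplete, so a good screen gives a negative SSMD.
    return ssmd(lfc[is_essential], lfc[is_control]) <= MAX_SSMD


def median_mad_scale(lfc: np.ndarray) -> np.ndarray:
    """Subtract the median and divide by the median absolute deviation."""
    med = np.median(lfc)
    mad = np.median(np.abs(lfc - med))
    return (lfc - med) / mad


# Usage on synthetic data (all values invented for illustration).
rng = np.random.default_rng(0)
n = 1_000
pdna = rng.poisson(20_000, size=n).astype(float)   # ~20M reads
is_essential = np.arange(n) < 100
is_control = np.arange(n) >= 900
sample = pdna * rng.lognormal(0.0, 0.2, size=n)
sample[is_essential] *= 0.1                        # essential guides drop out

print(passes_qc(sample, pdna, is_essential, is_control))   # True
lfc = log_fold_change(sample, pdna)
scaled = median_mad_scale(lfc)
replicates = np.vstack([lfc, lfc + rng.normal(0, 0.1, n), rng.normal(0, 1, n)])
print(well_correlated(replicates))                 # [ True  True False]
```

Scaling by the MAD rather than the standard deviation keeps the strongly depleted essential-gene sgRNAs from inflating the scale factor.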

Copy-number (CN) data for all cancer cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) [4] data portal (https://portals.broadinstitute.org/ccle). The dataset CCLE_copynumber_2013-12-03.seg.txt, derived from Affymetrix SNP6.0 arrays, was used for analysis. Normalized log2 ratios were segmented using the circular binary segmentation (CBS) algorithm. All copy-number values presented are relative to each cell line's ploidy: a value of two represents the average ploidy of that cell line.
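Assuming the segment means in the .seg file are log2 ratios of copy number to each cell line's ploidy, they map onto the relative scale described above (where two equals average ploidy) as CN = 2 × 2^(log2 ratio). The Python sketch below applies this conversion. The Segment field names and the example segments are hypothetical and do not reflect the actual column headers or contents of CCLE_copynumber_2013-12-03.seg.txt.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One CBS segment; field names are HYPOTHETICAL, not the file's header."""
    chrom: str
    start: int
    end: int
    log2_ratio: float   # ASSUMED: log2(copy number / cell-line ploidy)


def relative_copy_number(log2_ratio: float) -> float:
    """Relative CN on a scale where 2.0 equals the cell line's average ploidy."""
    return 2.0 * 2.0 ** log2_ratio


segments = [
    Segment("1", 10_000, 2_500_000, 0.0),     # at average ploidy
    Segment("1", 2_500_001, 4_000_000, 1.0),  # gained segment
    Segment("8", 500_000, 900_000, -1.0),     # lost segment
]
for seg in segments:
    print(seg.chrom, seg.start, seg.end, relative_copy_number(seg.log2_ratio))
# 1 10000 2500000 2.0
# 1 2500001 4000000 4.0
# 8 500000 900000 1.0
```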

Finally, the logFC score of each sgRNA in each cell line and the segmented copy number of each cell line were supplied to CERES, together with a mapping of sgRNAs to the hg19 reference genome, to infer gene-knockout effects and sgRNA on-target efficacy. Further details of the experimental protocols and the CERES algorithm can be found in Meyers et al. [5], and further details of data processing in Dempster et al. [6].
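The Python sketch below illustrates, under stated simplifications, how these inputs fit together. It looks up the segmented copy number at an sgRNA's hg19 cut site and evaluates a simplified forward model in the spirit of CERES as described in Meyers et al. [5]: a guide's predicted logFC in a cell line is its on-target efficacy times the sum of the knockout effects of the genes it targets plus a cell-line-specific, piecewise-linear copy-number effect summed over its cut sites, plus a guide-specific offset. It only evaluates the model with invented parameters; it fits nothing and is not the CERES implementation. All function names, knots, coordinates, and values are hypothetical.

```python
import bisect

import numpy as np


def copy_number_at(segments, chrom, pos):
    """Relative CN of the segment covering (chrom, pos), or NaN if uncovered.
    `segments` maps chromosome -> list of (start, end, cn) sorted by start."""
    segs = segments.get(chrom, [])
    starts = [s[0] for s in segs]
    idx = bisect.bisect_right(starts, pos) - 1
    if idx >= 0 and segs[idx][0] <= pos <= segs[idx][1]:
        return segs[idx][2]
    return float("nan")


def predicted_lfc(gene_effects, locus_cns, activity, offset, cn_knots, cn_values):
    """SIMPLIFIED CERES-style forward model for one sgRNA in one cell line:
    activity * (sum of gene effects + sum of f(CN) over cut sites) + offset,
    where f is piecewise linear through (cn_knots, cn_values).
    np.interp holds f constant beyond the outermost knots (a simplification)."""
    cn_effect = float(np.sum(np.interp(locus_cns, cn_knots, cn_values)))
    return activity * (float(np.sum(gene_effects)) + cn_effect) + offset


# Hypothetical inputs: one hg19 cut site inside an amplified segment.
segments = {"1": [(10_000, 2_500_000, 2.0), (2_500_001, 4_000_000, 6.0)]}
cn = copy_number_at(segments, "1", 3_000_000)          # 6.0
lfc = predicted_lfc(
    gene_effects=[-1.0],          # knockout effect of the targeted gene
    locus_cns=[cn],               # CN at each locus the guide cuts
    activity=0.9,                 # guide efficacy, in [0, 1]
    offset=0.05,                  # guide-specific offset
    cn_knots=[0.0, 2.0, 8.0],     # invented CN-effect curve for this line
    cn_values=[0.0, 0.0, -0.6],
)
print(round(lfc, 2))              # -1.21
```

Because the copy-number term is summed over every cut site, a guide that cuts in an amplified region, or at many loci, is predicted to deplete even when its target gene has no effect; this is the confounder that CERES is designed to separate from the gene-knockout effect.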


Access the Raw/Analyzed Data (figshare) 


For questions, please contact Joshua Dempster.



  1. Aguirre AJ, et al. (2016). Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting. Cancer Discov. 6(8):914-929. (PMID: 27260156)
  2. Cowley GS, et al. (2014). Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci Data. 1:140035. (PMID: 25984343)
  3. Zhang XD. (2007). A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics. 89(4):552-561. (PMID: 17276655)
  4. Barretina J, et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 483(7391):603-607. (PMID: 22460905)
  5. Meyers RM, et al. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet. 49(12):1779-1784. (PMID: 29083409)
  6. Dempster JM, et al. (2019). Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines. bioRxiv. (720243)
Last updated: December 19, 2019