HPV Typing and Integration

Data Generation Protocols

The data generation protocols for the HTMCP-Cervical Cancer project were acquired from the following manuscript.

Gagliardi A, Porter VL, Zong Z, et al. Analysis of Ugandan cervical carcinomas identifies human papillomavirus clade-specific epigenome and transcriptome landscapes. Nat Genet. 2020;52(8):800-810. (PMID: 32747824)

HPV typing

Microbial detection, HPV typing, and HPV integration detection were performed using BioBloom Tools (BBT, v2.0.11b). In samples where two or more HPV types were integrated (n=3), the dominant type was determined by E6/E7 expression. In samples where no integration was found (n=9), the dominant HPV type was the one with the most supporting read evidence.
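The dominant-type rule above can be sketched as a small decision function. This is a hypothetical helper, not part of BioBloom Tools: the name `dominant_hpv_type`, its inputs, and the treatment of samples with exactly one integrated type (taken as dominant) are assumptions for illustration.

```python
from typing import Dict, Optional, Set


def dominant_hpv_type(
    integrated_types: Set[str],
    e6e7_expression: Dict[str, float],
    read_counts: Dict[str, int],
) -> Optional[str]:
    """Choose the dominant HPV type for one sample (hypothetical helper)."""
    if len(integrated_types) >= 2:
        # Several integrated types: keep the one with the highest E6/E7 expression.
        return max(integrated_types, key=lambda t: e6e7_expression.get(t, 0.0))
    if len(integrated_types) == 1:
        # Assumption: a single integrated type is taken as dominant.
        return next(iter(integrated_types))
    if not read_counts:
        return None
    # No integration: keep the type with the most supporting reads.
    return max(read_counts, key=read_counts.get)
```

For example, a sample with integrated HPV16 and HPV18 in which HPV18 has the higher E6/E7 expression would be assigned HPV18.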

Data Analysis Protocols

The data analysis protocols for the HTMCP-Cervical Cancer project were acquired from the following manuscript.

Gagliardi A, Porter VL, Zong Z, et al. Analysis of Ugandan cervical carcinomas identifies human papillomavirus clade-specific epigenome and transcriptome landscapes. Nat Genet. 2020;52(8):800-810. (PMID: 32747824)

HPV expression

To determine HPV gene expression, FASTA genome references and GFF annotation files were downloaded from NCBI for 16 HPV strains. HPV51 had no GFF file, so one infected sample was excluded (samples with HPV expression, n=117). Samples were aligned to the genome of their HPV strain using BWA-mem (v0.7.6a) and Sambamba. Reads with sequencing quality greater than Q10 falling within gene boundaries were counted and normalized to reads per kilobase of exons per million reads mapped to HPV (RPKM).
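The RPKM normalization described above amounts to dividing each gene's read count by its exon length in kilobases and by the number of HPV-mapped reads in millions. A minimal sketch, assuming the per-gene counts have already been restricted to reads above Q10 within gene boundaries, and assuming `total_hpv_reads` is the sample's total number of reads mapped to HPV; the function name is hypothetical.

```python
from typing import Dict


def hpv_gene_rpkm(
    gene_counts: Dict[str, int],
    exon_lengths_bp: Dict[str, int],
    total_hpv_reads: int,
) -> Dict[str, float]:
    """Reads per kilobase of exon per million HPV-mapped reads, per gene."""
    millions = total_hpv_reads / 1e6
    return {
        # count / (exon length in kb) / (HPV-mapped reads in millions)
        gene: count / (exon_lengths_bp[gene] / 1e3) / millions
        for gene, count in gene_counts.items()
    }
```

With 500 reads on a 500 bp gene and 2,000,000 HPV-mapped reads, the gene's RPKM is 500 / 0.5 / 2 = 500.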

Clustering analysis was performed on E1, E2, E6 and E7 log10(RPKM) values with the ConsensusClusterPlus R package (v1.38.0) using the ‘pearson’ distance and ‘ward.D2’ linkage, and the per-sample scaled expression was visualized using the ‘pheatmap’ R package (v1.0.10). E4 and E5 z-scores were added after clustering because these genes are not present in every HPV type. Differential gene expression between clusters was assessed with the DESeq2 R package (v1.14.1) by comparing each cluster with the other two clusters, and genes were filtered using an adjusted p-value <0.05, a >1.5-fold change in mean expression, and a baseMean expression >1000. Functional enrichment of the significantly differentially expressed genes was performed using STRING (v11.0).
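The post-DESeq2 filtering step can be sketched as a simple threshold filter over result rows. This is an illustration under assumptions: rows are plain dicts carrying DESeq2's `baseMean`, `log2FoldChange` and `padj` columns plus a `gene` key, the 1.5-fold threshold is applied in both directions via `log2FoldChange`, and genes whose `padj` is missing (DESeq2 reports NA for independently filtered genes) are dropped. The function name is hypothetical.

```python
import math
from typing import Dict, List


def filter_de_genes(
    results: List[Dict],
    padj_max: float = 0.05,
    min_fold: float = 1.5,
    min_base_mean: float = 1000.0,
) -> List[str]:
    """Keep genes passing the adjusted p-value, fold-change and baseMean cut-offs."""
    min_lfc = math.log2(min_fold)  # 1.5-fold expressed on the log2 scale
    kept = []
    for row in results:
        padj = row.get("padj")
        if padj is None or math.isnan(padj):  # NA adjusted p-values are skipped
            continue
        if (
            padj < padj_max
            and abs(row["log2FoldChange"]) > min_lfc  # up- or down-regulated
            and row["baseMean"] > min_base_mean
        ):
            kept.append(row["gene"])
    return kept
```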

HPV integration events and ChIP

HPV integration sites were determined using chimeric reads mapping to both the human and HPV genomes. Within each sample, integration sites <500 kb apart were merged into a single integration event (n=257 events in total). HPV integration hotspots were determined by counting the number of events that fell within each 500 kb bin across the genome.
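The merging and hotspot-counting steps can be sketched as follows. Assumptions: each integration site is a `(chromosome, position)` pair, "<500 kb apart" is measured between consecutive sorted sites (so chains of nearby sites collapse into one event), and an event spanning a bin boundary is counted in every 500 kb bin it overlaps. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

MERGE_DISTANCE_BP = 500_000
BIN_SIZE_BP = 500_000
Event = Tuple[str, int, int]  # (chromosome, start, end)


def merge_sites(sites: List[Tuple[str, int]], max_gap: int = MERGE_DISTANCE_BP) -> List[Event]:
    """Merge one sample's integration sites lying < max_gap apart into events."""
    events: List[Event] = []
    for chrom, pos in sorted(sites):
        if events and events[-1][0] == chrom and pos - events[-1][2] < max_gap:
            # Close to the previous site on the same chromosome: extend that event.
            events[-1] = (chrom, events[-1][1], pos)
        else:
            events.append((chrom, pos, pos))
    return events


def hotspot_counts(events_by_sample: Dict[str, List[Event]], bin_size: int = BIN_SIZE_BP) -> Counter:
    """Count events per (chromosome, bin index) across all samples."""
    counts: Counter = Counter()
    for events in events_by_sample.values():
        for chrom, start, end in events:
            for b in range(start // bin_size, end // bin_size + 1):
                counts[(chrom, b)] += 1
    return counts
```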

ChIP-seq alterations at HPV integration events were clustered on the log2(fold change) of normalized coverage (RPM) in the integrated sample versus the mean RPM of the unintegrated samples, using the ‘pheatmap’ R package (v1.0.10) with ‘ward.D2’ clustering. Events <20 kb were extended to 20 kb to obtain adequate coverage of the region.
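The per-event values that feed this clustering can be sketched in two steps: widen short events to 20 kb, then take the log2 ratio of the integrated sample's RPM to the mean RPM of unintegrated samples. Assumptions: short events are extended symmetrically about their center (clamped at position 0, with no clamp at the chromosome end), and a small hypothetical `pseudocount` guards against log(0). Function names are illustrative.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

MIN_EVENT_BP = 20_000


def pad_event(start: int, end: int, min_size: int = MIN_EVENT_BP) -> Tuple[int, int]:
    """Extend an event shorter than min_size about its center."""
    size = end - start
    if size >= min_size:
        return start, end
    extra = min_size - size
    new_start = max(0, start - extra // 2)  # clamp at the chromosome start
    return new_start, new_start + min_size


def log2_fold_change(
    integrated_rpm: float,
    unintegrated_rpms: Sequence[float],
    pseudocount: float = 0.1,  # hypothetical value, not from the source
) -> float:
    """log2 of integrated-sample RPM over the mean RPM of unintegrated samples."""
    baseline = mean(unintegrated_rpms)
    return math.log2((integrated_rpm + pseudocount) / (baseline + pseudocount))
```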

For each histone mark at each event (6 modifications across 99 events), a control peak set was made by randomly selecting 1,000 peak regions of the same mark on the same chromosome as the event and extending the peaks from their center to the same size as the event. Normalized ChIP-seq coverage of the histone modification at these 1,000 random peaks was counted in the 52 samples, and the log2(fold change) of coverage was calculated for the integrated sample. A p-value for the fold change at the integration event was calculated from the distribution of fold changes at these control peaks, and Benjamini-Hochberg adjusted p-values <0.05 were regarded as significant.
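The control-peak test can be sketched as: sample control peaks and resize them to the event width, compare the event's log2(fold change) with the control fold changes to get an empirical p-value, then apply Benjamini-Hochberg across events. Assumptions: control peaks are sampled without replacement (all peaks are used if fewer than 1,000 exist), the empirical p-value is two-sided on |log2(fold change)| with a +1 correction, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def resize_centered(start: int, end: int, size: int) -> Tuple[int, int]:
    """Re-center a peak and give it the requested width."""
    center = (start + end) // 2
    new_start = max(0, center - size // 2)
    return new_start, new_start + size


def control_regions(
    peaks: Sequence[Region], event_size: int, n: int = 1000, seed: int = 0
) -> List[Region]:
    """Sample up to n peaks (same mark, same chromosome) resized to the event size."""
    rng = random.Random(seed)
    chosen = rng.sample(list(peaks), min(n, len(peaks)))
    return [(c, *resize_centered(s, e, event_size)) for c, s, e in chosen]


def empirical_p(observed: float, control: Sequence[float]) -> float:
    """Two-sided empirical p-value with a +1 correction (assumed convention)."""
    extreme = sum(1 for c in control if abs(c) >= abs(observed))
    return (extreme + 1) / (len(control) + 1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values, equivalent to R's p.adjust(method = "BH")."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 1,000 control peaks, the smallest attainable empirical p-value under this convention is 1/1001.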

HPV integration events and expression

For each integration event (n=257), we identified all protein-coding genes (hg19, Ensembl v75, n=20,232) that fell within the event ±10 kb, which revealed 255 genes near integration events. Fold changes for integrated samples were calculated relative to the mean expression of all samples lacking events, and p-values were derived from the distribution of the gene's expression across all samples. Oncogenes were identified using OncoKB. The same method was applied to identify ERVs upregulated at HPV integration events (n=34 events). Samples were labelled as having a statistically significant integration event if they had a fold change ≥2 and a Benjamini-Hochberg adjusted p-value ≤0.05 for genes, and a fold change ≥10 and a Benjamini-Hochberg adjusted p-value ≤0.05 for ERVs; these thresholds were based on the distribution of fold changes for genes and ERVs, respectively. Samples with significant events were correlated with T cell infiltration scores from CIBERSORT and with genes from the gene ontology terms for dsRNA sensing pathways (GO:0043330) and type I interferon signalling (GO:0060337).
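The gene-window search and significance labelling can be sketched as below. Assumptions: a gene "falls within" the event ±10 kb if its interval overlaps the extended window, fold change is the integrated sample's expression divided by the mean of samples lacking events (NaN for 0/0, infinity for a zero baseline otherwise), and the adjusted p-value is computed elsewhere. Gene tuples and function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

FLANK_BP = 10_000
Gene = Tuple[str, str, int, int]  # (name, chromosome, start, end)


def genes_near_event(
    event: Tuple[str, int, int], genes: Sequence[Gene], flank: int = FLANK_BP
) -> List[str]:
    """Names of genes overlapping the event extended by +/- flank bp."""
    chrom, start, end = event
    lo, hi = start - flank, end + flank
    return [name for name, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and g_start <= hi and g_end >= lo]


def fold_change(integrated_expr: float, expr_without_events: Sequence[float]) -> float:
    """Integrated-sample expression over the mean of samples lacking events."""
    baseline = mean(expr_without_events)
    if baseline == 0:
        return math.inf if integrated_expr > 0 else float("nan")
    return integrated_expr / baseline


def is_significant(fc: float, padj: float, feature: str = "gene") -> bool:
    """Thresholds from the text: FC >= 2 for genes, FC >= 10 for ERVs, padj <= 0.05."""
    thresholds = {"gene": 2.0, "erv": 10.0}
    return fc >= thresholds[feature] and padj <= 0.05
```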

Last updated: August 07, 2020