Gene Expression

Data generation and data analysis protocols are described below for each platform and project:

  • Gene Chip® Human Exon ST Array (Affymetrix): NBL, OS
  • Gene Chip® Human Gene 1.1 ST (Affymetrix): AML
  • Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix): ALL P1, ALL P2/MDLS, CCSK, WT, PPTP
  • SurePrint G3 Human Gene Expression Array (Agilent): PPTP

Gene Chip® Human Exon ST Array (Affymetrix) for Neuroblastoma (NBL)

RNA was extracted from Optimal Cutting Temperature (OCT) embedded primary tumor tissues using TRIzol-based methods with QIAGEN RNeasy clean-up at Children's Hospital Los Angeles, Children's Hospital of Philadelphia, or the Children's Oncology Group Biopathology Center in Columbus, Ohio.

The manufacturer's protocols were used to label the extract, hybridize, and scan the human exon arrays (Affymetrix Human Exon Array Labeled Extract, Affymetrix Human Exon Array Hybridization Protocol, Affymetrix Human Exon Array Scan Protocol).

Level 2 data were produced by normalization and summarization using the rma-sketch analysis of the Affymetrix APT tools (version 1.16.0). Level 2 batch effect corrected (BER) data were obtained by removing the batch effect related to the RNA source of the specimens. A generalized linear model (GLM - R version 3.10) was used to remove the institutional batch effect by fitting, for each Human Exon array probeset region (PSR), a model that included the batch effect (RNA source by institution). The GLM was adjusted for risk groups based on stage and MYCN amplification status. These Level 2 data were used to generate all subsequent data transformations.
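The batch correction described above can be illustrated with a minimal sketch. It is not the code used to produce the BER data (that analysis was done in R); it assumes an ordinary least-squares fit of one PSR's values on an intercept, batch indicators, and risk-group indicators, and then subtracts only the fitted batch term, which expresses every sample relative to the first (reference) batch. The function names and the example labels are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def one_hot(labels: Sequence[str]) -> List[List[float]]:
    """Indicator columns for every level except the first (reference) level."""
    levels = sorted(set(labels))[1:]
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def remove_batch_effect(values: Sequence[float],
                        batch: Sequence[str],
                        risk: Sequence[str]) -> List[float]:
    """Fit values ~ intercept + batch + risk for one PSR, subtract the batch term."""
    b_cols = one_hot(batch)
    r_cols = one_hot(risk)
    design = [[1.0] + bc + rc for bc, rc in zip(b_cols, r_cols)]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    batch_beta = beta[1:1 + len(b_cols[0])]
    return [y - sum(x * b for x, b in zip(bc, batch_beta))
            for y, bc in zip(values, b_cols)]
```

Because the risk group stays in the model, expression differences attributable to risk group are kept in the corrected values rather than being absorbed into the batch term.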

Level 3 based on PSRs that are part of the 'core' annotation. The data was derived from Level 2 BER data. First PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Transcript ID (based on Affymetrix Annotation).

Level 3 based on PSRs that are part of the 'extended' annotation. The data was derived from Level 2 BER data. First PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Transcript ID (based on Affymetrix Annotation).

Level 3 based on PSRs that are part of the 'full' annotation. The data was derived from Level 2 BER data. First PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Transcript ID (based on Affymetrix Annotation).

Level 3 based on PSRs that are part of the 'core' annotation. The data was derived from Level 3 BER transcript data set where PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Gene Symbol (based on BioCore Package Affymetrix huex10 annotation data - huex10stprobeset.db. Mappings were based on data provided by: Entrez Gene ftp://ftp.ncbi.nlm.nih.gov/gene/DATA, with a date stamp from the source of: 2014-Mar13).

Level 3 based on PSRs that are part of the 'extended' annotation. The data was derived from Level 3 BER transcript data set where PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Gene Symbol (based on BioCore Package Affymetrix huex10 annotation data - huex10stprobeset.db. Mappings were based on data provided by: Entrez Gene ftp://ftp.ncbi.nlm.nih.gov/gene/DATA, with a date stamp from the source of: 2014-Mar13).

Level 3 based on PSRs that are part of the 'full' annotation. The data was derived from Level 3 BER transcript data set where PSRs with low expression (less than median expression level of entire dataset) and low coefficient of variation (less than median cv of entire dataset) were removed (~10% of PSRs) prior to averaging of PSRs by Gene Symbol (based on BioCore Package Affymetrix huex10 annotation data - huex10stprobeset.db. Mappings were based on data provided by: Entrez Gene ftp://ftp.ncbi.nlm.nih.gov/gene/DATA, with a date stamp from the source of: 2014-Mar13).
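The filter-then-average step shared by these Level 3 files can be sketched as follows. This is an illustration under stated assumptions rather than the original pipeline: a PSR's expression is taken to be its mean across samples, the expression cutoff is the median of all values in the dataset, the CV cutoff is the median of the per-PSR CVs, and a PSR is dropped only when it falls below both cutoffs. The function and argument names are hypothetical, and values are assumed to be positive (as with log-scale RMA output).

```python
from collections import defaultdict
from statistics import mean, median, pstdev
from typing import Dict, List


def filter_and_average(psr_values: Dict[str, List[float]],
                       psr_to_group: Dict[str, str]) -> Dict[str, List[float]]:
    """Drop PSRs that are both low-expression and low-CV, then average per group.

    psr_to_group maps each PSR to a Transcript ID (or Gene Symbol).
    """
    all_values = [v for vals in psr_values.values() for v in vals]
    expr_cut = median(all_values)
    cvs = {psr: pstdev(vals) / mean(vals) for psr, vals in psr_values.items()}
    cv_cut = median(cvs.values())

    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for psr, vals in psr_values.items():
        if mean(vals) < expr_cut and cvs[psr] < cv_cut:
            continue  # low expression AND low variation: removed
        grouped[psr_to_group[psr]].append(vals)

    # Average the retained PSRs sample by sample within each group.
    return {group: [mean(col) for col in zip(*rows)]
            for group, rows in grouped.items()}
```

The same function covers both the transcript-level and gene-level files, since only the mapping passed as `psr_to_group` changes.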

Gene Chip® Human Exon ST Array (Affymetrix) for Osteosarcoma (OS)

*Protocols performed at the Children’s Hospital of Los Angeles and Texas Children’s Hospital.

RNA was labeled using the labeling protocol described by Affymetrix and reagents from Affymetrix.

Samples were hybridized using Affymetrix hybridization kit materials and protocols on the Affymetrix Fluidics Station 450.

Scanning of the microarrays was performed according to Affymetrix's recommended protocol for the Affymetrix Genechip Scanner 3000 7G.

Data preprocessing and normalization were done using the Affymetrix APT package with RMA.

Exon level data (L2) were transformed into gene level data (L3) by averaging the probesets per gene.

Gene Chip® Human Gene 1.1 ST (Affymetrix) for Acute Myeloid Leukemia (AML)

*Protocols were performed at Hudson Alpha, Inc., and the Fred Hutchinson Cancer Research Center.

All microarray experiments were performed according to manufacturer’s protocol using the Ambion WT Expression Kit, the GeneChip WT Terminal Labeling and Controls Kit, and the GeneTitan. The arrays were hybridized according to manufacturer’s protocol to the Human Gene 1.1 ST 96-Array Plate using the Affymetrix GeneTitan.

Arrays were scanned and raw image data (intensity files) were generated using Affymetrix GeneChip Command Console Software.

Raw intensity files were imported into Affymetrix Expression Console Software and normalized using the Robust Multichip Analysis-sketch workflow to assess quality control parameters and ensure uniform performance across the data set.  All raw files were uploaded into Partek Genomics Suite (St. Louis, MO) and RMA normalized upon import. 

To assign a single value per gene ID, multiple cluster IDs mapping to the same gene ID were averaged into one value.  The level 3 file represents the average (if multiple cluster IDs are represented) value per gene.

Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix) for Acute Lymphoblastic Leukemia Phase I (ALL P1)

*This protocol was performed at University of New Mexico.

1-3 µg of total RNA was labeled and hybridized to Affymetrix U133_Plus_2 arrays according to the manufacturer's recommendations (Affymetrix). A mask to remove uninformative probe pairs and Affymetrix controls was applied to all the arrays (resulting in the removal of 171 probe sets) and the default Affymetrix MAS 5.0 normalization was used on the remaining 54,504 probe sets. Array experimental quality was assessed using the following parameters, and all arrays met these criteria for inclusion:

  • GAPDH more than 5000
  • more than 20% expressed genes
  • GAPDH 3'/5' ratios less than 4
  • linear regression R² values of spiked poly(A) controls more than 0.90.
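The four inclusion criteria above amount to a simple per-array check, sketched below. The `ArrayQC` record and its field names are hypothetical; only the thresholds come from the list.

```python
from dataclasses import dataclass


@dataclass
class ArrayQC:
    """QC metrics for one array (field names are illustrative)."""
    gapdh_signal: float        # GAPDH signal intensity
    percent_expressed: float   # percentage of expressed genes, 0-100
    gapdh_3p_5p_ratio: float   # GAPDH 3'/5' ratio
    polya_r_squared: float     # R² of the spiked poly(A) control regression


def passes_qc(qc: ArrayQC) -> bool:
    """Return True only when all four inclusion criteria are met."""
    return (
        qc.gapdh_signal > 5000
        and qc.percent_expressed > 20
        and qc.gapdh_3p_5p_ratio < 4
        and qc.polya_r_squared > 0.90
    )
```

For example, `passes_qc(ArrayQC(8000, 35.0, 1.2, 0.97))` evaluates to `True`, while an array with a GAPDH 3'/5' ratio of 4.5 would be rejected.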

This gene expression dataset may be accessed via the NCI caArray site or at Gene Expression Omnibus under accession number GSE11877.

Microarray gene expression profiling data were available from an initial 54,504 probe sets after masking and filtering of minimal probe sets and controls (Supplemental data). Three different unsupervised, unbiased methods were used to select genes for standard hierarchical clustering: High Coefficient of Variation (HC) as originally described by Eisen et al. [1], Cancer Outlier Profile Analysis (COPA), and Recognition of Outliers by Sampling Ends (ROSE), a novel method similar to COPA developed in the Richard Harvey laboratory at the University of New Mexico [2].

In HC, the 54,504 probe sets were ordered by their coefficients of variation and the highest 254 probe sets were used for clustering; this method identifies probe sets having an overall high variance relative to mean intensities. COPA selects outlier probe sets, also in an unsupervised fashion, on the basis of their absolute deviation from median at a fixed point (typically the 95th percentile). ROSE was developed as an alternative to COPA, and selects probe sets both on the basis of the size of the outlier group they identify as well as the magnitude of the deviation from expected intensity (ROSE and COPA) [2].

For all 3 probe selection methods, the top 254 probe sets (Harvey et al.; supplemental Table 7A [2]) were clustered using EPCLUST (Version 0.9.23 beta, Euclidean distance, average linkage UPGMA). A threshold branch distance was applied, and the largest distinct branches above this threshold containing more than 8 patients were retained and labeled. The HC method was used as the basis of cluster definition and nomenclature, with each of the 8 predominant clusters first identified through HC being assigned a number (H1-H8). All clusters are prefixed by the method of their probe set selection (H indicates HC; C, COPA; and R, ROSE), with COPA and ROSE numbers being assigned based on the similarity of a specific cluster group's patient membership to that seen in the original H clusters.

The top 100 median rank order probe sets for each ROSE cluster are provided in Supplemental data. In the validation cohort (COG CCG 1961), the same initial masking criteria were applied to the raw data, yielding 54,504 probe sets for analysis. Applying ROSE with the same parameters used for the COG P9906 ALL cohort [2], 167 probe sets were identified for clustering. The selection criteria used for COG P9906 were also used for COPA and HC, and the top 167 probe sets derived from these methods were used for hierarchical clustering (Harvey et al.; supplemental Table 7A [2]).
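Two of the probe set selection ideas described above can be sketched briefly. `top_by_cv` ranks probe sets by coefficient of variation, as in HC. `copa_score` follows the commonly published COPA formulation (median-center, scale by the median absolute deviation, read off a high percentile); whether the analysis here used exactly that scaling and percentile rule is an assumption, and the nearest-rank percentile is a simplification. ROSE is not sketched because its details are not given here. All names are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Sequence


def top_by_cv(expr: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Return the n probe sets with the highest coefficient of variation."""
    cv = {probe: pstdev(vals) / mean(vals) for probe, vals in expr.items()}
    return sorted(cv, key=cv.get, reverse=True)[:n]


def copa_score(values: Sequence[float], percentile: float = 0.95) -> float:
    """Median-centered, MAD-scaled value at the given percentile.

    Assumes the MAD is non-zero (a constant probe set would divide by zero).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scaled = sorted((v - med) / mad for v in values)
    index = min(len(scaled) - 1, int(percentile * len(scaled)))
    return scaled[index]
```

Ranking probe sets by `copa_score` favors those where a subset of samples deviates strongly from the median, whereas `top_by_cv` favors overall variability across all samples.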

RNA Sample Preparation Methodology

Gene Expression Profiling Method [2]

RNA was isolated from pretreatment diagnostic ALL samples in the 207 patients (131 bone marrow, 76 peripheral blood) using TRIzol (Invitrogen); all samples had more than 80% leukemic blasts.

References:

  1. Eisen MB, Spellman PT, Brown PO, Botstein D (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 95 (25):14863-14868 (PMID: 9843981)
  2. Harvey RC, Mullighan CG, Wang X, Dobbin KK, Davidson GS, Bedrick EJ, Chen IM, Atlas SR, Kang H, Ar K, Wilson CS, Wharton W, Murphy M, Devidas M, Carroll AJ, Borowitz MJ, Bowman WP, Downing JR, Relling M, Yang J, Bhojwani D, Carroll WL, Camitta B, Reaman GH, Smith M, Hunger SP, Willman CL. (2010). Identification of novel cluster groups in pediatric high-risk B-precursor acute lymphoblastic leukemia with gene expression profiling: correlation with genome-wide DNA copy number alterations, clinical characteristics, and outcome. Blood. 116 (23), 4874-84 (PMID: 20699438)

Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix) for Acute Lymphoblastic Leukemia Phase II and Xenografts (ALL P2 & ALL MDLS)

*Protocols performed at the University of New Mexico.

Preparation of cRNA for hybridization to U133_Plus_2.0 arrays was performed according to Affymetrix's recommendations (GeneChip Expression Analysis Technical Manual). First, 300 ng of total RNA was converted to cDNA. Biotinylated cRNA was generated from the cDNA and 15 µg was subjected to fragmentation. Either the Affymetrix One-Cycle Target Labeling Kit or the Affymetrix 3' IVT Express Kit was used. This v02 labeling protocol differs from v01 because the labeling kit changed: Affymetrix changed the IVT kit between 2008 and 2009. While most of the gene expression patterns remain the same, there are some pronounced differences that may result in set effects when trying to merge data generated from the different labeling kits.

Hybridization of 12.5 µg fragmented biotinylated cRNA was performed according to Affymetrix's recommendations (GeneChip Expression Analysis Technical Manual).

Scanning of the microarrays was performed according to Affymetrix's recommended protocol (GeneChip Expression Analysis Technical Manual).

Data were masked according to the method outlined in Harvey et al, Blood 116:4874-4884, (2010) in order to remove uninformative probe pairs.  Default MAS 5.0 normalization was performed on the masked data using Expression Console software (Affymetrix).

The non-collapsed GCT file is simply the masked MAS 5.0 data from the CHP files formatted as a GCT file.  Level 3 data were generated by using the CollapseDataset algorithm of GenePattern:  http://www.broadinstitute.org/cancer/software/genepattern/.  In applying this software, the "maximum" (as opposed to "median") probeset setting was used, and the gene-to-probeset associations were obtained from the file AFFYMETRIX.chip downloaded from ftp://gseaftp.broadinstitute.org/pub/gsea/annotations/.
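The probeset-to-gene collapse with the "maximum" setting can be pictured with the sketch below. It is not the GenePattern CollapseDataset implementation; it assumes that "maximum" means taking, for each sample, the largest value among the probesets mapped to a gene, and that probesets without a gene assignment are dropped. The function name and the probe and gene identifiers in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_max(probe_values: Dict[str, List[float]],
                 probe_to_gene: Dict[str, str]) -> Dict[str, List[float]]:
    """Collapse probeset rows to gene rows using the per-sample maximum."""
    by_gene: Dict[str, List[List[float]]] = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probeset has no gene assignment in the chip file
        by_gene[gene].append(values)
    return {gene: [max(column) for column in zip(*rows)]
            for gene, rows in by_gene.items()}
```

For instance, two probesets with values `[5.0, 9.0]` and `[7.0, 3.0]` that map to the same gene collapse to `[7.0, 9.0]`. Under the "median" setting, `max` would be replaced by a median over each column.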

Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix) for Clear Cell Sarcoma of the Kidney (CCSK)

Total RNA was used for gene expression analysis using the Affymetrix U133 Plus 2.0 array (Affymetrix, Santa Clara, CA, USA), performed according to the manufacturer’s protocol. The arrays were analyzed using Gene-Chip Operating Software (GCOS) and Robust Multichip Average (RMA) normalization was performed. Differentially expressed genes were identified using significance analysis of microarrays (SAM) [1]; q-values of < 0.01 and fold changes of > 2 were considered significant. Gene Set Enrichment Analysis (GSEA), version 2.0.14 [2], was performed using 1000 permutations and phenotype permutation. Gene lists of at least 50 genes from canonical pathways, biologic processes, and oncogenic signatures with a false discovery rate (FDR) of < 20% and a p-value of < 0.05 were considered significant. Pearson correlation coefficient (PCC) calculation was performed using the RMA-normalized Level 3 gene expression data for 76 favorable histology Wilms tumors available in the TARGET Data Matrix. Hierarchical clustering was performed by using GenePattern’s Hierarchical Clustering module (column distance measure = Pearson correlation; row distance measure = Pearson correlation; clustering method = pairwise average-linkage) and the results were visualized with the HierarchicalClusteringViewer module.

Specifically:

RNA was extracted from tumor samples at Nationwide Children's BioPathology Center (BPC) by using the standard BPC protocol. RNA quality was assessed by a bioanalyzer and RNA samples were required to have a RIN > 7. Total RNA was provided to Lurie Children's Hospital Research Center at a concentration of 150 ng/µL (2 µg total) in sets of 16 samples. One WT sample for which sufficient column-purified RNA was available was selected to serve as a control sample (PAJMLZ). Each set of 16 samples received from the BPC included the WT control sample, which was therefore repeated throughout all steps of this procedure in order to ensure consistency among all steps.

250 ng of total RNA was labeled by using the Affymetrix GeneChip 3' IVT Express Kit at Lurie Children's Hospital Research Center.  All procedures, including 1st strand reverse transcription, 2nd strand synthesis, in vitro transcription of aRNA, aRNA purification, quantitation, and fragmentation were performed according to the manufacturer's protocol.

Nucleic acid hybridization to the array was performed at Lurie Children's Hospital Research Center by using the Affymetrix GeneChip Hybridization, Wash and Stain Kit per the manufacturer's instructions.

The arrays were scanned at Lurie Children's Hospital Research Center by using the Gene-Chip Operating Software (GCOS). Each .dat file was visually inspected for large scratches and/or misalignment of the grid. GCOS was used to generate .chp files (Level 2 data), which represent the consolidation of all individual probes within a probeset, from .cel files (Level 1 data). From .chp files, GCOS was used to generate .rpt files (Level 3 data), which show probe intensity values and QC values. All samples were inspected for the following parameters:

  • Background < 45 (actual range: 28.19–43.18)
  • Noise (Raw Q) < 1.35 (actual range: 0.670–1.30)
  • Scaling Factor < 65% (actual range: 11.487–52.965)
  • % Present call > 35% (actual range: 38.4–57.7)
  • 3'/5' GAPDH < 3.92 (actual range: 0.95–3.48)

Samples with parameters outside of these limits were rerun starting at the step of RNA labeling.

All .cel files (Level 1 data) were imported into the Broad Institute’s GenePattern server and Robust Multichip Average (RMA) normalization was performed using the ExpressionFileCreator module. Data were exported as a single .txt file (Level 2 data) containing probeset information for each individual tumor within a single spreadsheet.

Several analytic quality control steps were performed. Principal component analysis (PCA) was performed to ensure that none of the samples were outliers. Pair-wise correlation coefficient analysis was performed using the data from the WT control sample that was included in each individual batch of samples. The normalized averages of the expression levels from each WT control run showed a correlation coefficient > 98%, indicating a high level of consistency. Six probesets corresponding to five genes were identified that closely correlated with gender (four male genes [RPS4Y1, DDX3Y, SMCY, and EIF1AY] and one female gene [XIST]). All samples were classified as male or female according to the expression patterns of these genes and the results were checked against the known gender of the patient. No discrepancies were detected.
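The pair-wise consistency check on the repeated control sample can be sketched as below. The sketch assumes the Pearson correlation coefficient was the measure used and that each control run is represented as a list of normalized expression values in the same probeset order; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def min_pairwise_correlation(runs: Sequence[Sequence[float]]) -> float:
    """Lowest correlation over every pair of control runs."""
    return min(pearson(a, b) for a, b in combinations(runs, 2))
```

A batch-to-batch consistency claim such as "correlation coefficient > 98%" then corresponds to `min_pairwise_correlation(runs) > 0.98`.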

For analyses, 9/10 replicates for PAJMLZ were removed from the RMA gene expression file. A collapsed data file was created by using the Broad Institute's GenePattern CollapseDataset module with the default parameters and the maximum probe collapse method.

References:

  1. Tusher VG, Tibshirani R, Chu G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 98 (9), 5116-5121 (PMID: 11309499)
  2. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES and Mesirov JP. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 102, 15545-15550 (PMID: 16199517)

Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix) for Wilms Tumor (WT)

*Protocol performed at Ann and Robert H. Lurie Children’s Hospital.

Gene expression analysis was performed with the Affymetrix U133+2 chip (Affymetrix, Santa Clara, CA, USA), according to the manufacturer’s protocol using the Gene-Chip Operating Software, and data were normalized using robust multichip average normalization. Unsupervised analysis was performed using Non-negative Matrix Factorization Consensus Version 5 [1]. GSEA Version 2.0.14 [2] was run using 1,000 permutations and phenotype permutation. Significant enrichment was defined as those lists with >50 genes, an FDR < 10%, and a p-value < 5%.

Specifically:

RNA quality was assessed by a bioanalyzer and RNA samples were required to have a RIN > 7. Total RNA was provided to Lurie Children's Hospital Research Center at a concentration of 150 ng/µL (2 µg total) in sets of 16 samples. One WT sample for which sufficient column-purified RNA was available was selected to serve as a control sample (PAJMLZ). Each set of 16 samples received from the BPC included the WT control sample, which was therefore repeated throughout all steps of this procedure in order to ensure consistency among all steps.

250 ng of total RNA was labeled by using the Affymetrix GeneChip 3' IVT Express Kit at Lurie Children's Hospital Research Center.  All procedures, including 1st strand reverse transcription, 2nd strand synthesis, in vitro transcription of aRNA, aRNA purification, quantitation, and fragmentation were performed according to the manufacturer's protocol.

Nucleic acid hybridization to the array was performed at Lurie Children's Hospital Research Center by using the Affymetrix GeneChip Hybridization, Wash and Stain Kit per the manufacturer's instructions.

The arrays were scanned at Lurie Children's Hospital Research Center by using the Gene-Chip Operating Software (GCOS). Each .dat file was visually inspected for large scratches and/or misalignment of the grid. GCOS was used to generate .chp files (Level 2 data), which represent the consolidation of all individual probes within a probeset, from .cel files (Level 1 data). From .chp files, GCOS was used to generate .rpt files (Level 3 files), which show probe intensity values and QC values. All samples were inspected for the following parameters:

  • Background < 45 (actual range: 28.19–43.18)
  • Noise (Raw Q) < 1.35 (actual range: 0.670–1.30)
  • Scaling Factor < 65% (actual range: 11.487–52.965)
  • % Present call > 35% (actual range: 38.4–57.7)
  • 3'/5' GAPDH < 3.92 (actual range: 0.95–3.48)

Samples with parameters outside of these limits were rerun starting at the step of RNA labeling.

All .cel files (Level 1 data) were imported into the Broad Institute’s GenePattern server and Robust Multichip Average (RMA) normalization was performed using the ExpressionFileCreator module. Data were exported as a single .txt file (Level 2 data) containing probeset information for each individual tumor within a single spreadsheet.

Several analytic quality control steps were performed. Principal component analysis (PCA) was performed to ensure that none of the samples were outliers. Pair-wise correlation coefficient analysis was performed using the data from the WT control sample that was included in each individual batch of samples. The normalized averages of the expression levels from each WT control run showed a correlation coefficient > 98%, indicating a high level of consistency. Six probesets corresponding to five genes were identified that closely correlated with gender (four male genes [RPS4Y1, DDX3Y, SMCY, and EIF1AY] and one female gene [XIST]). All samples were classified as male or female according to the expression patterns of these genes and the results were checked against the known gender of the patient. No discrepancies were detected.
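The sex concordance check can be illustrated with a small sketch. The gene names come from the text above, but the decision rule is an assumption made only for illustration: a sample is called male when the mean expression of the four Y-linked genes exceeds XIST expression, and female otherwise. The actual classification rule is not stated, and the function names and sample identifiers in any usage are hypothetical.

```python
from statistics import mean
from typing import Dict, List

MALE_GENES = ("RPS4Y1", "DDX3Y", "SMCY", "EIF1AY")
FEMALE_GENE = "XIST"


def infer_sex(gene_expr: Dict[str, float]) -> str:
    """Hypothetical rule: compare mean Y-linked expression with XIST."""
    y_linked = mean(gene_expr[gene] for gene in MALE_GENES)
    return "male" if y_linked > gene_expr[FEMALE_GENE] else "female"


def discordant_samples(expr_by_sample: Dict[str, Dict[str, float]],
                       recorded_sex: Dict[str, str]) -> List[str]:
    """Samples whose inferred sex disagrees with the recorded sex."""
    return sorted(
        sample for sample, expr in expr_by_sample.items()
        if infer_sex(expr) != recorded_sex[sample]
    )
```

An empty list from `discordant_samples` corresponds to the "no discrepancies" outcome reported above.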

For analyses, 9/10 replicates for PAJMLZ were removed from the RMA gene expression file. A collapsed data file was created by using the Broad Institute's GenePattern CollapseDataset module with the default parameters and the maximum probe collapse method.

SAM was used to compare gene expression in 51 tumors: favorable histology WT (FHWT) sequenced at CGI with the MLLT1 variant (5) vs the remainder of FHWT sequenced at CGI that do not have the MLLT1 variant (46). Gene expression data are not available for 1 FHWT with the MLLT1 variant. SAM was run using the Level 2 gene expression data. First, probesets that had absent "A" calls for 95% (48) or more samples were filtered out, resulting in the retention of 39,913 probesets for analysis. The data were log transformed prior to running SAM. Two-class unpaired analysis was run using 200 permutations; probesets with q < 0.05 were retained.
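The pre-filter applied before SAM can be sketched as follows. The sketch assumes that the stated count of 48 is 95% of the 51 samples rounded down, that detection calls are available per probeset as "P", "M", or "A", and that the log transformation was base 2 (the base is not stated). The function name and data layout are hypothetical.

```python
from math import log2
from typing import Dict, List


def filter_absent_and_log(signal: Dict[str, List[float]],
                          calls: Dict[str, List[str]],
                          absent_fraction: float = 0.95) -> Dict[str, List[float]]:
    """Drop probesets that are mostly absent, then log-transform the rest."""
    kept: Dict[str, List[float]] = {}
    for probeset, values in signal.items():
        # 95% of 51 samples is 48.45, taken here as a count of 48.
        absent_cutoff = int(absent_fraction * len(values))
        n_absent = sum(1 for call in calls[probeset] if call == "A")
        if n_absent >= absent_cutoff:
            continue  # absent in 95% (48 of 51) or more samples
        kept[probeset] = [log2(v) for v in values]  # base 2 is an assumption
    return kept
```

With 51 samples, a probeset called absent in 48 samples is removed and one called absent in 47 is retained.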

References:

  1. Brunet, J. P., Tamayo, P., Golub, T. R. & Mesirov, J. P. (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA. 101, 4164–4169 (PMID: 15016911)
  2. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES and Mesirov JP. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 102, 15545-15550 (PMID: 16199517)

Gene Chip® Human Genome U133 Plus 2.0 Array (Affymetrix) for Pediatric Preclinical Testing Program (PPTP)

The RNA extraction was performed according to Qiagen manufacturer's protocol (RNeasy kit).

The labeling and array scanning were performed according to the manufacturer's protocol.

SurePrint G3 Human Gene Expression Array (Agilent) for Pediatric Preclinical Testing Program (PPTP)

The RNA extraction was performed according to Qiagen manufacturer's protocol (RNeasy kit).

The nucleic acid labeling was performed according to the manufacturer's protocol for One-Color Microarray-Based Gene Expression Analysis (Agilent Technologies). The Low Input Quick Amp Labeling Kit, One-Color generated fluorescent cRNA with a sample input RNA range between 10 ng and 200 ng of total RNA or a minimum of 5 ng of poly A+ RNA for one-color processing. The method uses T7 RNA Polymerase Blend (red cap), which simultaneously amplifies target material and incorporates Cyanine 3-CTP.

The nucleic acid hybridization to the array was performed according to the manufacturer's protocol for One-Color Microarray-Based Gene Expression Analysis (Agilent Technologies). Briefly, the 10x blocking agent was prepared by adding 500 µL of nuclease-free water to the 10x agent supplied with the kit, mixed on a vortex, and centrifuged for 5-10 seconds. The RNA fragmentation reaction was performed at 60°C for 30 minutes, after which the samples were cooled on ice for one minute and 2x Hi-RPM Hybridization Buffer was added to stop the reaction. These samples were further mixed, spun for 1 minute at room temperature at 13,000 x g, placed on ice, and loaded on the array. The arrays were hybridized at 65°C for 17 hours. This step was followed by washing the microarray slides with Gene Expression Wash Buffers I and II.

The array scanning was performed according to the manufacturer's protocol for One-Color Microarray-Based Gene Expression Analysis (Agilent Technologies). The assembled slide holders were put into the scanner cassette, after which the appropriate scanner protocol was selected and run. To extract information for the probe features from the microarray scan data, the Feature Extraction process was performed using the software provided on the Agilent website.

Last updated: January 26, 2018