Gene Expression: Microarray samples data

| Name (link) | Description | Type | Size |
| --- | --- | --- | --- |
| Microarray normalized profiles | A matrix: rows are genes, columns are GEO samples | RData | ~850 MB |
| Microarray sample labels matrix | A binary matrix: rows are samples, columns are DO terms | RData | 46 KB |
| Microarray sample annotation | A matrix: rows are samples, columns represent sample information (e.g., dataset, platform, DO terms, and more) | xlsx | 2.57 MB |
| Microarray platforms info | An R list with an entry for each platform. The entry maps entrez genes to their probes | RData | 4.3 MB |
| Sample to dataset mapping | A named vector that maps each sample to its dataset id - useful for running cross validation | RData | 36 KB |
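
One plausible use of the sample to dataset mapping in cross validation is to hold out whole datasets, so that samples from the same dataset never appear on both the training and the test side of a fold. The following is a minimal sketch of that idea in Python, not the procedure used in the original analysis. It assumes the named vector has already been exported from the RData file into a plain dictionary; the function name `leave_one_dataset_out` and the sample and dataset ids in `toy_mapping` are hypothetical.

```python
from collections import defaultdict


def leave_one_dataset_out(sample_to_dataset):
    """Yield (held_out_dataset, train_samples, test_samples) triples.

    All samples of one dataset are held out together, so no dataset
    contributes to both the training and the test side of a fold.
    """
    # Group sample ids by the dataset they belong to.
    by_dataset = defaultdict(list)
    for sample, dataset in sample_to_dataset.items():
        by_dataset[dataset].append(sample)

    # One fold per dataset, in sorted order for reproducibility.
    for held_out in sorted(by_dataset):
        test = sorted(by_dataset[held_out])
        train = sorted(
            sample
            for dataset, members in by_dataset.items()
            if dataset != held_out
            for sample in members
        )
        yield held_out, train, test


# Hypothetical toy mapping; the real one comes from the RData file.
toy_mapping = {
    "sample_1": "dataset_A",
    "sample_2": "dataset_A",
    "sample_3": "dataset_B",
    "sample_4": "dataset_C",
    "sample_5": "dataset_C",
}

folds = list(leave_one_dataset_out(toy_mapping))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

With many small datasets, several datasets could be merged into each fold; the grouping step stays the same.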

Gene Expression: RNASeq samples data

| Name (link) | Description | Type | Size |
| --- | --- | --- | --- |
| RNASeq expression profiles | A matrix: rows are samples, columns are genes (>18,000) | RData | 78 MB |
| RNASeq sample labels matrix | A binary matrix: rows are samples, columns are DO terms | RData | 12 KB |
| RNA-seq sample annotation | A matrix: rows are samples, columns are genes | xlsx | 120 KB |

Additional data: analysis and external databases

| Name (link) | Description | Type | Size |
| --- | --- | --- | --- |
| Gene to cancer subtype mapping, COSMIC analysis (1) | Tab delimited: each row represents a gene-subtype pair (0.05 FDR) | txt | 1296 KB |
| Drug to gene ids mapping (2) | An R list: each drug id is mapped to its genes | RData | 23 KB |
| Gene PB-ROC scores | A matrix: rows are genes, columns are diseases | RData | 1274 KB |
| Gene PN-ROC scores | A matrix: rows are genes, columns are diseases | RData | 1120 KB |
| Gene SMQ scores | A matrix: rows are genes, columns are diseases | RData | 1299 KB |
| Gene Entrez to Gene name | A mapping between entrez ids and gene names | txt | 254 KB |
| PPI network from IntAct (3) | These PPIs were used in Figure 5 | txt | 600 KB |
| Final Binary relevance model | An R object that contains the selected multilabel classifier (can be loaded and used in the code examples below) | RData | 49 MB |
| Pathway to genes mapping | An R list that maps each pathway (KEGG, NCI, Reactome, Biocarta) to its genes (entrez ids) | RData | 186 KB |
| Our selected gene sets (Supplementary Table 1) | A table with the selected genes for each disease in our analysis | txt | 340 KB |

(1) Forbes, S. A., Bindal, N., Bamford, S., Cole, C., Kok, C. Y., Beare, D., … Futreal, P. A. (2011). COSMIC: Mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Research, 39. doi:10.1093/nar/gkq929
(2) Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A. C., Liu, Y., … Wishart, D. S. (2014). DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Research, 42. doi:10.1093/nar/gkt1068
(3) Orchard, S., Ammari, M., Aranda, B., Breuza, L., Briganti, L., Broackes-Carter, F., … Hermjakob, H. (2014). The MIntAct project - IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research, 42. doi:10.1093/nar/gkt1115