To use CytoMethIC, you need to install the package from Bioconductor. If you don’t have the BiocManager package installed, install it first:
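The installation described above can be sketched as follows. This is the standard BiocManager pattern; it assumes CytoMethIC is available for the Bioconductor release that matches your R version.

```r
# Install BiocManager from CRAN only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Install CytoMethIC (and its dependencies) from Bioconductor
BiocManager::install("CytoMethIC")
```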
CytoMethIC is a package that provides model data and functions for applying machine learning models to DNA methylome data in order to classify the cancer type and phenotype of a sample. Its primary motivation is to abstract away the granular, accessibility-limiting code otherwise required to use machine learning models in R. The package provides this abstraction for RandomForest, e1071 Support Vector, Extreme Gradient Boosting, and Tensorflow models. It is paired with an ExperimentHub component containing our lab’s models for epigenetic cancer classification and phenotyping, including CNS tumor classification, pan-cancer classification, race prediction, cell of origin classification, and subtype classification models.
For these examples, we’ll be using models from ExperimentHub and a sample from sesameData.
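A minimal setup sketch for the examples, assuming CytoMethIC, ExperimentHub, sesame, and sesameData are already installed. The variable name `sdfs` is illustrative only and does not come from the package.

```r
# Attach the packages used throughout the examples
library(CytoMethIC)
library(ExperimentHub)
library(sesame)
library(sesameData)

# Download and cache the sesameData datasets (needed once per machine)
sesameDataCache()

# Retrieve the EPICv2 example data; the examples use its first element
sdfs <- sesameDataGet("EPICv2.8.SigDF")
length(sdfs)
```

The first element, `sdfs[[1]]`, is the sample that is passed through `openSesame` to obtain beta values for classification.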
Model ID | Prediction Label Description |
---|---|
m_cancertype_CNS66 | CNS Tumor Class (N=66) |
m_cancertype_TCGA33 | TCGA cancer types (N=33) |
rfc_cancertype_CNS66 | CNS Tumor Class (N=66) |
svm_cancertype_CNS66 | CNS Tumor Class (N=66) |
mlp_cancertype_CNS66 | CNS Tumor Class (N=66) |
xgb_cancertype_CNS66 | CNS Tumor Class (N=66) |
rfc_cancertype_TCGA33 | TCGA cancer types (N=33) |
svm_cancertype_TCGA33 | TCGA cancer types (N=33) |
mlp_cancertype_TCGA33 | TCGA cancer types (N=33) |
xgb_cancertype_TCGA33 | TCGA cancer types (N=33) |
The snippet below demonstrates the model abstraction on random forest and support vector models from the CytoMethIC collection on ExperimentHub.
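One way this could look is sketched below. It is not the original chunk: the helper `load_cmi_model` is hypothetical, and the sketch assumes that the ExperimentHub record titles match the model IDs in the table above, which you should confirm by inspecting `hits$title`. Only `cmi_classify(betas, model)`, `openSesame`, and `sesameDataGet` are taken from this document.

```r
library(CytoMethIC)
library(ExperimentHub)
library(sesame)
library(sesameData)

# Beta values for one EPICv2 example sample
betas <- openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]])

# Find the CytoMethIC records on ExperimentHub and list their titles
eh <- ExperimentHub()
hits <- AnnotationHub::query(eh, "CytoMethIC")
print(hits$title)

# Hypothetical helper: fetch a model by record title.
# ASSUMPTION: record titles equal the model IDs listed in the table.
load_cmi_model <- function(hits, model_id) {
    idx <- which(hits$title == model_id)
    if (length(idx) == 0) {
        message("No record titled '", model_id, "'; inspect hits$title")
        return(NULL)
    }
    hits[[names(hits)[idx[1]]]]
}

# The same cmi_classify call is used for both model types
for (model_id in c("rfc_cancertype_TCGA33", "svm_cancertype_TCGA33")) {
    model <- load_cmi_model(hits, model_id)
    if (!is.null(model)) {
        print(cmi_classify(betas, model))
    }
}
```

The point of the abstraction is visible in the loop: the call does not change between the random forest and the support vector model.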
The snippet below demonstrates the cmi_classify function predicting the subtype of the cancer.
The snippet below demonstrates the cmi_classify function predicting the cell of origin of the cancer.
The snippet below demonstrates the cmi_classify function predicting the race of the patient.
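```r
# Download a pre-trained model (a race model, judging by its file name) from the CytoMethIC_models repository
cmi_model = readRDS(url("https://github.com/zhou-lab/CytoMethIC_models/raw/main/models/Race3_rfcTCGA_InfHum3.rds"))
# Preprocess one EPICv2 example sample from sesameData into beta values
betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]])
# Classify the sample with the loaded model
cmi_classify(betas, cmi_model)
```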
## Platform set to: EPICv2
## R Under development (unstable) (2024-01-16 r85808)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.45 sesame_1.21.7 sesameData_1.21.9
## [4] CytoMethIC_0.99.13 ExperimentHub_2.11.1 AnnotationHub_3.11.1
## [7] BiocFileCache_2.11.1 dbplyr_2.4.0 BiocGenerics_0.49.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.71.2 bitops_1.0-7
## [7] fastmap_1.1.1 RCurl_1.98-1.14
## [9] digest_0.6.34 mime_0.12
## [11] lifecycle_1.0.4 KEGGREST_1.43.0
## [13] RSQLite_2.3.5 magrittr_2.0.3
## [15] compiler_4.4.0 rlang_1.1.3
## [17] sass_0.4.8 tools_4.4.0
## [19] utf8_1.2.4 yaml_2.3.8
## [21] S4Arrays_1.3.3 bit_4.0.5
## [23] curl_5.2.0 DelayedArray_0.29.4
## [25] plyr_1.8.9 RColorBrewer_1.1-3
## [27] abind_1.4-5 BiocParallel_1.37.0
## [29] withr_3.0.0 purrr_1.0.2
## [31] grid_4.4.0 stats4_4.4.0
## [33] preprocessCore_1.65.0 fansi_1.0.6
## [35] wheatmap_0.2.0 e1071_1.7-14
## [37] colorspace_2.1-0 ggplot2_3.4.4
## [39] MASS_7.3-60.2 scales_1.3.0
## [41] SummarizedExperiment_1.33.3 cli_3.6.2
## [43] rmarkdown_2.25 crayon_1.5.2
## [45] generics_0.1.3 reshape2_1.4.4
## [47] httr_1.4.7 tzdb_0.4.0
## [49] proxy_0.4-27 DBI_1.2.2
## [51] cachem_1.0.8 stringr_1.5.1
## [53] zlibbioc_1.49.0 parallel_4.4.0
## [55] AnnotationDbi_1.65.2 BiocManager_1.30.22
## [57] XVector_0.43.1 matrixStats_1.2.0
## [59] vctrs_0.6.5 Matrix_1.6-5
## [61] jsonlite_1.8.8 IRanges_2.37.1
## [63] hms_1.1.3 S4Vectors_0.41.3
## [65] bit64_4.0.5 jquerylib_0.1.4
## [67] glue_1.7.0 codetools_0.2-19
## [69] stringi_1.8.3 gtable_0.3.4
## [71] BiocVersion_3.19.1 GenomeInfoDb_1.39.6
## [73] GenomicRanges_1.55.3 munsell_0.5.0
## [75] tibble_3.2.1 pillar_1.9.0
## [77] rappdirs_0.3.3 htmltools_0.5.7
## [79] randomForest_4.7-1.1 GenomeInfoDbData_1.2.11
## [81] R6_2.5.1 evaluate_0.23
## [83] Biobase_2.63.0 lattice_0.22-5
## [85] readr_2.1.5 png_0.1-8
## [87] memoise_2.0.1 bslib_0.6.1
## [89] class_7.3-22 Rcpp_1.0.12
## [91] SparseArray_1.3.4 xfun_0.42
## [93] MatrixGenerics_1.15.0 pkgconfig_2.0.3