--- title: "CytoMethIC Quick Start" shorttitle: "CytoMethIC Quick Start" date: "`r BiocStyle::doc_date()`" package: CytoMethIC output: BiocStyle::html_document fig_width: 6 fig_height: 5 vignette: > %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{CytoMethIC User Guide} %\VignetteEncoding{UTF-8} --- # Installation To use CytoMethIC, you need to install the package from Bioconductor. If you don't have the BiocManager package installed, install it first: ```{r, message=FALSE, warning=FALSE, cache=TRUE, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } if (!requireNamespace("CytoMethIC", quietly = TRUE)) { BiocManager::install("CytoMethIC") } ``` # Introduction `CytoMethIC` is a comprehensive package that provides model data and functions for easily using machine learning models that use data from the DNA methylome to classify cancer type and phenotype from a sample. The primary motivation for the development of this package is to abstract away the granular and accessibility-limiting code required to utilize machine learning models in R. Our package provides this abstraction for RandomForest, e1071 Support Vector, Extreme Gradient Boosting, and Tensorflow models. This is paired with an ExperimentHub component, which contains our lab's models developed for epigenetic cancer classification and predicting phenotypes. This includes CNS tumor classification, Pan-cancer classification, race prediction, cell of origin classification, and subtype classification models. ```{r, cyto2, message=FALSE, warning=FALSE} library(CytoMethIC) library(ExperimentHub) library(sesame) sesameDataCache() ``` # Data from ExperimentHub For these examples, we'll be using models from ExperimentHub and a sample from sesameData. ```{r cyto3, result="asis", echo=FALSE} library(knitr) kable(cmi_models[, c("ModelID", "PredictionLabelDescription")], caption = "CytoMethIC supported models" ) ``` # Pan-Cancer type classification The below snippet shows a demonstration of the model abstraction working on random forest and support vector models from CytoMethIC models on ExperimentHub. ```{r cyto4, message=FALSE} cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8395"]], lift_over=TRUE) cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8396"]], lift_over=TRUE) ``` # Pan-Cancer subtype classification The below snippet shows a demonstration of the cmi_predict function working to predict the subtype of the cancer. ```{r cyto5, message=FALSE} cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8422"]]) ``` # Ethnicity classification The below snippet shows a demonstration of the cmi_predict function working to predict the ethnicity of the patient. ```{r cyto6, message=FALSE} cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8421"]]) ``` # Pan-Cancer COO classification The below snippet shows a demonstration of the cmi_predict function working to predict the cell of origin of the cancer. ```{r cyto7, message=FALSE} cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8423"]]) ``` # Ethnicity From GitHub Link In addition to ExperimentHub Models, this package also supports using models from GitHub URLs. Note that https://github.com/zhou-lab/CytoMethIC_models will be the most frequently updated public repository of our lab's classifiers. ```{r cyto8} base_url <- "https://github.com/zhou-lab/CytoMethIC_models/raw/main/models" cmi_model <- readRDS(url(sprintf("%s/Race3_rfcTCGA_InfHum3.rds", base_url))) betas <- openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) cmi_predict(betas, cmi_model, lift_over=TRUE) ``` ```{r} sessionInfo() ```