---
title: "atacInferCnv: CNV inference from scATAC-seq data"
author:
- name: Konstantin Okonechnikov
  affiliation: Hopp Children’s Cancer Center Heidelberg (KiTZ)
  email: k.okonechnikov@kitz-heidelberg.de
package: atacInferCnv
output:
  BiocStyle::html_document
abstract: |
  The package prepares input from scATAC-seq data and adapts it for copy number variation profiling with the InferCNV toolkit. It also has various parameters to control the analysis (e.g. external reference, metacell formation, bin size) and custom plot visualizations.
vignette: |
  %\VignetteIndexEntry{atacInferCnv: CNV inference from scATAC-seq data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Getting started

By default, atacInferCnv requires two inputs:

- the full raw matrix of epigenetic signals;
- an annotation of cells that distinguishes tumor from normal cells.

To demonstrate how to apply atacInferCnv to scATAC-seq data, we prepared an example test set from snMultiomics-seq data of a medulloblastoma Group 3/4 MYCN tumor case.

By default, the package takes the file path to the raw signal matrix in text format and the file path to the cell annotation. The annotation is required to distinguish tumor cells from normal cells, which serve as the reference control. The annotation is typically a tab-delimited table with row names (cell IDs), in which the first line contains only the column names. Other data formats are also supported for the input matrix, as described further below, but here we start with paths to the standard matrix and annotation in text format. The path and name of the result must also be specified.
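
As a rough illustration of the annotation format described above, the following sketch writes and re-reads a toy annotation table with base R. The cell IDs, the `Tumor` label and the `toyAnn*` variable names are hypothetical. The column name `cnvBlock` and the label `Normal` mirror the values used with the example data in this vignette. The real example file may contain other labels and columns.

```r
# Hypothetical toy annotation: cell IDs as row names, one column of group labels.
# Cell IDs and the "Tumor" label are made up for illustration only.
toyAnn <- data.frame(
    cnvBlock = c("Normal", "Normal", "Tumor", "Tumor"),
    row.names = c("cell_1", "cell_2", "cell_3", "cell_4")
)

toyAnnPath <- tempfile(fileext = ".txt")

# With the write.table() defaults the header line holds the column names only,
# i.e. there is no label for the row-name column.
write.table(toyAnn, toyAnnPath, sep = "\t", quote = FALSE)

# read.table() uses the first column as row names because the header
# has one field fewer than the data rows.
toyAnnCheck <- read.table(toyAnnPath, header = TRUE, sep = "\t")
table(toyAnnCheck$cnvBlock)
```

A table written this way has one header field fewer than data fields per row, which is how `read.table()` recognizes the cell IDs as row names.
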
We will use a temporary folder for the results:

```{r quick-start-1, message=FALSE, warning=FALSE}
library(atacInferCnv)

# Example signal matrix and cell annotation shipped with the package
inPath <- system.file("extdata", "MB183_ATAC_subset.tsv.gz",
                      package = "atacInferCnv")
sAnn <- system.file("extdata", "MB183_ATAC_subset.CNV_blocks_ann.txt",
                    package = "atacInferCnv")

# Temporary location for the results
resPath <- tempfile()
```

The atacInferCnv analysis consists of two main steps: 1) process the input data and adjust it for CNV calling, and 2) launch [InferCNV](https://bioconductor.org/packages/release/bioc/html/infercnv.html) on this custom-generated input.

The first step is a function that processes the input data to prepare it for [InferCNV](https://bioconductor.org/packages/release/bioc/html/infercnv.html). It has multiple options, but it requires the name of the annotation column that distinguishes tumor from normal cells, as well as the name of the normal cell type to use as the reference if normal cells are present in the input dataset. This command processes the scATAC-seq data with the [Signac](https://github.com/stuart-lab/signac) R package and prepares the input for InferCNV:

```{r quick-start-3, message=FALSE, warning=FALSE}
prepareAtacInferCnvInput(inPath, sAnn, resPath,
                         targColumn = "cnvBlock", ctrlGrp = "Normal")
```

The results are saved in a specific format. The main output is written to the result folder:

```{r quick-start-31, message=FALSE, warning=FALSE}
list.files(resPath)
```

It includes the per-cell signal matrix (*sample_raw_counts.txt.gz*), the cell annotation (*sample_cnv_ann.txt*), the genomic regions of the peaks (*sample_cnv_ref.txt*) and the configuration for InferCNV (*infercnv_config.yml*). Additional output includes the Signac/Seurat RDS object (*sample_obj.RDS*) and a UMAP visualization of the input data based on the annotation (*sample_UMAP.pdf*) for visual inspection.

The second and final step is a wrapper function that launches InferCNV.
It uses the configuration generated in the result path to customize the input:

```{r quick-start-4, message=FALSE, warning=FALSE}
iObj <- runAtacInferCnv(resPath, returnObj = TRUE)
```

By default the function returns nothing, but setting the *returnObj* argument makes it return the infercnv object, mirroring the command *infercnv::run()*. In either case, all generated output is saved in an infercnv subfolder of the result path:

```{r quick-start-41, message=FALSE, warning=FALSE}
list.files(file.path(resPath, "sample_infercnv"))
```

The function has parameters that need adjusting for ATAC data, for example the number of clusters (*numClusters*). Importantly, it supports all options of the original InferCNV functions. These details, as well as a description of the output, can be found in the InferCNV [documentation](https://github.com/broadinstitute/inferCNV/wiki).

atacInferCnv can also generate pseudo-bulk CNV images, either for the groups assigned in the annotation (if numClusters = 1 is set in *runAtacInferCnv()*) or for the identified subclones (if numClusters \> 1 in the same function). To do so, call the corresponding function after the analysis:

```{r quick-start-5, message=FALSE, warning=FALSE}
plots <- plotCnvBlocks(resPath, iObj)
plots[["C1"]]
plots[["C2"]]
```

These figures show CNV patterns for subclones without (C1) and with (C2) MYCN amplification in chr2.

# Custom settings

The tool also provides various options to control the analysis. For example, it accepts 10X Multiomics or single-cell ATAC data processed by CellRanger as input. If a general scATAC analysis was already done with Signac/Seurat or another tool, the resulting Seurat or SingleCellExperiment object can be reused without repeating the analysis that created it. This is especially useful for merged or large samples, and atacInferCnv has a corresponding argument.
Moreover, several useful features such as an external reference, meta-cells and genome binning are also available. A detailed tutorial and further information are available on the project [wiki page](https://github.com/kokonech/atacInferCNV/wiki).

# Applications

An initial application of the method is described in the following manuscript:

[K. Okonechnikov et al., "Oncogene aberrations drive medulloblastoma progression, not initiation", Nature, 2025](https://www.nature.com/articles/s41586-025-08973-5)

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```