[
  {
    "objectID": "index.html",
    "href": "index.html",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "",
    "text": "Welcome\nPackage: OHCAAuthors: Jacques Serizay [aut, cre]Compiled: 2026-03-16Package version: 1.7.0R version: R Under development (unstable) (2026-03-05 r89546)BioC version: 3.23License: MIT + file LICENSE\nThis is the landing page of the “Orchestrating Hi-C analysis with Bioconductor” book. The primary aim of this book is to introduce the R user to Hi-C analysis. This book starts with key concepts important for the analysis of chromatin conformation capture and then presents Bioconductor tools that can be leveraged to process, analyze, explore and visualize Hi-C data.",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "index.html#table-of-contents",
    "href": "index.html#table-of-contents",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "Table of contents",
    "text": "Table of contents\nThis book is divided in three parts:\nPart I: Introduction to Hi-C analysis\n\nChapter 1: General principles and Hi-C data pre-processing\nChapter 2: The different R classes implemented to analyze Hi-C\nChapter 3: Manipulating Hi-C data in R\nChapter 4: Hi-C data visualization\n\nPart II: In-depth Hi-C analysis\n\nChapter 5: Matrix-centric analysis\nChapter 6: Interactions-centric analysis\nChapter 7: Finding topological features from a Hi-C contact matrix\n\nPart III: Hi-C analysis workflows\n\nData gateways: accessing public Hi-C data portals\nInteroperability: Using HiCExperiment with other R packages\nWorkflow 1: Distance-dependent interactions across yeast mutants\nWorkflow 2: Chromosome compartment cohesion upon mitosis entry\nWorkflow 3: Inter-centromere interactions in yeast",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "index.html#general-audience",
    "href": "index.html#general-audience",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "General audience",
    "text": "General audience\nThis books aims to demonstrate how to pre-process, parse and investigate Hi-C data in R. For this reason, a significant portion of this book consists of executable R code chunks. To be able to reproduce the examples demonstrated in this book and go further in the analysis of your real datasets, you will need to rely on several dependencies.\n\n\nR &gt;= 4.3 is required. You can check R version by typing version in an R console or in RStudio. If you do not have R &gt;= 4.3 installed, you will need to update your R version, as most extra dependencies will require R &gt;= 4.3.\n\n\n\n\n\n\n\nNoteInstalling R 4.3 👇\n\n\n\n\n\nDetailed instructions are available here to install R 4.3 on a Linux machine (Ubuntu 22.04).\nBriefly, to install pre-compiled version of R 4.3.0:\n\n# This is adapted from Posit (https://docs.posit.co/resources/install-r/)\nexport R_VERSION=4.3.0\n\n# Install curl and gdebi-core\nsudo apt update -qq\nsudo apt install curl gdebi-core -y\n\n# Fetching the `.deb` install file from Posit repository\ncurl -O https://cdn.rstudio.com/r/ubuntu-2204/pkgs/r-${R_VERSION}_1_amd64.deb\n\n# Install R\nsudo gdebi r-${R_VERSION}_1_amd64.deb --non-interactive -q\n\n# Optional: create a symlink to add R to your PATH\nsudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R\n\nIf you have some issues when installing the Hi-C packages listed below, you may need to install the following system libraries:\n\nsudo apt update -qq\nsudo apt install -y \\\n    automake make cmake fort77 gfortran \\\n    bzip2 unzip ftp build-essential \\\n    libc6 libreadline-dev \\\n    libpng-dev libjpeg-dev libtiff-dev \\\n    libx11-dev libxt-dev x11-common \\\n    libharfbuzz-dev libfribidi-dev \\\n    libfreetype6-dev libfontconfig1-dev \\\n    libbz2-dev liblzma-dev libtool \\\n    libxml2 libxml2-dev \\\n    libzstd-dev zlib1g-dev \\\n    libdb-dev libglu1-mesa-dev \\\n    libncurses5-dev libghc-zlib-dev libncurses-dev \\\n    libpcre3-dev libxml2-dev 
libblas-dev libzmq3-dev \\\n    libssl-dev libcurl4-openssl-dev \\\n    libgsl-dev libeigen3-dev libboost-all-dev \\\n    libgtk2.0-dev xvfb xauth xfonts-base apt-transport-https \\\n    libhdf5-dev libudunits2-dev libgdal-dev libgeos-dev \\\n    libproj-dev libnode-dev libmagick++-dev\n\n\n\n\n\n\nBioconductor &gt;= 3.18 is also required. You can check whether Bioconductor is available and its version in R by typing BiocManager::version(). If you do not have Bioconductor &gt;= 3.18 installed, you will need to update it as follows:\n\n\nif (!require(\"BiocManager\", quietly = TRUE))\n    install.packages(\"BiocManager\")\nBiocManager::install(version = \"3.18\")\n\n\nYou will also need important packages, which will be described at length in this book. The following R code will set up most of the extra dependencies:\n\n\nBiocManager::install(\"HiCExperiment\", ask = FALSE)\nBiocManager::install(\"HiCool\", ask = FALSE)\nBiocManager::install(\"HiContacts\", ask = FALSE)\nBiocManager::install(\"HiContactsData\", ask = FALSE)\nBiocManager::install(\"fourDNData\", ask = FALSE)\nBiocManager::install(\"DNAZooData\", ask = FALSE)",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "index.html#developers",
    "href": "index.html#developers",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "Developers",
    "text": "Developers\nFor developers or advanced R users, the devel versions of these packages can be installed by installing Bioc devel version prior to package installation:\n\nBiocManager::install(version = \"devel\")\nBiocManager::install(\"HiCExperiment\", ask = FALSE)\nBiocManager::install(\"HiCool\", ask = FALSE)\nBiocManager::install(\"HiContacts\", ask = FALSE)\nBiocManager::install(\"HiContactsData\", ask = FALSE)\nBiocManager::install(\"fourDNData\", ask = FALSE)\nBiocManager::install(\"DNAZooData\", ask = FALSE)",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "index.html#docker-image",
    "href": "index.html#docker-image",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "Docker image",
    "text": "Docker image\nIf you have docker installed, the easiest approach would be to run the following command in a shell terminal:\n\ndocker run -it ghcr.io/js2264/ohca:latest R\n\nThis will fetch a docker image with the latest development versions of the aforementioned packages pre-installed, and initiate an interactive R session.",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "index.html#building-book",
    "href": "index.html#building-book",
    "title": "Orchestrating Hi-C analysis with Bioconductor",
    "section": "Building book",
    "text": "Building book\nThe OHCA book has been rendered in R thanks to a number of packages, including but not only:\n\nBiocBook\ndevtools\nquarto\nrebook\n\nTo build this book locally, you can run:\n\ngit clone git@github.com:js2264/OHCA.git && cd OHCA\nquarto render\n\n\n\n\n\n\n\nWarning\n\n\n\nAll dependencies listed above will be required!\n\n\nThe actual rendering of this book is done by GitHub Actions, and the rendered static website is hosted by GitHub Pages.",
    "crumbs": [
      "Welcome"
    ]
  },
  {
    "objectID": "pages/preamble.html",
    "href": "pages/preamble.html",
    "title": "Preamble",
    "section": "",
    "text": "Hi-C is an experimental method to quantify spatial interactions between any pair of genomic loci. While a number of command-line interfaces (CLI) exist to process and manipulate Hi-C data (e.g. cooler (Abdennur & Mirny (2019)), juicer (Durand et al. (2016)) and HiC-Pro (Servant et al. (2015))), they generally suffer from several limitations often found in emerging genomics techniques:\n\nNo genomic representation of Hi-C processed data: the existing CLIs can efficiently parse Hi-C data as a numerical matrix and perform a few standard quantitative operations (e.g. contact matrix binning and normalization, dimensionality reduction, etc). However, they systematically fail to represent a Hi-C contact matrix as a genomic object. Qualitative analyses (e.g. intersecting chromatin loops with genomic features, finding genes overlapping with domains, etc) therefore remain extremely tedious.\nNo format-agnostic analysis libraries. Three competing file format standards (.(m)cool, .hic and HiC-Pro files) currently exist to store Hi-C processed data and dedicated CLIs propose sets of tools specifically working with their corresponding Hi-C processed data file format. This has curbed the development of generic Hi-C data analysis libraries by favoring the emergence of several redundant tools.\nLack of integration within a biology-oriented community. While rapid development of Hi-C analysis methodology is ongoing, it is primarily driven by small-scale teams rather than by a community as a whole. This oriented development is less likely to fulfill the needs met by other investigators.\n\nIn this book, we provide an overview of a set of tools that enable processing, visualization and in-depth investigation of Hi-C data in R, ensuring intuitive integration of Hi-C data in the existing Bioconductor ecosystem. We introduce a high-level HiCExperiment data structure to represent Hi-C data, directly extending robust, pre-existing core genomic classes offered by Bioconductor. 
This guarantees a stable and intuitive Hi-C data representation in R as a genomic entity, which is highly interoperable and can be used by all existing analysis packages in R.\n\nOn top of the HiCExperiment data structure, the HiContacts package offers extended functionalities to perform matrix-centric and interaction-centric analysis directly on HiCExperiment objects and provides powerful visualization tools specifically designed for Hi-C data to facilitate exploratory data analysis. In addition, the HiCool package implements a processing workflow based on a lightweight library to process raw Hi-C data into binned Hi-C contact matrices ready to be imported as HiCExperiment objects. Finally, the fourDNData and DNAZooData packages offer a gateway to major public data repositories directly in R.\n\n\nPackage status\n\nColumns: Github repo, Doc, Github checks, Bioc builds (release and devel), Lifecycle.\nPackages: HiCExperiment, HiContacts, HiCool, HiContactsData, DNAZooData, fourDNData.\n\n\nReferences\n\nAbdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540\n\nDurand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002\n\nServant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x",
    "crumbs": [
      "Preamble"
    ]
  },
  {
    "objectID": "pages/principles.html",
    "href": "pages/principles.html",
    "title": "\n1  Hi-C pre-processing steps\n",
    "section": "",
    "text": "1.1 Experimental considerations",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Hi-C pre-processing steps</span>"
    ]
  },
  {
    "objectID": "pages/principles.html#experimental-considerations",
    "href": "pages/principles.html#experimental-considerations",
    "title": "\n1  Hi-C pre-processing steps\n",
    "section": "",
    "text": "1.1.1 Experimental approach\nThe Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)).\nIn Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library.\n\n\n1.1.2 C variants\nA number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below).\n\nCapture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts.\n\n1.1.3 Sequencing\nHi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.\nFastq files are plain text files (usually compressed, with the .gz extension). 
They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz.\nHere is the first read listed in the sample_R1.fq.gz file:\n\n\nsample_R1.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nCAACTTCAATACCAGCAGCAGCAA\n+\nCCCFFFFFHHHHHJJJJJIJJJJJ\n\nAnd here is the first read listed in the sample_R2.fq.gz file:\n\n\nsample_R2.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nGCTGTTGTTGTTGTTGTATTTGCA\n+\n@@@FFFFFFHHHHIJJIJJHIIEH\n\nThese two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line gives the per-base sequencing quality, encoded as one ASCII character per base (Phred quality scores).",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Hi-C pre-processing steps</span>"
    ]
  },
  {
    "objectID": "pages/principles.html#hi-c-file-formats",
    "href": "pages/principles.html#hi-c-file-formats",
    "title": "\n1  Hi-C pre-processing steps\n",
    "section": "\n1.2 Hi-C file formats",
    "text": "1.2 Hi-C file formats\nTwo important output files are typically generated during Hi-C data pre-processing:\n\nA “pairs” file;\nA binned “contact matrix” file\n\nWe will now describe the structure of these different types of files. Directly jump to the next chapter if you want to know more about importing data from a contact matrix or a pairs file in R.\n\n1.2.1 Pairs files\nA “pairs” file (optionally, but generally filtered and sorted) is the direct output of processing Hi-C fastq files. It stores information about putative proximity contacts identified by digestion/religation, in the lossless, human-readable, indexable format: the .pairs format.\nA .pairs file is organized in a header followed by a body:\n\n\nheader: starts with #\n\nRequired entries\n\nFirst line: ## pairs format v1.0\n\n\n#columns: column contents and ordering (e.g. #columns: readID chr1 pos1 chr2 pos2 strand1 strand2 &lt;column_name&gt; &lt;column_name&gt; ...)\n\n#chromsize: chromosome names and their size in bp, one chromosome per line, in the same order that defines ordering between mates (e.g. #chromsize: chr1 230218). Chromosome order is actually defined by this header, not by the order of pairs listed in the body!\n\n\nOptional entries with reserved header keys (sorted, shape, command, genome_assembly)\n\n\n#sorted: to indicate the sorting mechanism (e.g. #sorted: chr1-chr2-pos1-pos2, #sorted: chr1-pos1, #sorted: none)\n\n#shape: to specify whether the matrix is stored as upper triangle or lower triangle (#shape: upper triangle, #shape: lower triangle)\n\n#command: to specify any command, e.g. the command used to generate the pairs file (#command: bam2pairs mysample.bam mysample)\n\n#genome_assembly: to specify the genome assembly (e.g. 
#genome_assembly: hg38)\n\n\n\n\n\nbody: tab-separated columns\n\n7 reserved (4 of them required) columns:\n\nreadID, chr1, pos1, chr2, pos2, strand1, strand2\nColumns 2-5 (chr1, pos1, chr2, pos2) are required and cannot have missing values\nFor columns 1, 6 and 7: missing values are annotated with a single-character dummy (.)\n\n\n2 extra reserved, optional column names:\n\n\nfrag1, frag2: restriction enzyme fragment index used by Juicer\n\n\n\nAny number of optional columns can be added\n\n\n\n\n\nsample.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#shape: upper triangle\n#genome_assembly: hg38\n#chromsize: chr1 249250621\n#chromsize: chr2 243199373\n#chromsize: chr3 198022430\n...\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\nEAS139:136:FC706VJ:2:2104:23462:197393 chr1 10000 chr1 20000 + +\nEAS139:136:FC706VJ:2:8762:23765:128766 chr1 50000 chr1 70000 + +\nEAS139:136:FC706VJ:2:2342:15343:9863 chr1 60000 chr2 10000 + +\nEAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + -\n\nMore information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format.\n\n1.2.2 Binned contact matrix files\n\n1.2.2.1 Binning pairs into a matrix\nThe action of “binning” a .pairs file into a contact matrix consists of (1) discretizing a genome reference into genomic bins, (2) assigning a bin to each pair’s extremity and (3) computing the interaction frequency between any pair of genomic bins, i.e. the “contact matrix”.\nFor instance, here is a dummy .pairs file with a total of 5 pairs:\n\n\ndummy.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n#chromsize: chr1 389\n. chr1 162 chr1 172 . . \n. chr1 180 chr1 192 . . \n. chr1 183 chr1 254 . .\n. chr1 221 chr1 273 . . \n. chr1 254 chr1 298 . . \n\nNote that this genome reference is made of a single chromosome (chr1), very short (length of 389). 
By binning this chromosome in 100bp-wide bins (100 bp is the resolution), one would obtain the following four bins:\n\n\nbins.bed\n\n&lt;chr&gt;  &lt;start&gt;  &lt;end&gt;\nchr1   1     100\nchr1   101   200\nchr1   201   300\nchr1   301   389\n\nEach pair extremity can be changed to an integer indicating the position of the bin it falls in, e.g. for the left-hand extremity of the pairs file shown above (bin1):\n&lt;chr1&gt;  &lt;pos1&gt;  -&gt;  &lt;bin1&gt;\nchr1    162     -&gt;  2\nchr1    180     -&gt;  2\nchr1    183     -&gt;  2\nchr1    221     -&gt;  3\nchr1    254     -&gt;  3\nSimilarly for the right-hand extremity of the pairs file (bin2):\n&lt;chr2&gt;  &lt;pos2&gt;  -&gt;  &lt;bin2&gt;\nchr1    172     -&gt;  2\nchr1    192     -&gt;  2\nchr1    254     -&gt;  3\nchr1    273     -&gt;  3\nchr1    298     -&gt;  3\nBy pasting side-to-side the left-hand and right-hand extremities of each pair, the .pairs file can be turned into something like:\n&lt;bin1&gt; &lt;bin2&gt;\n2      2\n2      2\n2      3\n3      3\n3      3\nAnd if we now count the number of each &lt;bin1&gt; &lt;bin2&gt; combination, adding a third &lt;count&gt; column, we end up with a count.matrix text file:\n\n\ncount.matrix\n\n&lt;bin1&gt; &lt;bin2&gt;  &lt;count&gt;\n2      2       2\n2      3       1\n3      3       2\n\nThis count.matrix file accounts for all 5 pairs and records which bin contains each extremity of each pair. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it.\nThis “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”.\nIn this context, the bins.bed file acts as a secondary “dictionary” describing the nature of i and j indices, i.e. 
the location of genomic bins.\n\n1.2.2.2 Plain-text matrices: HiC-Pro style\nThe HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above.\nTogether, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often end up using a large amount of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files.\n.(m)cool and .hic file formats are two standards addressing these limitations.\n\n1.2.2.3 .(m)cool matrices\nThe .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called:\n\n\nbins: containing the same information as the regions.bed file;\n\npixels: containing the same information as the count.matrix (each “pixel” is a pair of 2 bins and has one or several associated scores);\n\nchroms: summarizing the order and length of the chromosomes present in a Hi-C contact matrix;\n\nindexes: allowing random access, i.e. parsing of only a subset of the data without having to read through the entire set of data.\n\n\nA single .pairs file binned at different resolutions can also be saved into a single, multi-resolution .mcool file. .mcool essentially consists of nested .cool files.\nImportantly, as an HDF5-based format, .cool files are binarized, indexed and highly compressed. 
This has two major benefits:\n\nSmaller disk storage footprint\n\nRapid subsetting of the data through random access\n\n\nMoreover, parsing .cool files is possible using standard HDF5 APIs.\n\n1.2.2.4 .hic matrices\nThe .hic format is another type of binarized, indexed and highly compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)).",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Hi-C pre-processing steps</span>"
    ]
  },
  {
    "objectID": "pages/principles.html#pre-processing-hi-c-data",
    "href": "pages/principles.html#pre-processing-hi-c-data",
    "title": "\n1  Hi-C pre-processing steps\n",
    "section": "\n1.3 Pre-processing Hi-C data",
    "text": "1.3 Pre-processing Hi-C data\n\n1.3.1 Processing workflow\nFundamentally, the main steps performed to pre-process Hi-C are:\n\nSeparate read mapping\nPairs parsing\nPairs sorting\nPairs filtering\nPairs binning into a contact matrix\nNormalization of contact matrix and multi-resolution matrix generation\n\n\nIn practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)):\n\n## This chunk of code is not executed when rendering this book. \n## Note these fields have to be replaced by appropriate variables: \n##    &lt;index&gt;\n##    &lt;input.R1.fq.gz&gt;\n##    &lt;input.R2.fq.gz&gt;\n##    &lt;chromsizes.txt&gt;\n##    &lt;prefix&gt;\nbwa mem2 -SP5M &lt;index&gt; &lt;input.R1.fq.gz&gt; &lt;input.R2.fq.gz&gt; \\\n    | pairtools parse -c &lt;chromsizes.txt&gt; \\\n    | pairtools sort \\\n    | pairtools dedup \\\n    | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 &lt;chromsizes.txt&gt;:10000 - &lt;prefix&gt;.cool\ncooler zoomify --balance --nproc 32 --resolutions 5000N --out &lt;prefix&gt;.mcool &lt;prefix&gt;.cool\n\nSeveral pipelines have been developed to facilitate Hi-C data pre-processing. A few of them stand out from the crowd:\n\n\nnf-distiller: a combination of an aligner + pairtools + cooler\n\n\nHiC-pro (Servant et al. (2015))\n\nJuicer (Durand et al. (2016))\n\n\n\n\n\n\n\nNote\n\n\n\nFor larger genomes (&gt; 1Gb) with more than few tens of M of reads per fastq (e.g. &gt; 100M), we recommend pre-processing data on an HPC cluster. Aligners, pairs processing and matrix binning can greatly benefit from parallelization over multiple CPUs (Open2C et al. (2023))).\nTo scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler.\n\n\n\n1.3.2 hicstuff: lightweight Hi-C pipeline\nhicstuff is an integrated workflow to process Hi-C data. 
Some advantages compared to the solutions mentioned above are its simplicity, flexibility and light weight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command.\nhicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard pipeline command as follows:\n\n## This chunk of code is not executed when rendering this book. \n## Note these fields have to be replaced by appropriate variables: \n##    &lt;hicstuff-options&gt;\n##    &lt;genome.fa&gt;\n##    &lt;input.R1.fq.gz&gt;\n##    &lt;input.R2.fq.gz&gt;\nhicstuff pipeline \\\n   &lt;hicstuff-options&gt; \\\n   --genome &lt;genome.fa&gt; \\\n   &lt;input.R1.fq.gz&gt; \\\n   &lt;input.R2.fq.gz&gt;  \n\nThe hicstuff documentation website (https://hicstuff.readthedocs.io/) describes the available options and internal processing steps.\n\n1.3.3 HiCool: hicstuff within R\nhicstuff is available as a standalone tool (install it with conda install -c bioconda hicstuff). It is also shipped in an R package: HiCool. 
Thus, HiCool can process fastq files directly within an R console.\n\n1.3.3.1 Executing HiCool\nTo demonstrate this, we first fetch example .fastq files:\n\nlibrary(HiContactsData)\nr1 &lt;- HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nr2 &lt;- HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\nr1\n##                                                        EH7783 \n##  \"/home/biocbuild/.cache/R/ExperimentHub/38d33f440fc43b_7833\"\n\nr2\n##                                                        EH7784 \n##  \"/home/biocbuild/.cache/R/ExperimentHub/38d33f4ca9e912_7834\"\n\nWe then load the HiCool library and execute the main HiCool function.\n\n## Note that HiCool processing is not actually performed when rendering this book \nlibrary(HiCool)\nHiCool(\n    r1, \n    r2, \n    restriction = 'DpnII,HinfI', \n    resolutions = c(4000, 8000, 16000), \n    genome = 'R64-1-1', \n    output = './HiCool/'\n)\n\n\nImportant note:\n\n\nHiCool relies on the basilisk R package to set up an underlying, self-managed python environment. Some packages from this environment are not yet available for ARM chips (e.g. M1/M2/M3 in newer MacBooks) or Windows. 
For this reason, HiCool-supported features are not available on these machines.\n\n\n\n1.3.3.2 HiCool arguments\nSeveral arguments can be passed to HiCool and some are worth mentioning:\n\n\nrestriction: (default: \"DpnII,HinfI\")\n\n\nresolutions: (default: NULL, automatically inferring resolutions based on genome size)\n\n\niterative: (default: TRUE)\n\n\nfilter: (default: TRUE)\n\n\nbalancing_args: (default: \" --cis-only --min-nnz 3 --mad-max 7 \")\n\n\nthreads: (default: 1L)\n\nOther HiCool arguments can be listed by checking the HiCool documentation in R: ?HiCool::HiCool.\n\n1.3.3.3 HiCool outputs\nWe can check the generated output files placed in the HiCool/ directory.\n\n## This chunk of code is not executed when rendering this book. \nfs::dir_tree('HiCool/')\n\n\nThe *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for.\n\nThe *.html file is a report summarizing pairs numbers, filtering, etc…\nThe *.log file contains all output and error messages, as well as the full list of commands that have been executed to pre-process the input dataset.\nThe *.pdf graphic files provide a visual representation of the distribution of informative/non-informative pairs.\n\n\nTip\n\n\nAll the files generated by a single HiCool pipeline execution contain the same 6-letter unique hash to make sure they are not overwritten if re-executing the same command.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Hi-C pre-processing steps</span>"
    ]
  },
  {
    "objectID": "pages/principles.html#exploratory-data-analysis-of-processed-hi-c-files",
    "href": "pages/principles.html#exploratory-data-analysis-of-processed-hi-c-files",
    "title": "\n1  Hi-C pre-processing steps\n",
    "section": "\n1.4 Exploratory data analysis of processed Hi-C files",
    "text": "1.4 Exploratory data analysis of processed Hi-C files\nOnce Hi-C raw data has been transformed into a set of processed files, exploratory data analysis is typically conducted following two main routes:\n\nData visualization;\nData investigation.\n\nDuring the last decade, a number of softwares have been developed to unlock Hi-C data visualization and investigation. Here we provide a non-exhaustive list of notable tools developed throughout the recent years for downstream Hi-C analysis, selected from this longer list.\n\n\n2012-2015:\n\nHiTC (2012)\nHiCCUPS (2014)\nHiCseg (2014)\nFit-Hi-C (2014)\nHiC-Pro (2015)\ndiffHic (2015)\ncooltools (2015)\nHiCUP (2015)\nHiCPlotter (2015)\nHiFive (2015)\n\n\n\n2016-2019:\n\nCHiCAGO (2016)\nTADbit (2017)\nHiCRep (2017)\nHiC-DC (2017)\nGoTHIC (2017)\nHiCExplorer (2018)\nBoost-HiC (2018)\nHiCcompare (2018)\nHiPiler (2018)\ncoolpuppy (2019)\n\n\n\n2020-present:\n\nSerpentine (2020)\nCHESS (2020)\nDeepHiC (2020)\nChromosight (2020)\nMustache (2020)\nTADcompare (2020)\nPOSSUM (2021)\nCalder (2021)\nHICDCPlus (2021)\nplotgardener (2021)\nGENOVA (2021)\n\n\n\nAll references as well as many other softwares and references are available here.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>1</span>  <span class='chapter-title'>Hi-C pre-processing steps</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html",
    "href": "pages/data-representation.html",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "",
    "text": "2.1 GRanges class\nGRanges is a shorthand for GenomicRanges, a core class in Bioconductor. This class is primarily used to describe genomic ranges of any nature, e.g.  sets of promoters, SNPs, chromatin loop anchors, etc.\nThe data structure has been published in the seminal 2015 publication by the Bioconductor team (Huber et al. (2015)).",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html#granges-class",
    "href": "pages/data-representation.html#granges-class",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "",
    "text": "2.1.1 GRanges fundamentals\nThe easiest way to generate a GRanges object is to coerce it from a vector of genomic coordinates in the UCSC format (e.g. \"chr2:2004-4853\"):\n\nlibrary(GenomicRanges)\ngr &lt;- GRanges(c(\n    \"chr2:2004-7853:+\", \n    \"chr4:4482-9873:-\", \n    \"chr5:1943-4203:+\", \n    \"chr5:4103-5004:+\"  \n))\ngr\n##  GRanges object with 4 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr2 2004-7853      +\n##    [2]     chr4 4482-9873      -\n##    [3]     chr5 1943-4203      +\n##    [4]     chr5 4103-5004      +\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nA single GRanges object can contain one or several “ranges”, or genomic intervals. To navigate between these ranges, GRanges can be subset using the standard R single bracket notation [:\n\ngr[1]\n##  GRanges object with 1 range and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr2 2004-7853      +\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr[1:3]\n##  GRanges object with 3 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr2 2004-7853      +\n##    [2]     chr4 4482-9873      -\n##    [3]     chr5 1943-4203      +\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nGenomicRanges objects aim to provide a natural description of genomic intervals (ranges) and are incredibly versatile. They have four required pieces of information:\n\n\nseqnames (i.e. 
chromosome names) (accessible with seqnames())\n\nstart (accessible with start())\n\nend (accessible with end())\n\nstrand (accessible with strand())\n\n\nseqnames(gr)\n##  factor-Rle of length 4 with 3 runs\n##    Lengths:    1    1    2\n##    Values : chr2 chr4 chr5\n##  Levels(3): chr2 chr4 chr5\n\nstart(gr)\n##  [1] 2004 4482 1943 4103\n\nend(gr)\n##  [1] 7853 9873 4203 5004\n\nstrand(gr)\n##  factor-Rle of length 4 with 3 runs\n##    Lengths: 1 1 2\n##    Values : + - +\n##  Levels(3): + - *\n\nHere is a graphical representation of a GRanges object, taken from Bioconductor course material:\n\nWe will now delve into the detailed structure and operability of GRanges objects.\n\n2.1.2 GRanges metadata\nAn important aspect of GRanges objects is that each entry (range) can have extra optional metadata. This metadata is stored in a rectangular DataFrame. Each column can contain a different type of information, e.g. a numerical vector, a factor, a list, …\nOne can directly access this DataFrame using the mcols() function, and individual columns of metadata using the $ notation:\n\nmcols(gr)\n##  DataFrame with 4 rows and 0 columns\nmcols(gr)$GC &lt;- c(0.45, 0.43, 0.44, 0.42)\nmcols(gr)$annotation &lt;- factor(c(NA, 'promoter', 'enhancer', 'centromere'))\nmcols(gr)$extended.info &lt;- c(\n    list(c(NA)), \n    list(c(date = 2023, source = 'manual')), \n    list(c(date = 2021, source = 'manual')), \n    list(c(date = 2019, source = 'homology'))\n)\nmcols(gr)\n##  DataFrame with 4 rows and 3 columns\n##           GC annotation extended.info\n##    &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##  1      0.45 NA                    NA\n##  2      0.43 promoter     2023,manual\n##  3      0.44 enhancer     2021,manual\n##  4      0.42 centromere 2019,homology\n\nWhen metadata columns are defined for a GRanges object, they are pasted next to the minimal 4 required GRanges fields, separated by a | character.\n\ngr\n##  GRanges object with 4 ranges and 3 metadata 
columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\n2.1.3 Genomic arithmetics on individual GRanges objects\nA GRanges object primarily describes a set of genomic ranges (it is in the name!). Useful genomic-oriented methods have been implemented to investigate individual GRanges objects from a genomic perspective.\n\n2.1.3.1 Intra-range methods\nStandard genomic arithmetics are possible with GRanges, e.g. shifting ranges, resizing, trimming, … These methods are referred to as “intra-range” methods as they work “one-region-at-a-time”.\n\n\n\n\n\n\nNote\n\n\n\n\nEach range of the input GRanges object is modified independently from the other ranges in the following code chunks.\nIntra-range operations are endomorphisms: they all take GRanges inputs and always return GRanges objects.\n\n\n\n\nShifting each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 
sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"right\" (downstream in `+` strand), by 1000bp:\nshift(gr, 1000)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames     ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt;  &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2  3004-8853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 5482-10873      - |      0.43 promoter     2023,manual\n##    [3]     chr5  2943-5203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5  5103-6004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"left\" (upstream in `+` strand), by 1000bp:\nshift(gr, -1000)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 1004-6853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 3482-8873      - |      0.43 promoter     2023,manual\n##    [3]     chr5  943-3203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 3103-4004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nNarrowing each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      
0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 21st-40th subrange for each range in `gr`:\nnarrow(gr, start = 21, end = 40)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2024-2043      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4502-4521      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1963-1982      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4123-4142      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nwidth(narrow(gr, start = 21, end = 40))\n##  [1] 20 20 20 20\n\n\nResizing each genomic range in a GRanges object to a certain number of bases:\n\n\ngr\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range:\nresize(gr, 100, fix = \"start\")\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation 
extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-2103      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 9774-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-2042      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-4202      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range, disregarding strand information:\nresize(gr, 100, fix = \"start\", ignore.strand = TRUE)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-2103      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-4581      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-2042      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-4202      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 1 bp, fixed at the center of each range:\nresize(gr, 1, fix = \"center\")\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2      4928      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4      7177      - |      0.43 promoter     2023,manual\n##    [3]     chr5      3073      + |      0.44 enhancer     2021,manual\n##    [4]     chr5      4553      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences 
from an unspecified genome; no seqlengths\n\n\nExtracting flanking coordinates for each entry in gr:\n\n\ngr\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 100bp UPSTREAM of each genomic range, according to range strandness:\nflank(gr, 100, start = TRUE)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 1904-2003      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 9874-9973      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1843-1942      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4003-4102      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 1bp DOWNSTREAM of each genomic range, according to range strandness:\nflank(gr, 1, start = FALSE)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2      7854      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4      4481      - |      0.43 promoter     
2023,manual\n##    [3]     chr5      4204      + |      0.44 enhancer     2021,manual\n##    [4]     chr5      5005      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nNote how here again, strand information is crucial and correctly leveraged to extract “upstream” or “downstream” flanking regions in agreement with genomic range orientation.\n\nSeveral arithmetics operators can also directly work with GRanges:\n\n\ngr\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2004-7853      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4482-9873      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1943-4203      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4103-5004      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr + 100 # ----- Extend each side of the `GRanges` by a given number of bases\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 1904-7953      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4382-9973      - |      0.43 promoter     2023,manual\n##    [3]     chr5 1843-4303      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4003-5104      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr - 200 # ----- Shrink each side of the `GRanges` by a given number of bases \n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames 
   ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 2204-7653      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 4682-9673      - |      0.43 promoter     2023,manual\n##    [3]     chr5 2143-4003      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4303-4804      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr * 1000 # ----- Zoom in by a given factor (effectively decreasing the `GRanges` width by the same factor)\n##  GRanges object with 4 ranges and 3 metadata columns:\n##        seqnames    ranges strand |        GC annotation extended.info\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;   &lt;factor&gt;        &lt;list&gt;\n##    [1]     chr2 4926-4930      + |      0.45 NA                  &lt;NA&gt;\n##    [2]     chr4 7175-7179      - |      0.43 promoter     2023,manual\n##    [3]     chr5 3072-3073      + |      0.44 enhancer     2021,manual\n##    [4]     chr5 4554-4553      + |      0.42 centromere 2019,homology\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nWarningGoing further\n\n\n\nTo fully grasp how to operate GRanges objects, we highly recommend reading the detailed documentation for this class by typing ?GenomicRanges and ?GenomicRanges::`intra-range-methods`.\n\n\n\n2.1.3.2 Inter-range methods\nCompared to “intra-range” methods described above, inter-range methods involve comparisons between ranges in a single GRanges object.\n\n\n\n\n\n\nNoteNote\n\n\n\nCompared to previous section, the result of each function described below depends on the entire set of ranges in the input GRanges object.\n\n\n\nComputing the “inverse” genomic ranges, i.e. 
ranges in-between the input ranges:\n\n\ngaps(gr)\n##  GRanges object with 3 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr2    1-2003      +\n##    [2]     chr4    1-4481      -\n##    [3]     chr5    1-1942      +\n##    -------\n##    seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nFor each entry in a GRanges, finding the index of the preceding/following/nearest genomic range:\n\n\nprecede(gr)\n##  [1] NA NA NA NA\n\nfollow(gr)\n##  [1] NA NA NA NA\n\nnearest(gr)\n##  [1] NA NA  4  3\n\n\nComputing a coverage over a genome, optionally weighted by a “score” column from metadata:\n\n\ncoverage(gr, weight = 'GC')\n##  RleList of length 3\n##  $chr2\n##  numeric-Rle of length 7853 with 2 runs\n##    Lengths: 2003 5850\n##    Values : 0.00 0.45\n##  \n##  $chr4\n##  numeric-Rle of length 9873 with 2 runs\n##    Lengths: 4481 5392\n##    Values : 0.00 0.43\n##  \n##  $chr5\n##  numeric-Rle of length 5004 with 4 runs\n##    Lengths: 1942 2160  101  801\n##    Values : 0.00 0.44 0.86 0.42\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo learn more about inter-range operations on GRanges objects, we highly recommend reading the detailed documentation by typing ?GenomicRanges::`inter-range-methods`.\n\n\n\n2.1.4 Comparing multiple GRanges objects\nGenomic analysis typically requires intersecting two sets of genomic ranges, e.g. 
to find which ranges from one set overlap with those from another set.\nIn the next examples, we will use two GRanges:\n\n\npeaks represents 8 dummy ChIP-seq peaks\n\n\npeaks &lt;- GRanges(c(\n    'chr1:320-418',\n    'chr1:512-567',\n    'chr1:843-892',\n    'chr1:1221-1317', \n    'chr1:1329-1372', \n    'chr1:1852-1909', \n    'chr1:2489-2532', \n    'chr1:2746-2790'\n))\npeaks\n##  GRanges object with 8 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1   320-418      *\n##    [2]     chr1   512-567      *\n##    [3]     chr1   843-892      *\n##    [4]     chr1 1221-1317      *\n##    [5]     chr1 1329-1372      *\n##    [6]     chr1 1852-1909      *\n##    [7]     chr1 2489-2532      *\n##    [8]     chr1 2746-2790      *\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\nTSSs represents 3 dummy gene promoters (± 10bp around the TSS)\n\n\ngenes &lt;- GRanges(c(\n    'chr1:358-1292:+',\n    'chr1:1324-2343:+', \n    'chr1:2732-2751:+'\n))\nTSSs &lt;- resize(genes, width = 1, fix = 'start') + 10\nTSSs\n##  GRanges object with 3 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1   348-368      +\n##    [2]     chr1 1314-1334      +\n##    [3]     chr1 2722-2742      +\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nLet’s see how they overlap by plotting them:\n\nlibrary(ggplot2)\npeaks$type &lt;- 'peaks'\nTSSs$type &lt;- 'TSSs'\nggplot() + \n    ggbio::geom_rect(c(peaks, TSSs), aes(fill = type), facets = type~.) 
+ \n    ggbio::theme_alignment() + \n    coord_fixed(ratio = 300)\n##  Scale for y is already present.\n##  Adding another scale for y, which will replace the existing scale.\n\n\n\n\n\n\n\n\n2.1.4.1 Finding overlaps between two GRanges sets\n\nFinding overlaps between a query and a subject\n\nIn our case, we want to identify which ChIP-seq peaks overlap with a TSS: the query is the set of peaks and the subject is the set of TSSs.\nfindOverlaps returns a Hits object listing which query ranges overlap with which subject ranges.\n\nov &lt;- findOverlaps(query = peaks, subject = TSSs)\nov\n##  Hits object with 3 hits and 0 metadata columns:\n##        queryHits subjectHits\n##        &lt;integer&gt;   &lt;integer&gt;\n##    [1]         1           1\n##    [2]         4           2\n##    [3]         5           2\n##    -------\n##    queryLength: 8 / subjectLength: 3\n\nThe Hits output clearly describes what overlaps with what:\n\nThe query (peak) #1 overlaps with subject (TSS) #1\n\nThe query (peak) #4 overlaps with subject (TSS) #2\n\nThe query (peak) #5 overlaps with subject (TSS) #2\n\n\n\n\n\n\n\n\nNote\n\n\n\nBecause no other query index or subject index is listed in the ov output, none of the remaining ranges from the query overlap with ranges from the subject.\n\n\n\nSubsetting by overlaps between a query and a subject\n\nTo directly subset ranges from a query overlapping with ranges from a subject (e.g. to only keep peaks overlapping a TSS), we can use the subsetByOverlaps function. 
The output of subsetByOverlaps is a subset of the original GRanges object provided as a query, with retained ranges being unmodified.\n\nsubsetByOverlaps(peaks, TSSs)\n##  GRanges object with 3 ranges and 1 metadata column:\n##        seqnames    ranges strand |        type\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n##    [1]     chr1   320-418      * |       peaks\n##    [2]     chr1 1221-1317      * |       peaks\n##    [3]     chr1 1329-1372      * |       peaks\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nCounting overlaps between a query and a subject\n\nFinally, the countOverlaps function is used to count, for each range in a query, how many ranges in the subject it overlaps with.\n\ncountOverlaps(query = peaks, subject = TSSs)\n##  [1] 1 0 0 1 1 0 0 0\n\n\n\n\n\n\n\nNote\n\n\n\nWhich GRanges goes in query or subject is crucial! Counting, for each peak, the number of TSSs it overlaps with is very different from counting, for each TSS, how many peaks it overlaps with.\nIn our case example, it would also be informative to count how many peaks overlap with each TSS, so we’d need to swap query and subject:\n\ncountOverlaps(query = TSSs, subject = peaks)\n##  [1] 1 2 0\n\nWe can add these counts to the original query object:\n\nTSSs$n_peaks &lt;- countOverlaps(query = TSSs, subject = peaks)\nTSSs\n##  GRanges object with 3 ranges and 2 metadata columns:\n##        seqnames    ranges strand |        type   n_peaks\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;integer&gt;\n##    [1]     chr1   348-368      + |        TSSs         1\n##    [2]     chr1 1314-1334      + |        TSSs         2\n##    [3]     chr1 2722-2742      + |        TSSs         0\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\n\n%over%, %within%, %outside% : handy operators\n\nHandy operators exist that return logical vectors (same length as the 
query). They are essentially shorthands for specific findOverlaps() cases.\n&lt;query&gt; %over% &lt;subject&gt;:\n\npeaks %over% TSSs\n##  [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE\n\npeaks[peaks %over% TSSs] # ----- Equivalent to `subsetByOverlaps(peaks, TSSs)`\n##  GRanges object with 3 ranges and 1 metadata column:\n##        seqnames    ranges strand |        type\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n##    [1]     chr1   320-418      * |       peaks\n##    [2]     chr1 1221-1317      * |       peaks\n##    [3]     chr1 1329-1372      * |       peaks\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n&lt;query&gt; %within% &lt;subject&gt;:\n\npeaks %within% TSSs\n##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE\n\nTSSs %within% peaks\n##  [1]  TRUE FALSE FALSE\n\n&lt;query&gt; %outside% &lt;subject&gt;:\n\npeaks %outside% TSSs\n##  [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo learn more about finding overlaps between GRanges objects, we highly recommend reading the detailed documentation by typing ?IRanges::`findOverlaps-methods`.\n\n\n\n2.1.4.2 Find nearest range from a subject for each range in a query\n*Overlaps methods are not always enough to match a query to a subject. 
For instance, some peaks in the query might be very near to some TSSs in the subject, but not quite overlapping.\n\npeaks[8]\n##  GRanges object with 1 range and 1 metadata column:\n##        seqnames    ranges strand |        type\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n##    [1]     chr1 2746-2790      * |       peaks\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nTSSs[3]\n##  GRanges object with 1 range and 2 metadata columns:\n##        seqnames    ranges strand |        type   n_peaks\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;integer&gt;\n##    [1]     chr1 2722-2742      + |        TSSs         0\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nnearest()\n\nRather than finding the overlapping range in a subject for each range in a query, we can find the nearest range.\nFor each range in the query, this returns the index of the range in the subject to which the query is the nearest.\n\nnearest(peaks, TSSs)\n##  [1] 1 1 2 2 2 2 3 3\n\nTSSs[nearest(peaks, TSSs)]\n##  GRanges object with 8 ranges and 2 metadata columns:\n##        seqnames    ranges strand |        type   n_peaks\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;integer&gt;\n##    [1]     chr1   348-368      + |        TSSs         1\n##    [2]     chr1   348-368      + |        TSSs         1\n##    [3]     chr1 1314-1334      + |        TSSs         2\n##    [4]     chr1 1314-1334      + |        TSSs         2\n##    [5]     chr1 1314-1334      + |        TSSs         2\n##    [6]     chr1 1314-1334      + |        TSSs         2\n##    [7]     chr1 2722-2742      + |        TSSs         0\n##    [8]     chr1 2722-2742      + |        TSSs         0\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\ndistance()\n\nAlternatively, one can simply ask to calculate the 
distanceToNearest between ranges in a query and ranges in a subject.\n\ndistanceToNearest(peaks, TSSs)\n##  Hits object with 8 hits and 1 metadata column:\n##        queryHits subjectHits |  distance\n##        &lt;integer&gt;   &lt;integer&gt; | &lt;integer&gt;\n##    [1]         1           1 |         0\n##    [2]         2           1 |       143\n##    [3]         3           2 |       421\n##    [4]         4           2 |         0\n##    [5]         5           2 |         0\n##    [6]         6           2 |       517\n##    [7]         7           3 |       189\n##    [8]         8           3 |         3\n##    -------\n##    queryLength: 8 / subjectLength: 3\n\npeaks$distance_to_nearest_TSS &lt;- mcols(distanceToNearest(peaks, TSSs))$distance\n\nNote how close to a TSS the 8th peak is: only 3 bp away. It could be worth considering this as an overlap!",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html#ginteractions-class",
    "href": "pages/data-representation.html#ginteractions-class",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "\n2.2 GInteractions class",
    "text": "2.2 GInteractions class\nGRanges describe genomic ranges and hence are of general use to study 1D genome organization. To study chromatin interactions, we need a way to link pairs of GRanges. This is exactly what the GInteractions class does. This data structure is defined in the InteractionSet package and has been published in the 2016 paper by Lun et al. (Lun et al. (2016)).\n\n\n2.2.1 Building a GInteractions object from scratch\nLet’s first define two parallel GRanges objects (i.e. two GRanges of same length). Each GRanges will contain 5 ranges.\n\ngr_first &lt;- GRanges(c(\n    'chr1:1-100', \n    'chr1:1001-2000', \n    'chr1:5001-6000', \n    'chr1:8001-9000', \n    'chr1:7001-8000'  \n))\ngr_second &lt;- GRanges(c(\n    'chr1:1-100', \n    'chr1:3001-4000', \n    'chr1:8001-9000', \n    'chr1:7001-8000', \n    'chr2:13000-14000'  \n))\n\nBecause these two GRanges objects are of same length (5), one can “bind” them together by using the GInteractionsfunction. This effectively associate each entry from one GRanges to the entry aligned in the other GRanges object.\n\nlibrary(InteractionSet)\ngi &lt;- GInteractions(gr_first, gr_second)\ngi\n##  GInteractions object with 5 interactions and 0 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000\n##    [4]      chr1 8001-9000 ---      chr1   7001-8000\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000\n##    -------\n##    regions: 7 ranges and 0 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nThe way GInteractions objects are printed in an R console mimics that of GRanges, but pairs two “ends” (a.k.a. 
anchors) of an interaction together, each end being represented as a separate GRanges range.\n\nNote that it is possible to have interactions joining two identical anchors.\n\n\ngi[1]\n##  GInteractions object with 1 interaction and 0 metadata columns:\n##        seqnames1   ranges1     seqnames2   ranges2\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt;\n##    [1]      chr1     1-100 ---      chr1     1-100\n##    -------\n##    regions: 7 ranges and 0 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nIt is also technically possible (though not advised) to have interactions for which the “first” end is located after the “second” end along the chromosome.\n\n\ngi[4]\n##  GInteractions object with 1 interaction and 0 metadata columns:\n##        seqnames1   ranges1     seqnames2   ranges2\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt;\n##    [1]      chr1 8001-9000 ---      chr1 7001-8000\n##    -------\n##    regions: 7 ranges and 0 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nFinally, it is possible to define inter-chromosomal interactions (a.k.a. trans interactions).\n\n\ngi[5]\n##  GInteractions object with 1 interaction and 0 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt;\n##    [1]      chr1 7001-8000 ---      chr2 13000-14000\n##    -------\n##    regions: 7 ranges and 0 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.2 GInteractions specific slots\nCompared to GRanges, extra slots are available for GInteractions objects, e.g. anchors and regions.\n\n2.2.2.1 Anchors\n“Anchors” of a single genomic interaction refer to the two ends of this interaction. These anchors can be extracted from a GInteractions object using the anchors() function. 
This outputs a list of two GRanges, the first corresponding to the “left” end of interactions (when printed to the console) and the second corresponding to the “right” end of interactions (when printed to the console).\n\n# ----- This extracts the two sets of anchors (\"first\" and \"second\") from a GInteractions object\nanchors(gi)\n##  $first\n##  GRanges object with 5 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1     1-100      *\n##    [2]     chr1 1001-2000      *\n##    [3]     chr1 5001-6000      *\n##    [4]     chr1 8001-9000      *\n##    [5]     chr1 7001-8000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n##  \n##  $second\n##  GRanges object with 5 ranges and 0 metadata columns:\n##        seqnames      ranges strand\n##           &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1       1-100      *\n##    [2]     chr1   3001-4000      *\n##    [3]     chr1   8001-9000      *\n##    [4]     chr1   7001-8000      *\n##    [5]     chr2 13000-14000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n# ----- We can query for the \"first\" or \"second\" set of anchors directly\nanchors(gi, \"first\")\n##  GRanges object with 5 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1     1-100      *\n##    [2]     chr1 1001-2000      *\n##    [3]     chr1 5001-6000      *\n##    [4]     chr1 8001-9000      *\n##    [5]     chr1 7001-8000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nanchors(gi, \"second\")\n##  GRanges object with 5 ranges and 0 metadata columns:\n##        seqnames      ranges strand\n##           &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1       1-100      *\n##    [2]     chr1   3001-4000    
  *\n##    [3]     chr1   8001-9000      *\n##    [4]     chr1   7001-8000      *\n##    [5]     chr2 13000-14000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.2.2 Regions\n“Regions” of a set of interactions refer to the universe of unique anchors represented in a set of interactions. Therefore, the length of the regions can only be equal to or strictly lower than twice the length of anchors.\nThe regions function returns the regions associated with a GInteractions object, stored as a GRanges object.\n\nregions(gi)\n##  GRanges object with 7 ranges and 0 metadata columns:\n##        seqnames      ranges strand\n##           &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1       1-100      *\n##    [2]     chr1   1001-2000      *\n##    [3]     chr1   3001-4000      *\n##    [4]     chr1   5001-6000      *\n##    [5]     chr1   7001-8000      *\n##    [6]     chr1   8001-9000      *\n##    [7]     chr2 13000-14000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nlength(regions(gi))\n##  [1] 7\n\nlength(anchors(gi, \"first\"))\n##  [1] 5\n\n\n2.2.3 GInteractions methods\nGInteractions behave as an extension of GRanges. 
For this reason, many methods that work with GRanges will work seamlessly with GInteractions.\n\n2.2.3.1 Metadata\nOne can add metadata columns directly to a GInteractions object.\n\nmcols(gi)\n##  DataFrame with 5 rows and 0 columns\nmcols(gi) &lt;- data.frame(\n    idx = seq(1, length(gi)),\n    type = c(\"cis\", \"cis\", \"cis\", \"cis\", \"trans\")\n)\ngi\n##  GInteractions object with 5 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 0 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\ngi$type\n##  [1] \"cis\"   \"cis\"   \"cis\"   \"cis\"   \"trans\"\n\nImportantly, metadata columns can also be directly added to regions of a GInteractions object, since these regions are a GRanges object themselves!\n\nregions(gi)\n##  GRanges object with 7 ranges and 0 metadata columns:\n##        seqnames      ranges strand\n##           &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1       1-100      *\n##    [2]     chr1   1001-2000      *\n##    [3]     chr1   3001-4000      *\n##    [4]     chr1   5001-6000      *\n##    [5]     chr1   7001-8000      *\n##    [6]     chr1   8001-9000      *\n##    [7]     chr2 13000-14000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\nregions(gi)$binID &lt;- seq_along(regions(gi))\nregions(gi)$type &lt;- c(\"P\", 
\"P\", \"P\", \"E\", \"E\", \"P\", \"P\")\nregions(gi)\n##  GRanges object with 7 ranges and 2 metadata columns:\n##        seqnames      ranges strand |     binID        type\n##           &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]     chr1       1-100      * |         1           P\n##    [2]     chr1   1001-2000      * |         2           P\n##    [3]     chr1   3001-4000      * |         3           P\n##    [4]     chr1   5001-6000      * |         4           E\n##    [5]     chr1   7001-8000      * |         5           E\n##    [6]     chr1   8001-9000      * |         6           P\n##    [7]     chr2 13000-14000      * |         7           P\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.2 Sorting GInteractions\n\nThe sort function works seamlessly with GInteractions objects. It sorts the interactions using a similar approach to that performed by pairtools sort ... for disk-stored .pairs files, sorting on the “first” anchor first, then for interactions with the same “first” anchors, sorting on the “second” anchor.\n\ngi\n##  GInteractions object with 5 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nsort(gi)\n##  GInteractions object with 5 
interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    [5]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.3 Swapping GInteractions anchors\nFor an individual interaction contained in a GInteractions object, the “first” and “second” anchors themselves can be sorted as well. This is called “pairs swapping”, and it is performed similarly to pairtools flip ... for disk-stored .pairs files. 
This ensures that interactions, when represented as a contact matrix, generate an upper-triangular matrix.\n\ngi\n##  GInteractions object with 5 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nswapAnchors(gi)\n##  GInteractions object with 5 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 7001-8000 ---      chr1   8001-9000 |         4         cis\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nWarningNote\n\n\n\n“Sorting” and “swapping” a GInteractions object are two entirely different actions:\n\n“sorting” reorganizes all rows (interactions);\n“swapping” anchors reorganizes “first” and “second” anchors for each interaction 
independently.\n\n\n\n\n2.2.3.4 GInteractions distance method\n“Distance”, when applied to genomic interactions, typically refers to the genomic distance between the two anchors of a single interaction. For GInteractions, this is computed using the pairdist function.\n\ngi\n##  GInteractions object with 5 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1     1-100 ---      chr1       1-100 |         1         cis\n##    [2]      chr1 1001-2000 ---      chr1   3001-4000 |         2         cis\n##    [3]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [4]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [5]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\npairdist(gi)\n##  [1]    0 2000 3000 1000   NA\n\nNote that for “trans” inter-chromosomal interactions, i.e. interactions with anchors on different chromosomes, the notion of genomic distance is meaningless; for this reason, pairdist returns an NA value.\nThe type argument of the pairdist() function can be changed to specify which type of “distance” should be computed:\n\n\nmid: The distance between the midpoints of the two regions (rounded down to the nearest integer) is returned (default).\n\ngap: The length of the gap between the closest points of the two regions is computed - negative lengths are returned for overlapping regions, indicating the length of the overlap.\n\nspan: The distance between the furthermost points of the two regions is computed.\n\ndiag: The difference between the anchor indices is returned. 
This corresponds to a diagonal on the interaction space when bins are used in the ‘regions’ slot of ‘x’.\n\n2.2.3.5 GInteractions overlap methods\n“Overlaps” for genomic interactions could be computed in different contexts:\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\nCase 3: Spanning of the interaction “across” a genomic range\n\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\n\nThis is the default behavior of findOverlaps when providing a GInteractions object as query and a GRanges as a subject.\n\ngr &lt;- GRanges(c(\"chr1:7501-7600\", \"chr1:8501-8600\"))\nfindOverlaps(query = gi, subject = gr)\n##  Hits object with 4 hits and 0 metadata columns:\n##        queryHits subjectHits\n##        &lt;integer&gt;   &lt;integer&gt;\n##    [1]         3           2\n##    [2]         4           1\n##    [3]         4           2\n##    [4]         5           1\n##    -------\n##    queryLength: 5 / subjectLength: 2\n\ncountOverlaps(gi, gr)\n##  [1] 0 0 1 2 1\n\nsubsetByOverlaps(gi, gr)\n##  GInteractions object with 3 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [2]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [3]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nHere again, the order matters!\n\ncountOverlaps(gr, gi)\n##  [1] 2 2\n\nAnd again, the %over% operator can be used here:\n\ngi %over% gr\n##  [1] FALSE FALSE  TRUE  TRUE  
TRUE\n\ngi[gi %over% gr] # ----- Equivalent to `subsetByOverlaps(gi, gr)`\n##  GInteractions object with 3 interactions and 2 metadata columns:\n##        seqnames1   ranges1     seqnames2     ranges2 |       idx        type\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt;   &lt;IRanges&gt; | &lt;integer&gt; &lt;character&gt;\n##    [1]      chr1 5001-6000 ---      chr1   8001-9000 |         3         cis\n##    [2]      chr1 8001-9000 ---      chr1   7001-8000 |         4         cis\n##    [3]      chr1 7001-8000 ---      chr2 13000-14000 |         5       trans\n##    -------\n##    regions: 7 ranges and 2 metadata columns\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\n\nThis slightly different scenario involves overlapping two sets of interactions, to see whether any interaction in Set-1 has its two anchors overlapping anchors from an interaction in Set-2.\n\ngi2 &lt;- GInteractions(\n    GRanges(\"chr1:1081-1090\"), \n    GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi2\n##  [1] FALSE  TRUE FALSE FALSE FALSE\n\nNote that, with this method, both anchors of a query interaction have to overlap the pair of anchors of a single subject interaction!\n\ngi3 &lt;- GInteractions(\n    GRanges(\"chr1:1-1000\"), \n    GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi3\n##  [1] FALSE FALSE FALSE FALSE FALSE\n\n\nCase 3: Spanning of the interaction “across” a genomic range\n\nThis requires a bit of wrangling, to mimic an overlap between two GRanges objects:\n\ngi &lt;- swapAnchors(gi) # ----- Make sure anchors are correctly sorted\ngi &lt;- sort(gi) # ----- Make sure interactions are correctly sorted\ngi &lt;- gi[!is.na(pairdist(gi))] # ----- Remove inter-chromosomal interactions\nspanning_gi &lt;- GRanges(\n    seqnames = seqnames(anchors(gi)[[1]]), \n    ranges = IRanges(\n        start(anchors(gi)[[1]]), \n        end(anchors(gi)[[2]])\n 
   )\n)\nspanning_gi \n##  GRanges object with 4 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]     chr1     1-100      *\n##    [2]     chr1 1001-4000      *\n##    [3]     chr1 5001-9000      *\n##    [4]     chr1 7001-9000      *\n##    -------\n##    seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nspanning_gi %over% gr\n##  [1] FALSE FALSE  TRUE  TRUE\n\n\n\n\n\n\n\nWarningGoing further\n\n\n\nA detailed manual of the overlap methods available for GInteractions objects can be read by typing ?`Interaction-overlaps` in R.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html#contactfile-class",
    "href": "pages/data-representation.html#contactfile-class",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "\n2.3 ContactFile class",
    "text": "2.3 ContactFile class\nHi-C contacts can be stored in four different formats (see previous chapter):\n\nAs a .(m)cool matrix (multi-scores, multi-resolution, indexed)\nAs a .hic matrix (multi-scores, multi-resolution, indexed)\nAs a HiC-pro derived matrix (single-score, single-resolution, non-indexed)\nUn-binned, Hi-C contacts can be stored in .pairs files\n\n\n2.3.1 Accessing example Hi-C files\nExample contact files can be downloaded using HiContactsData function.\n\nlibrary(HiContactsData)\ncoolf &lt;- HiContactsData('yeast_wt', 'mcool')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\nThis fetches files from the cloud, download them locally and returns the path of the local file.\n\ncoolf\n##                                                       EH7702 \n##  \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\"\n\nSimilarly, example files are available for other file formats:\n\nhicf &lt;- HiContactsData('yeast_wt', 'hic')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nhicpromatrixf &lt;- HiContactsData('yeast_wt', 'hicpro_matrix')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nhicproregionsf &lt;- HiContactsData('yeast_wt', 'hicpro_bed')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\npairsf &lt;- HiContactsData('yeast_wt', 'pairs.gz')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\nWe can even check the content of some of these files to make sure they are actually what they are:\n\n# ---- HiC-Pro generates a tab-separated `regions.bed` file\nreadLines(hicproregionsf, 25)\n##   [1] \"I\\t0\\t1000\"      \"I\\t1000\\t2000\"   \"I\\t2000\\t3000\"   \"I\\t3000\\t4000\"  \n##   [5] \"I\\t4000\\t5000\"   \"I\\t5000\\t6000\"   \"I\\t6000\\t7000\"   
\"I\\t7000\\t8000\"  \n##   [9] \"I\\t8000\\t9000\"   \"I\\t9000\\t10000\"  \"I\\t10000\\t11000\" \"I\\t11000\\t12000\"\n##  [13] \"I\\t12000\\t13000\" \"I\\t13000\\t14000\" \"I\\t14000\\t15000\" \"I\\t15000\\t16000\"\n##  [17] \"I\\t16000\\t17000\" \"I\\t17000\\t18000\" \"I\\t18000\\t19000\" \"I\\t19000\\t20000\"\n##  [21] \"I\\t20000\\t21000\" \"I\\t21000\\t22000\" \"I\\t22000\\t23000\" \"I\\t23000\\t24000\"\n##  [25] \"I\\t24000\\t25000\"\n\n# ---- Pairs are also tab-separated \nreadLines(pairsf, 25)\n##   [1] \"## pairs format v1.0\"                                                             \n##   [2] \"#sorted: chr1-pos1-chr2-pos2\"                                                     \n##   [3] \"#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2\"                 \n##   [4] \"#chromsize: I 230218\"                                                             \n##   [5] \"#chromsize: II 813184\"                                                            \n##   [6] \"#chromsize: III 316620\"                                                           \n##   [7] \"#chromsize: IV 1531933\"                                                           \n##   [8] \"#chromsize: V 576874\"                                                             \n##   [9] \"#chromsize: VI 270161\"                                                            \n##  [10] \"#chromsize: VII 1090940\"                                                          \n##  [11] \"#chromsize: VIII 562643\"                                                          \n##  [12] \"#chromsize: IX 439888\"                                                            \n##  [13] \"#chromsize: X 745751\"                                                             \n##  [14] \"#chromsize: XI 666816\"                                                            \n##  [15] \"#chromsize: XII 1078177\"                                                          \n##  [16] \"#chromsize: XIII 924431\"              
                                            \n##  [17] \"#chromsize: XIV 784333\"                                                           \n##  [18] \"#chromsize: XV 1091291\"                                                           \n##  [19] \"#chromsize: XVI 948066\"                                                           \n##  [20] \"#chromsize: Mito 85779\"                                                           \n##  [21] \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\"  \n##  [22] \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\"  \n##  [23] \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\"\n##  [24] \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\"   \n##  [25] \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n2.3.2 ContactFile fundamentals\nA ContactFile object establishes a connection with a disk-stored Hi-C file (e.g. a .cool file, or a .pairs file, …). 
ContactFile classes are defined in the HiCExperiment package.\nContactFiles come in four different flavors:\n\n\nCoolFile: connection to a .(m)cool file\n\nHicFile: connection to a .hic file\n\nHicproFile: connection to output files generated by HiC-Pro\n\nPairsFile: connection to a .pairs file\n\nTo create each flavor of ContactFile, one can use the corresponding function:\n\nlibrary(HiCExperiment)\n\n# ----- This creates a connection to a `.(m)cool` file (path stored in `coolf`)\nCoolFile(coolf)\n##  CoolFile object\n##  .mcool file: /home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752 \n##  resolution: 1000 \n##  pairs file: \n##  metadata(0):\n\n# ----- This creates a connection to a `.hic` file (path stored in `hicf`)\nHicFile(hicf)\n##  HicFile object\n##  .hic file: /home/biocbuild/.cache/R/ExperimentHub/38d20278711f3c_7836 \n##  resolution: 1000 \n##  pairs file: \n##  metadata(0):\n\n# ----- This creates a connection to output files from HiC-Pro\nHicproFile(hicpromatrixf, hicproregionsf)\n##  HicproFile object\n##  HiC-Pro files:\n##    $ matrix:   /home/biocbuild/.cache/R/ExperimentHub/38d2022f710c24_7837 \n##    $ regions:  /home/biocbuild/.cache/R/ExperimentHub/38d2023a35e834_7838 \n##  resolution: 1000 \n##  pairs file: \n##  metadata(0):\n\n# ----- This creates a connection to a pairs file\nPairsFile(pairsf)\n##  PairsFile object\n##  resource: /home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753\n\n\n2.3.3 ContactFile slots\nSeveral “slots” (i.e. 
pieces of information) are attached to a ContactFile object:\n\nThe path to the disk-stored contact matrix;\nThe active resolution (by default, the finest resolution available in a multi-resolution contact matrix);\nOptionally, the path to a matching pairs file (see below);\nSome metadata.\n\nSlots of a CoolFile object can be accessed as follows:\n\ncf &lt;- CoolFile(coolf)\ncf\n##  CoolFile object\n##  .mcool file: /home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752 \n##  resolution: 1000 \n##  pairs file: \n##  metadata(0):\n\nresolution(cf)\n##  [1] 1000\n\npairsFile(cf)\n##  NULL\n\nmetadata(cf)\n##  list()\n\n\n\n\n\n\n\nWarningImportant!\n\n\n\nContactFile objects are only connections to a disk-stored Hi-C file. Although metadata is available, these objects do not contain actual data!\n\n\n\n2.3.4 ContactFile methods\nTwo useful methods are available for ContactFiles:\n\n\navailableResolutions checks which resolutions are available in a ContactFile.\n\n\navailableResolutions(cf)\n##  resolutions(5): 1000 2000 4000 8000 16000\n##  \n\n\n\navailableChromosomes checks which chromosomes are available in a ContactFile, along with their lengths.\n\n\navailableChromosomes(cf)\n##  Seqinfo object with 16 sequences from an unspecified genome:\n##    seqnames seqlengths isCircular genome\n##    I            230218       &lt;NA&gt;   &lt;NA&gt;\n##    II           813184       &lt;NA&gt;   &lt;NA&gt;\n##    III          316620       &lt;NA&gt;   &lt;NA&gt;\n##    IV          1531933       &lt;NA&gt;   &lt;NA&gt;\n##    V            576874       &lt;NA&gt;   &lt;NA&gt;\n##    ...             ...        ...    ...\n##    XII         1078177       &lt;NA&gt;   &lt;NA&gt;\n##    XIII         924431       &lt;NA&gt;   &lt;NA&gt;\n##    XIV          784333       &lt;NA&gt;   &lt;NA&gt;\n##    XV          1091291       &lt;NA&gt;   &lt;NA&gt;\n##    XVI          948066       &lt;NA&gt;   &lt;NA&gt;",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html#hicexperiment-class",
    "href": "pages/data-representation.html#hicexperiment-class",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "\n2.4 HiCExperiment class",
    "text": "2.4 HiCExperiment class\nBased on the previous sections, we have different Bioconductor classes relevant for Hi-C:\n\n\nGInteractions which can be used to represent genomic interactions in R\n\nContactFiles which can be used to establish a connection with disk-stored Hi-C files\n\nHiCExperiment objects are created when parsing a ContactFile in R. The HiCExperiment class reads a ContactFile in memory and store genomic interactions as GInteractions. The HiCExperiment class is, quite obviously, defined in the HiCExperiment package.\n\n2.4.1 Creating a HiCExperiment object\n\n2.4.1.1 Importing a ContactFile\n\nIn practice, to create a HiCExperiment object from a ContactFile, one can use the import method.\n\n\n\n\n\n\nImportantCaution\n\n\n\n\nCreating a HiCExperiment object means importing data from a Hi-C matrix (e.g.  from a ContactFile) in memory in R.\n\nCreating a HiCExperiment object from large disk-stored contact matrices can potentially take a long time.\n\n\n\n\ncf &lt;- CoolFile(coolf)\nhic &lt;- import(cf)\nhic\n##  `HiCExperiment` object with 8,757,906 contacts over 12,079 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"whole genome\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 1000 \n##  interactions: 2945692 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nPrinting a HiCExperiment to the console will not reveal the actual data stored in the object (it would most likely crash your R session!). Instead, it gives a summary of the data stored in the object:\n\nThe fileName, i.e. the path to the disk-stored data file\nThe focus, i.e. 
the genomic location for which data has been imported (in the example above, \"whole genome\" implies that all the data has been imported in R)\n\nresolutions available in the disk-stored data file (this will be identical to availableResolutions(cf))\n\nactive resolution indicates at which resolution the data is currently imported\n\ninteractions refers to the actual GInteractions imported in R and “hidden” (for now!) in the HiCExperiment object\n\nscores refer to different interaction frequency estimates. These can be raw counts, balanced counts (if the contact matrix has been previously normalized), or any other score the end user wants to attribute to each interaction (e.g. ratio of counts between two Hi-C maps, …)\n\ntopologicalFeatures is a list of GRanges or GInteractions objects to describe important topological features.\n\npairsFile is a pointer to an optional disk-stored .pairs file from which the contact matrix has been created. This is often useful to estimate some Hi-C metrics.\n\nmetadata is a list to further describe the experiment.\n\nThese pieces of information are called slots. 
They can be directly accessed using getter functions, which bear the same name as the slot.\n\nfileName(hic)\n##  [1] \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\"\n\nfocus(hic)\n##  NULL\n\nresolutions(hic)\n##  [1]  1000  2000  4000  8000 16000\n\nresolution(hic)\n##  [1] 1000\n\ninteractions(hic)\n##  GInteractions object with 2945692 interactions and 4 metadata columns:\n##              seqnames1       ranges1     seqnames2       ranges2 |   bin_id1\n##                  &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt; | &lt;numeric&gt;\n##          [1]         I        1-1000 ---         I        1-1000 |         0\n##          [2]         I        1-1000 ---         I     1001-2000 |         0\n##          [3]         I        1-1000 ---         I     2001-3000 |         0\n##          [4]         I        1-1000 ---         I     3001-4000 |         0\n##          [5]         I        1-1000 ---         I     4001-5000 |         0\n##          ...       ...           ... ...       ...           ... .       ...\n##    [2945688]       XVI 940001-941000 ---       XVI 942001-943000 |     12070\n##    [2945689]       XVI 940001-941000 ---       XVI 943001-944000 |     12070\n##    [2945690]       XVI 941001-942000 ---       XVI 941001-942000 |     12071\n##    [2945691]       XVI 941001-942000 ---       XVI 942001-943000 |     12071\n##    [2945692]       XVI 941001-942000 ---       XVI 943001-944000 |     12071\n##                bin_id2     count  balanced\n##              &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##          [1]         0        15 0.0663491\n##          [2]         1        21 0.1273505\n##          [3]         2        21 0.0738691\n##          [4]         3        38 0.0827051\n##          [5]         4        17 0.0591984\n##          ...       ...       ...       
...\n##    [2945688]     12072        11 0.0575550\n##    [2945689]     12073         1       NaN\n##    [2945690]     12071        74 0.0504615\n##    [2945691]     12072        39 0.1624599\n##    [2945692]     12073         1       NaN\n##    -------\n##    regions: 12079 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\nscores(hic)\n##  List of length 2\n##  names(2): count balanced\n\ntopologicalFeatures(hic)\n##  List of length 4\n##  names(4): compartments borders loops viewpoints\n\npairsFile(hic)\n##  NULL\n\nmetadata(hic)\n##  list()\n\nimport also works for other types of ContactFile (HicFile, HicproFile, PairsFile):\n\nFor HicFile and HicproFile, import seamlessly returns a HiCExperiment as well:\n\n\nhf &lt;- HicFile(hicf)\nhic &lt;- import(hf)\nhic\n##  `HiCExperiment` object with 13,681,280 contacts over 12,165 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/38d20278711f3c_7836\" \n##  focus: \"whole genome\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 1000 \n##  interactions: 2965693 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nFor PairsFile, the returned object is a representation of Hi-C “pairs” in R, i.e. 
GInteractions\n\n\n\npf &lt;- PairsFile(pairsf)\npairs &lt;- import(pf)\npairs\n##  GInteractions object with 471364 interactions and 3 metadata columns:\n##             seqnames1   ranges1     seqnames2   ranges2 |     frag1     frag2\n##                 &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt; | &lt;numeric&gt; &lt;numeric&gt;\n##         [1]        II       105 ---        II     48548 |      1358      1681\n##         [2]        II       113 ---        II     45003 |      1358      1658\n##         [3]        II       119 ---        II    687251 |      1358      5550\n##         [4]        II       160 ---        II     26124 |      1358      1510\n##         [5]        II       169 ---        II     39052 |      1358      1613\n##         ...       ...       ... ...       ...       ... .       ...       ...\n##    [471360]        II    808605 ---        II    809683 |      6316      6320\n##    [471361]        II    808609 ---        II    809917 |      6316      6324\n##    [471362]        II    808617 ---        II    809506 |      6316      6319\n##    [471363]        II    809447 ---        II    809685 |      6319      6321\n##    [471364]        II    809472 ---        II    809675 |      6319      6320\n##              distance\n##             &lt;integer&gt;\n##         [1]     48443\n##         [2]     44890\n##         [3]    687132\n##         [4]     25964\n##         [5]     38883\n##         ...       
...\n##    [471360]      1078\n##    [471361]      1308\n##    [471362]       889\n##    [471363]       238\n##    [471364]       203\n##    -------\n##    regions: 549331 ranges and 0 metadata columns\n##    seqinfo: 17 sequences from an unspecified genome\n\n\n2.4.1.2 Customizing the import\n\nTo restrict the import to the data relevant to the study, two arguments can be passed to import along with a ContactFile.\n\n\n\n\n\n\nWarning: Key import arguments:\n\n\n\n\n\nfocus: This can be used to only parse data for a specific genomic location.\n\nresolution: This can be used to choose which resolution to parse the contact matrix at (this is ignored if the ContactFile is not multi-resolution, e.g. .cool or HiC-Pro generated matrices)\n\n\n\n\nImport interactions within a single chromosome:\n\n\nhic &lt;- import(cf, focus = 'II', resolution = 2000)\n\nregions(hic) # ---- `regions()` works on `HiCExperiment` the same way as on `GInteractions`\n##  GRanges object with 407 ranges and 4 metadata columns:\n##                     seqnames        ranges strand |    bin_id    weight   chr\n##                        &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##           II_1_2000       II        1-2000      * |       116       NaN    II\n##        II_2001_4000       II     2001-4000      * |       117       NaN    II\n##        II_4001_6000       II     4001-6000      * |       118       NaN    II\n##        II_6001_8000       II     6001-8000      * |       119       NaN    II\n##       II_8001_10000       II    8001-10000      * |       120 0.0461112    II\n##                 ...      ...           ...    ... .       ...       ...   
...\n##    II_804001_806000       II 804001-806000      * |       518 0.0493107    II\n##    II_806001_808000       II 806001-808000      * |       519 0.0611355    II\n##    II_808001_810000       II 808001-810000      * |       520       NaN    II\n##    II_810001_812000       II 810001-812000      * |       521       NaN    II\n##    II_812001_813184       II 812001-813184      * |       522       NaN    II\n##                        center\n##                     &lt;integer&gt;\n##           II_1_2000      1000\n##        II_2001_4000      3000\n##        II_4001_6000      5000\n##        II_6001_8000      7000\n##       II_8001_10000      9000\n##                 ...       ...\n##    II_804001_806000    805000\n##    II_806001_808000    807000\n##    II_808001_810000    809000\n##    II_810001_812000    811000\n##    II_812001_813184    812592\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\ntable(seqnames(regions(hic)))\n##  \n##     I   II  III   IV    V   VI  VII VIII   IX    X   XI  XII XIII  XIV   XV \n##     0  407    0    0    0    0    0    0    0    0    0    0    0    0    0 \n##   XVI \n##     0\n\nanchors(hic) # ---- `anchors()` works on `HiCExperiment` the same way as on `GInteractions`\n##  $first\n##  GRanges object with 34063 ranges and 4 metadata columns:\n##            seqnames        ranges strand |    bin_id    weight   chr\n##               &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##        [1]       II        1-2000      * |       116       NaN    II\n##        [2]       II        1-2000      * |       116       NaN    II\n##        [3]       II        1-2000      * |       116       NaN    II\n##        [4]       II        1-2000      * |       116       NaN    II\n##        [5]       II        1-2000      * |       116       NaN    II\n##        ...      ...           ...    ... .       ...       ...   
...\n##    [34059]       II 804001-806000      * |       518 0.0493107    II\n##    [34060]       II 806001-808000      * |       519 0.0611355    II\n##    [34061]       II 806001-808000      * |       519 0.0611355    II\n##    [34062]       II 806001-808000      * |       519 0.0611355    II\n##    [34063]       II 808001-810000      * |       520       NaN    II\n##               center\n##            &lt;integer&gt;\n##        [1]      1000\n##        [2]      1000\n##        [3]      1000\n##        [4]      1000\n##        [5]      1000\n##        ...       ...\n##    [34059]    805000\n##    [34060]    807000\n##    [34061]    807000\n##    [34062]    807000\n##    [34063]    809000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n##  \n##  $second\n##  GRanges object with 34063 ranges and 4 metadata columns:\n##            seqnames        ranges strand |    bin_id    weight   chr\n##               &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##        [1]       II        1-2000      * |       116       NaN    II\n##        [2]       II     4001-6000      * |       118       NaN    II\n##        [3]       II     6001-8000      * |       119       NaN    II\n##        [4]       II    8001-10000      * |       120 0.0461112    II\n##        [5]       II   10001-12000      * |       121 0.0334807    II\n##        ...      ...           ...    ... .       ...       ...   
...\n##    [34059]       II 810001-812000      * |       521       NaN    II\n##    [34060]       II 806001-808000      * |       519 0.0611355    II\n##    [34061]       II 808001-810000      * |       520       NaN    II\n##    [34062]       II 810001-812000      * |       521       NaN    II\n##    [34063]       II 808001-810000      * |       520       NaN    II\n##               center\n##            &lt;integer&gt;\n##        [1]      1000\n##        [2]      5000\n##        [3]      7000\n##        [4]      9000\n##        [5]     11000\n##        ...       ...\n##    [34059]    811000\n##    [34060]    807000\n##    [34061]    809000\n##    [34062]    811000\n##    [34063]    809000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions within a segment of a chromosome:\n\n\nhic &lt;- import(cf, focus = 'II:40000-60000', resolution = 1000)\n\nregions(hic) \n##  GRanges object with 21 ranges and 4 metadata columns:\n##                   seqnames      ranges strand |    bin_id    weight   chr\n##                      &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##    II_39001_40000       II 39001-40000      * |       270 0.0220798    II\n##    II_40001_41000       II 40001-41000      * |       271 0.0246775    II\n##    II_41001_42000       II 41001-42000      * |       272 0.0269232    II\n##    II_42001_43000       II 42001-43000      * |       273 0.0341849    II\n##    II_43001_44000       II 43001-44000      * |       274 0.0265386    II\n##               ...      ...         ...    ... .       ...       ...   
...\n##    II_55001_56000       II 55001-56000      * |       286 0.0213532    II\n##    II_56001_57000       II 56001-57000      * |       287 0.0569839    II\n##    II_57001_58000       II 57001-58000      * |       288 0.0338612    II\n##    II_58001_59000       II 58001-59000      * |       289 0.0294531    II\n##    II_59001_60000       II 59001-60000      * |       290 0.0306662    II\n##                      center\n##                   &lt;integer&gt;\n##    II_39001_40000     39500\n##    II_40001_41000     40500\n##    II_41001_42000     41500\n##    II_42001_43000     42500\n##    II_43001_44000     43500\n##               ...       ...\n##    II_55001_56000     55500\n##    II_56001_57000     56500\n##    II_57001_58000     57500\n##    II_58001_59000     58500\n##    II_59001_60000     59500\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic)\n##  $first\n##  GRanges object with 210 ranges and 4 metadata columns:\n##          seqnames      ranges strand |    bin_id    weight   chr    center\n##             &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt;\n##      [1]       II 40001-41000      * |       271 0.0246775    II     40500\n##      [2]       II 40001-41000      * |       271 0.0246775    II     40500\n##      [3]       II 40001-41000      * |       271 0.0246775    II     40500\n##      [4]       II 40001-41000      * |       271 0.0246775    II     40500\n##      [5]       II 40001-41000      * |       271 0.0246775    II     40500\n##      ...      ...         ...    ... .       ...       ...   ...       
...\n##    [206]       II 57001-58000      * |       288 0.0338612    II     57500\n##    [207]       II 57001-58000      * |       288 0.0338612    II     57500\n##    [208]       II 58001-59000      * |       289 0.0294531    II     58500\n##    [209]       II 58001-59000      * |       289 0.0294531    II     58500\n##    [210]       II 59001-60000      * |       290 0.0306662    II     59500\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n##  \n##  $second\n##  GRanges object with 210 ranges and 4 metadata columns:\n##          seqnames      ranges strand |    bin_id    weight   chr    center\n##             &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt;\n##      [1]       II 40001-41000      * |       271 0.0246775    II     40500\n##      [2]       II 41001-42000      * |       272 0.0269232    II     41500\n##      [3]       II 42001-43000      * |       273 0.0341849    II     42500\n##      [4]       II 43001-44000      * |       274 0.0265386    II     43500\n##      [5]       II 44001-45000      * |       275 0.0488968    II     44500\n##      ...      ...         ...    ... .       ...       ...   ...       
...\n##    [206]       II 58001-59000      * |       289 0.0294531    II     58500\n##    [207]       II 59001-60000      * |       290 0.0306662    II     59500\n##    [208]       II 58001-59000      * |       289 0.0294531    II     58500\n##    [209]       II 59001-60000      * |       290 0.0306662    II     59500\n##    [210]       II 59001-60000      * |       290 0.0306662    II     59500\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between two chromosomes:\n\n\nhic2 &lt;- import(cf, focus = 'II|XV', resolution = 4000)\n\nregions(hic2)\n##  GRanges object with 477 ranges and 4 metadata columns:\n##                       seqnames          ranges strand |    bin_id    weight\n##                          &lt;Rle&gt;       &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt;\n##             II_1_4000       II          1-4000      * |        58       NaN\n##          II_4001_8000       II       4001-8000      * |        59       NaN\n##         II_8001_12000       II      8001-12000      * |        60 0.0274474\n##        II_12001_16000       II     12001-16000      * |        61 0.0342116\n##        II_16001_20000       II     16001-20000      * |        62 0.0195128\n##                   ...      ...             ...    ... .       ...       
...\n##    XV_1072001_1076000       XV 1072001-1076000      * |      2783  0.041763\n##    XV_1076001_1080000       XV 1076001-1080000      * |      2784       NaN\n##    XV_1080001_1084000       XV 1080001-1084000      * |      2785       NaN\n##    XV_1084001_1088000       XV 1084001-1088000      * |      2786       NaN\n##    XV_1088001_1091291       XV 1088001-1091291      * |      2787       NaN\n##                         chr    center\n##                       &lt;Rle&gt; &lt;integer&gt;\n##             II_1_4000    II      2000\n##          II_4001_8000    II      6000\n##         II_8001_12000    II     10000\n##        II_12001_16000    II     14000\n##        II_16001_20000    II     18000\n##                   ...   ...       ...\n##    XV_1072001_1076000    XV   1074000\n##    XV_1076001_1080000    XV   1078000\n##    XV_1080001_1084000    XV   1082000\n##    XV_1084001_1088000    XV   1086000\n##    XV_1088001_1091291    XV   1089646\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic2)\n##  $first\n##  GRanges object with 18032 ranges and 4 metadata columns:\n##            seqnames        ranges strand |    bin_id    weight   chr\n##               &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##        [1]       II        1-4000      * |        58       NaN    II\n##        [2]       II        1-4000      * |        58       NaN    II\n##        [3]       II        1-4000      * |        58       NaN    II\n##        [4]       II        1-4000      * |        58       NaN    II\n##        [5]       II        1-4000      * |        58       NaN    II\n##        ...      ...           ...    ... .       ...       ...   
...\n##    [18028]       II 808001-812000      * |       260       NaN    II\n##    [18029]       II 808001-812000      * |       260       NaN    II\n##    [18030]       II 808001-812000      * |       260       NaN    II\n##    [18031]       II 808001-812000      * |       260       NaN    II\n##    [18032]       II 808001-812000      * |       260       NaN    II\n##               center\n##            &lt;integer&gt;\n##        [1]      2000\n##        [2]      2000\n##        [3]      2000\n##        [4]      2000\n##        [5]      2000\n##        ...       ...\n##    [18028]    810000\n##    [18029]    810000\n##    [18030]    810000\n##    [18031]    810000\n##    [18032]    810000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n##  \n##  $second\n##  GRanges object with 18032 ranges and 4 metadata columns:\n##            seqnames          ranges strand |    bin_id    weight   chr\n##               &lt;Rle&gt;       &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##        [1]       XV     48001-52000      * |      2527 0.0185354    XV\n##        [2]       XV   348001-352000      * |      2602 0.0233750    XV\n##        [3]       XV   468001-472000      * |      2632 0.0153615    XV\n##        [4]       XV   472001-476000      * |      2633 0.0189624    XV\n##        [5]       XV   584001-588000      * |      2661 0.0167715    XV\n##        ...      ...             ...    ... .       ...       ...   
...\n##    [18028]       XV   980001-984000      * |      2760 0.0187827    XV\n##    [18029]       XV   984001-988000      * |      2761 0.0250094    XV\n##    [18030]       XV   992001-996000      * |      2763 0.0185599    XV\n##    [18031]       XV 1004001-1008000      * |      2766 0.0196942    XV\n##    [18032]       XV 1064001-1068000      * |      2781 0.0208220    XV\n##               center\n##            &lt;integer&gt;\n##        [1]     50000\n##        [2]    350000\n##        [3]    470000\n##        [4]    474000\n##        [5]    586000\n##        ...       ...\n##    [18028]    982000\n##    [18029]    986000\n##    [18030]    994000\n##    [18031]   1006000\n##    [18032]   1066000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between segments of two chromosomes:\n\n\nhic3 &lt;- import(cf, focus = 'III:10000-40000|XV:10000-40000', resolution = 2000)\n\nregions(hic3)\n##  GRanges object with 32 ranges and 4 metadata columns:\n##                    seqnames      ranges strand |    bin_id    weight   chr\n##                       &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##     III_8001_10000      III  8001-10000      * |       527       NaN   III\n##    III_10001_12000      III 10001-12000      * |       528       NaN   III\n##    III_12001_14000      III 12001-14000      * |       529       NaN   III\n##    III_14001_16000      III 14001-16000      * |       530 0.0356351   III\n##    III_16001_18000      III 16001-18000      * |       531 0.0230693   III\n##                ...      ...         ...    ... .       ...       ...   
...\n##     XV_30001_32000       XV 30001-32000      * |      5039 0.0482465    XV\n##     XV_32001_34000       XV 32001-34000      * |      5040 0.0241580    XV\n##     XV_34001_36000       XV 34001-36000      * |      5041 0.0273166    XV\n##     XV_36001_38000       XV 36001-38000      * |      5042 0.0542235    XV\n##     XV_38001_40000       XV 38001-40000      * |      5043 0.0206849    XV\n##                       center\n##                    &lt;integer&gt;\n##     III_8001_10000      9000\n##    III_10001_12000     11000\n##    III_12001_14000     13000\n##    III_14001_16000     15000\n##    III_16001_18000     17000\n##                ...       ...\n##     XV_30001_32000     31000\n##     XV_32001_34000     33000\n##     XV_34001_36000     35000\n##     XV_36001_38000     37000\n##     XV_38001_40000     39000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic3)\n##  $first\n##  GRanges object with 11 ranges and 4 metadata columns:\n##         seqnames      ranges strand |    bin_id    weight   chr    center\n##            &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt;\n##     [1]      III 14001-16000      * |       530 0.0356351   III     15000\n##     [2]      III 16001-18000      * |       531 0.0230693   III     17000\n##     [3]      III 16001-18000      * |       531 0.0230693   III     17000\n##     [4]      III 20001-22000      * |       533 0.0343250   III     21000\n##     [5]      III 22001-24000      * |       534 0.0258604   III     23000\n##     [6]      III 24001-26000      * |       535 0.0290757   III     25000\n##     [7]      III 28001-30000      * |       537 0.0290713   III     29000\n##     [8]      III 30001-32000      * |       538 0.0266373   III     31000\n##     [9]      III 32001-34000      * |       539 0.0201137   III     33000\n##    [10]      III 32001-34000      * |       539 0.0201137   III     33000\n##    [11]      III 
36001-38000      * |       541 0.0220603   III     37000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n##  \n##  $second\n##  GRanges object with 11 ranges and 4 metadata columns:\n##         seqnames      ranges strand |    bin_id    weight   chr    center\n##            &lt;Rle&gt;   &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt;\n##     [1]       XV 16001-18000      * |      5032 0.0187250    XV     17000\n##     [2]       XV 16001-18000      * |      5032 0.0187250    XV     17000\n##     [3]       XV 20001-22000      * |      5034 0.0247973    XV     21000\n##     [4]       XV 14001-16000      * |      5031 0.0379727    XV     15000\n##     [5]       XV 10001-12000      * |      5029 0.0296913    XV     11000\n##     [6]       XV 32001-34000      * |      5040 0.0241580    XV     33000\n##     [7]       XV 16001-18000      * |      5032 0.0187250    XV     17000\n##     [8]       XV 38001-40000      * |      5043 0.0206849    XV     39000\n##     [9]       XV 22001-24000      * |      5035 0.0613856    XV     23000\n##    [10]       XV 30001-32000      * |      5039 0.0482465    XV     31000\n##    [11]       XV 10001-12000      * |      5029 0.0296913    XV     11000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2 Interacting with HiCExperiment data\n\nAn HiCExperiment object allows parsing of a disk-stored contact matrix.\nAn HiCExperiment object operates by wrapping together (1) a ContactFile (i.e. 
a connection to a disk-stored data file) and (2) a GInteractions generated by parsing the data file.\n\nWe will use the yeast_hic HiCExperiment object to demonstrate how to parse information from a HiCExperiment object.\n\nyeast_hic &lt;- contacts_yeast(full = TRUE)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\n\nyeast_hic\n##  `HiCExperiment` object with 8,757,906 contacts over 763 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"whole genome\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 16000 \n##  interactions: 267709 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n##  pairsFile: N/A \n##  metadata(0):\n\n\n2.4.2.1 Interactions\nThe imported genomic interactions can be directly exposed using the interactions function and are returned as a GInteractions object.\n\ninteractions(yeast_hic)\n##  GInteractions object with 267709 interactions and 4 metadata columns:\n##             seqnames1       ranges1     seqnames2       ranges2 |   bin_id1\n##                 &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt; | &lt;numeric&gt;\n##         [1]         I       1-16000 ---         I       1-16000 |         0\n##         [2]         I       1-16000 ---         I   16001-32000 |         0\n##         [3]         I       1-16000 ---         I   32001-48000 |         0\n##         [4]         I       1-16000 ---         I   48001-64000 |         0\n##         [5]         I       1-16000 ---         I   64001-80000 |         0\n##         ...       ...           ... ...       ...           ... .       
...\n##    [267705]       XVI 896001-912000 ---       XVI 912001-928000 |       759\n##    [267706]       XVI 896001-912000 ---       XVI 928001-944000 |       759\n##    [267707]       XVI 912001-928000 ---       XVI 912001-928000 |       760\n##    [267708]       XVI 912001-928000 ---       XVI 928001-944000 |       760\n##    [267709]       XVI 928001-944000 ---       XVI 928001-944000 |       761\n##               bin_id2     count  balanced\n##             &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##         [1]         0      2836 1.0943959\n##         [2]         1      2212 0.9592069\n##         [3]         2      1183 0.4385242\n##         [4]         3       831 0.2231192\n##         [5]         4       310 0.0821255\n##         ...       ...       ...       ...\n##    [267705]       760      3565  1.236371\n##    [267706]       761      1359  0.385016\n##    [267707]       760      3534  2.103988\n##    [267708]       761      3055  1.485794\n##    [267709]       761      4308  1.711565\n##    -------\n##    regions: 763 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\nBecause genomic interactions are actually stored as GInteractions, regions and anchors work on HiCExperiment objects just as they work with GInteractions!\n\nregions(yeast_hic)\n##  GRanges object with 763 ranges and 4 metadata columns:\n##                      seqnames        ranges strand |    bin_id     weight\n##                         &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;  &lt;numeric&gt;\n##            I_1_16000        I       1-16000      * |         0  0.0196442\n##        I_16001_32000        I   16001-32000      * |         1  0.0220746\n##        I_32001_48000        I   32001-48000      * |         2  0.0188701\n##        I_48001_64000        I   48001-64000      * |         3  0.0136679\n##        I_64001_80000        I   64001-80000      * |         4  0.0134860\n##                  ...      ...          
 ...    ... .       ...        ...\n##    XVI_880001_896000      XVI 880001-896000      * |       758 0.00910873\n##    XVI_896001_912000      XVI 896001-912000      * |       759 0.01421350\n##    XVI_912001_928000      XVI 912001-928000      * |       760 0.02439992\n##    XVI_928001_944000      XVI 928001-944000      * |       761 0.01993237\n##    XVI_944001_948066      XVI 944001-948066      * |       762        NaN\n##                        chr    center\n##                      &lt;Rle&gt; &lt;integer&gt;\n##            I_1_16000     I      8000\n##        I_16001_32000     I     24000\n##        I_32001_48000     I     40000\n##        I_48001_64000     I     56000\n##        I_64001_80000     I     72000\n##                  ...   ...       ...\n##    XVI_880001_896000   XVI    888000\n##    XVI_896001_912000   XVI    904000\n##    XVI_912001_928000   XVI    920000\n##    XVI_928001_944000   XVI    936000\n##    XVI_944001_948066   XVI    946033\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\nanchors(yeast_hic)\n##  $first\n##  GRanges object with 267709 ranges and 4 metadata columns:\n##             seqnames        ranges strand |    bin_id    weight   chr\n##                &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##         [1]        I       1-16000      * |         0 0.0196442     I\n##         [2]        I       1-16000      * |         0 0.0196442     I\n##         [3]        I       1-16000      * |         0 0.0196442     I\n##         [4]        I       1-16000      * |         0 0.0196442     I\n##         [5]        I       1-16000      * |         0 0.0196442     I\n##         ...      ...           ...    ... .       ...       ...   
...\n##    [267705]      XVI 896001-912000      * |       759 0.0142135   XVI\n##    [267706]      XVI 896001-912000      * |       759 0.0142135   XVI\n##    [267707]      XVI 912001-928000      * |       760 0.0243999   XVI\n##    [267708]      XVI 912001-928000      * |       760 0.0243999   XVI\n##    [267709]      XVI 928001-944000      * |       761 0.0199324   XVI\n##                center\n##             &lt;integer&gt;\n##         [1]      8000\n##         [2]      8000\n##         [3]      8000\n##         [4]      8000\n##         [5]      8000\n##         ...       ...\n##    [267705]    904000\n##    [267706]    904000\n##    [267707]    920000\n##    [267708]    920000\n##    [267709]    936000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n##  \n##  $second\n##  GRanges object with 267709 ranges and 4 metadata columns:\n##             seqnames        ranges strand |    bin_id    weight   chr\n##                &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;Rle&gt;\n##         [1]        I       1-16000      * |         0 0.0196442     I\n##         [2]        I   16001-32000      * |         1 0.0220746     I\n##         [3]        I   32001-48000      * |         2 0.0188701     I\n##         [4]        I   48001-64000      * |         3 0.0136679     I\n##         [5]        I   64001-80000      * |         4 0.0134860     I\n##         ...      ...           ...    ... .       ...       ...   
...\n##    [267705]      XVI 912001-928000      * |       760 0.0243999   XVI\n##    [267706]      XVI 928001-944000      * |       761 0.0199324   XVI\n##    [267707]      XVI 912001-928000      * |       760 0.0243999   XVI\n##    [267708]      XVI 928001-944000      * |       761 0.0199324   XVI\n##    [267709]      XVI 928001-944000      * |       761 0.0199324   XVI\n##                center\n##             &lt;integer&gt;\n##         [1]      8000\n##         [2]     24000\n##         [3]     40000\n##         [4]     56000\n##         [5]     72000\n##         ...       ...\n##    [267705]    920000\n##    [267706]    936000\n##    [267707]    920000\n##    [267708]    936000\n##    [267709]    936000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2.2 Bins and seqinfo\nAdditional useful information can be recovered from a HiCExperiment object. This includes:\n\nThe seqinfo of the HiCExperiment:\n\n\nseqinfo(yeast_hic)\n##  Seqinfo object with 16 sequences from an unspecified genome:\n##    seqnames seqlengths isCircular genome\n##    I            230218       &lt;NA&gt;   &lt;NA&gt;\n##    II           813184       &lt;NA&gt;   &lt;NA&gt;\n##    III          316620       &lt;NA&gt;   &lt;NA&gt;\n##    IV          1531933       &lt;NA&gt;   &lt;NA&gt;\n##    V            576874       &lt;NA&gt;   &lt;NA&gt;\n##    ...             ...        ...    
...\n##    XII         1078177       &lt;NA&gt;   &lt;NA&gt;\n##    XIII         924431       &lt;NA&gt;   &lt;NA&gt;\n##    XIV          784333       &lt;NA&gt;   &lt;NA&gt;\n##    XV          1091291       &lt;NA&gt;   &lt;NA&gt;\n##    XVI          948066       &lt;NA&gt;   &lt;NA&gt;\n\nThis lists the different chromosomes available to parse along with their lengths.\n\nThe bins of the HiCExperiment:\n\n\nbins(yeast_hic)\n##  GRanges object with 763 ranges and 2 metadata columns:\n##                      seqnames        ranges strand |    bin_id     weight\n##                         &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;  &lt;numeric&gt;\n##            I_1_16000        I       1-16000      * |         0  0.0196442\n##        I_16001_32000        I   16001-32000      * |         1  0.0220746\n##        I_32001_48000        I   32001-48000      * |         2  0.0188701\n##        I_48001_64000        I   48001-64000      * |         3  0.0136679\n##        I_64001_80000        I   64001-80000      * |         4  0.0134860\n##                  ...      ...           ...    ... .       ...        ...\n##    XVI_880001_896000      XVI 880001-896000      * |       758 0.00910873\n##    XVI_896001_912000      XVI 896001-912000      * |       759 0.01421350\n##    XVI_912001_928000      XVI 912001-928000      * |       760 0.02439992\n##    XVI_928001_944000      XVI 928001-944000      * |       761 0.01993237\n##    XVI_944001_948066      XVI 944001-948066      * |       762        NaN\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n\n\n\nWarning: Difference between bins and regions\n\n\n\nbins are not equivalent to regions of an HiCExperiment.\n\n\nbins refer to all the possible regions of a HiCExperiment. 
For instance, for a HiCExperiment with a total genome size of 1,000,000 and a resolution of 2000, bins will always return a GRanges object with 500 ranges.\n\nregions, by contrast, refer to the union of anchors of all the interactions imported in a HiCExperiment object.\n\nThus, all the regions will necessarily be a subset of the HiCExperiment bins, or equal to bins if no focus has been specified when importing a ContactFile.\n\n\n\n2.4.2.3 Scores\nWhat the end-user is usually looking for is the frequency of each genomic interaction. Such frequency scores are available using the scores function. scores returns a list with a number of different types of scores.\n\nhead(scores(yeast_hic))\n##  List of length 2\n##  names(2): count balanced\n\nhead(scores(yeast_hic, \"count\"))\n##  [1] 2836 2212 1183  831  310  159\n\nhead(scores(yeast_hic, \"balanced\"))\n##  [1] 1.09439586 0.95920688 0.43852417 0.22311917 0.08212549 0.03345221\n\nCalling interactions(hic) returns a GInteractions with scores already stored in extra columns. This shorthand allows one to check scores directly from the interactions output.\n\ninteractions(yeast_hic)\n##  GInteractions object with 267709 interactions and 4 metadata columns:\n##             seqnames1       ranges1     seqnames2       ranges2 |   bin_id1\n##                 &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt; | &lt;numeric&gt;\n##         [1]         I       1-16000 ---         I       1-16000 |         0\n##         [2]         I       1-16000 ---         I   16001-32000 |         0\n##         [3]         I       1-16000 ---         I   32001-48000 |         0\n##         [4]         I       1-16000 ---         I   48001-64000 |         0\n##         [5]         I       1-16000 ---         I   64001-80000 |         0\n##         ...       ...           ... ...       ...           ... .       
...\n##    [267705]       XVI 896001-912000 ---       XVI 912001-928000 |       759\n##    [267706]       XVI 896001-912000 ---       XVI 928001-944000 |       759\n##    [267707]       XVI 912001-928000 ---       XVI 912001-928000 |       760\n##    [267708]       XVI 912001-928000 ---       XVI 928001-944000 |       760\n##    [267709]       XVI 928001-944000 ---       XVI 928001-944000 |       761\n##               bin_id2     count  balanced\n##             &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##         [1]         0      2836 1.0943959\n##         [2]         1      2212 0.9592069\n##         [3]         2      1183 0.4385242\n##         [4]         3       831 0.2231192\n##         [5]         4       310 0.0821255\n##         ...       ...       ...       ...\n##    [267705]       760      3565  1.236371\n##    [267706]       761      1359  0.385016\n##    [267707]       760      3534  2.103988\n##    [267708]       761      3055  1.485794\n##    [267709]       761      4308  1.711565\n##    -------\n##    regions: 763 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\nhead(interactions(yeast_hic)$count)\n##  [1] 2836 2212 1183  831  310  159\n\n\n2.4.2.4 topologicalFeatures\nIn Hi-C studies, “topological features” refer to genomic structures identified (usually from a Hi-C map, but not necessarily). For instance, one may want to study known structural loops anchored at CTCF sites, or interactions around or over centromeres, or simply specific genomic “viewpoints”.\nHiCExperiment objects can store topologicalFeatures to facilitate this analysis. 
By default, four empty topologicalFeatures are stored in a list:\n\ncompartments\nborders\nloops\nviewpoints\n\nAdditional topologicalFeatures can be added to this list (see the next chapter for more detail).\n\ntopologicalFeatures(yeast_hic)\n##  List of length 5\n##  names(5): compartments borders loops viewpoints centromeres\n\ntopologicalFeatures(yeast_hic, 'centromeres')\n##  GRanges object with 16 ranges and 0 metadata columns:\n##         seqnames        ranges strand\n##            &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt;\n##     [1]        I 151583-151641      +\n##     [2]       II 238361-238419      +\n##     [3]      III 114322-114380      +\n##     [4]       IV 449879-449937      +\n##     [5]        V 152522-152580      +\n##     ...      ...           ...    ...\n##    [12]      XII 151366-151424      +\n##    [13]     XIII 268222-268280      +\n##    [14]      XIV 628588-628646      +\n##    [15]       XV 326897-326955      +\n##    [16]      XVI 556255-556313      +\n##    -------\n##    seqinfo: 17 sequences (1 circular) from R64-1-1 genome\n\n\n2.4.2.5 pairsFile\nAs a contact matrix is typically obtained from binning a .pairs file, it is often the case that the matching .pairs file is available to the end-user. A PairsFile can thus be created and associated with the corresponding HiCExperiment object. This allows more accurate estimation of contact distribution, e.g. 
when calculating distance-dependent genomic interaction frequency.\n\npairsFile(yeast_hic) &lt;- pairsf\n\npairsFile(yeast_hic)\n##                                                        EH7703 \n##  \"/home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753\"\n\nreadLines(pairsFile(yeast_hic), 25)\n##   [1] \"## pairs format v1.0\"                                                             \n##   [2] \"#sorted: chr1-pos1-chr2-pos2\"                                                     \n##   [3] \"#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2\"                 \n##   [4] \"#chromsize: I 230218\"                                                             \n##   [5] \"#chromsize: II 813184\"                                                            \n##   [6] \"#chromsize: III 316620\"                                                           \n##   [7] \"#chromsize: IV 1531933\"                                                           \n##   [8] \"#chromsize: V 576874\"                                                             \n##   [9] \"#chromsize: VI 270161\"                                                            \n##  [10] \"#chromsize: VII 1090940\"                                                          \n##  [11] \"#chromsize: VIII 562643\"                                                          \n##  [12] \"#chromsize: IX 439888\"                                                            \n##  [13] \"#chromsize: X 745751\"                                                             \n##  [14] \"#chromsize: XI 666816\"                                                            \n##  [15] \"#chromsize: XII 1078177\"                                                          \n##  [16] \"#chromsize: XIII 924431\"                                                          \n##  [17] \"#chromsize: XIV 784333\"                                                           \n##  [18] \"#chromsize: XV 1091291\"                              
                             \n##  [19] \"#chromsize: XVI 948066\"                                                           \n##  [20] \"#chromsize: Mito 85779\"                                                           \n##  [21] \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\"  \n##  [22] \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\"  \n##  [23] \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\"\n##  [24] \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\"   \n##  [25] \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n2.4.2.6 Importing a PairsFile\n\nThe .pairs file linked to a HiCExperiment object can itself be imported in a GInteractions object:\n\nimport(pairsFile(yeast_hic), format = 'pairs')\n##  GInteractions object with 471364 interactions and 3 metadata columns:\n##             seqnames1   ranges1     seqnames2   ranges2 |     frag1     frag2\n##                 &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt; | &lt;numeric&gt; &lt;numeric&gt;\n##         [1]        II       105 ---        II     48548 |      1358      1681\n##         [2]        II       113 ---        II     45003 |      1358      1658\n##         [3]        II       119 ---        II    687251 |      1358      5550\n##         [4]        II       160 ---        II     26124 |      1358      1510\n##         [5]        II       169 ---        II     39052 |      1358      1613\n##         ...       ...       ... ...       ...       ... .       ...       
...\n##    [471360]        II    808605 ---        II    809683 |      6316      6320\n##    [471361]        II    808609 ---        II    809917 |      6316      6324\n##    [471362]        II    808617 ---        II    809506 |      6316      6319\n##    [471363]        II    809447 ---        II    809685 |      6319      6321\n##    [471364]        II    809472 ---        II    809675 |      6319      6320\n##              distance\n##             &lt;integer&gt;\n##         [1]     48443\n##         [2]     44890\n##         [3]    687132\n##         [4]     25964\n##         [5]     38883\n##         ...       ...\n##    [471360]      1078\n##    [471361]      1308\n##    [471362]       889\n##    [471363]       238\n##    [471364]       203\n##    -------\n##    regions: 549331 ranges and 0 metadata columns\n##    seqinfo: 17 sequences from an unspecified genome\n\nNote that these GInteractions are not binned, contrary to interactions extracted from a HiCExperiment. Anchors of the interactions listed in the GInteractions imported from a disk-stored .pairs file are all of width 1.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/data-representation.html#visual-summary-of-the-hicexperiment-data-structure",
    "href": "pages/data-representation.html#visual-summary-of-the-hicexperiment-data-structure",
    "title": "\n2  Hi-C data structures in R\n",
    "section": "\n2.5 Visual summary of the HiCExperiment data structure",
    "text": "2.5 Visual summary of the HiCExperiment data structure\nThe HiCExperiment data structure provided by the HiCExperiment package inherits methods from core GInteractions and BiocFile classes to provide a flexible representation of Hi-C data in R. It allows random access-based queries to seamlessly import part or all of the data contained in disk-stored Hi-C contact matrices in a variety of formats.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>2</span>  <span class='chapter-title'>Hi-C data structures in R</span>"
    ]
  },
  {
    "objectID": "pages/parsing.html",
    "href": "pages/parsing.html",
    "title": "\n3  Manipulating Hi-C data in R\n",
    "section": "",
    "text": "3.1 Subsetting a contact matrix\nTwo entirely different approaches can be used to subset a Hi-C contact matrix:",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Manipulating Hi-C data in R</span>"
    ]
  },
  {
    "objectID": "pages/parsing.html#subsetting-a-contact-matrix",
    "href": "pages/parsing.html#subsetting-a-contact-matrix",
    "title": "\n3  Manipulating Hi-C data in R\n",
    "section": "",
    "text": "Subsetting before importing: leveraging random access to a disk-stored contact matrix to only import interactions overlapping with a genomic locus of interest.\nSubsetting after importing: parsing the entire contact matrix into memory, and subsequently subsetting interactions overlapping with a genomic locus of interest.\n\n\n\n3.1.1 Subsetting before import: with focus\n\nSpecifying a focus when importing a dataset in R (i.e. \"Subset first, then parse\") is generally the recommended approach to import Hi-C data in R.\nThe focus argument can be set when importing a ContactFile in R, as follows:\n\n## This code is not evaluated\n## Change `focus = \"...\"` accordingly (see below)\nimport(cf, focus = \"...\")\n\nThis ensures that only the needed data is parsed in R, reducing memory load and accelerating the import. Thus, this should be the preferred way of parsing HiCExperiment data, as disk-stored contact matrices allow efficient random access to indexed data.\nfocus can be any of the following string types:\n\n#   \"II\"                                  --&gt; import contacts over an entire chromosome\n#   \"II:300001-800000\"                    --&gt; import on-diagonal contacts within a chromosome\n#   \"II:300001-400000|II:600001-700000\"   --&gt; import off-diagonal contacts within a chromosome\n#   \"II|III\"                              --&gt; import contacts between two chromosomes\n#   \"II:300001-800000|V:1-500000\"         --&gt; import contacts between segments of two chromosomes\n\n\n\n\n\n\n\nImportantMore examples for import with focus argument 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nimport(cf, focus = 'II:300001-800000', resolution = 2000)\n##  `HiCExperiment` object with 301,018 contacts over 250 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-800,000\" \n##  resolutions(5): 1000 2000 4000 8000 
16000\n##  active resolution: 2000 \n##  interactions: 17974 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nimport(cf, focus = 'II:300001-400000|II:600001-700000', resolution = 2000)\n##  `HiCExperiment` object with 402 contacts over 100 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300001-400000|II:600001-700000\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 357 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single chromosome:\n\n\nimport(cf, focus = 'II', resolution = 2000)\n##  `HiCExperiment` object with 471,364 contacts over 407 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 34063 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nimport(cf, focus = 'II|III', resolution = 2000)\n##  `HiCExperiment` object with 9,092 contacts over 566 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II|III\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 7438 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain 
those between parts of two chromosomes:\n\n\nimport(cf, focus = 'II:300001-800000|V:1-500000', resolution = 2000)\n##  `HiCExperiment` object with 7,147 contacts over 500 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300001-800000|V:1-500000\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 6523 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\n\n\n\n\n3.1.2 Subsetting after import\nIt may sometimes be desirable to import a full dataset from disk first, and only then perform in-memory subsetting of the HiCExperiment object (i.e. \"Parse first, then subset\"). This is necessary, for example, when the end user aims to investigate subsets of interactions across a large number of different areas of a contact matrix.\nImported data can be subset with either subsetByOverlaps or [.\n\n3.1.2.1 subsetByOverlaps(&lt;HiCExperiment&gt;, &lt;GRanges&gt;)\n\nsubsetByOverlaps can take a HiCExperiment as a query and a GRanges as a subject. 
In this case, the GRanges is used to extract a subset of a HiCExperiment constrained within a specific genomic location.\n\ntelomere &lt;- GRanges(\"II:700001-813184\")\nsubsetByOverlaps(hic, telomere) |&gt; interactions()\n##  GInteractions object with 1540 interactions and 4 metadata columns:\n##           seqnames1       ranges1     seqnames2       ranges2 |   bin_id1\n##               &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt; | &lt;numeric&gt;\n##       [1]        II 700001-702000 ---        II 700001-702000 |       466\n##       [2]        II 700001-702000 ---        II 702001-704000 |       466\n##       [3]        II 700001-702000 ---        II 704001-706000 |       466\n##       [4]        II 700001-702000 ---        II 706001-708000 |       466\n##       [5]        II 700001-702000 ---        II 708001-710000 |       466\n##       ...       ...           ... ...       ...           ... .       ...\n##    [1536]        II 804001-806000 ---        II 810001-812000 |       518\n##    [1537]        II 806001-808000 ---        II 806001-808000 |       519\n##    [1538]        II 806001-808000 ---        II 808001-810000 |       519\n##    [1539]        II 806001-808000 ---        II 810001-812000 |       519\n##    [1540]        II 808001-810000 ---        II 808001-810000 |       520\n##             bin_id2     count  balanced\n##           &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##       [1]       466        30 0.0283618\n##       [2]       467       145 0.0709380\n##       [3]       468       124 0.0704979\n##       [4]       469        59 0.0510221\n##       [5]       470        59 0.0384004\n##       ...       ...       ...       
...\n##    [1536]       521         1       NaN\n##    [1537]       519        15 0.0560633\n##    [1538]       520        25       NaN\n##    [1539]       521         1       NaN\n##    [1540]       520        10       NaN\n##    -------\n##    regions: 57 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\nBy default, subsetByOverlaps(hic, telomere) will only recover interactions constrained within telomere, i.e. interactions for which both ends are in telomere.\nAlternatively, type = \"any\" can be specified to get all interactions with at least one of their anchors within telomere.\n\nsubsetByOverlaps(hic, telomere, type = \"any\") |&gt; interactions()\n##  GInteractions object with 6041 interactions and 4 metadata columns:\n##           seqnames1       ranges1     seqnames2       ranges2 |   bin_id1\n##               &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt; | &lt;numeric&gt;\n##       [1]        II 300001-302000 ---        II 702001-704000 |       266\n##       [2]        II 300001-302000 ---        II 704001-706000 |       266\n##       [3]        II 300001-302000 ---        II 768001-770000 |       266\n##       [4]        II 300001-302000 ---        II 784001-786000 |       266\n##       [5]        II 302001-304000 ---        II 740001-742000 |       267\n##       ...       ...           ... ...       ...           ... .       
...\n##    [6037]        II 804001-806000 ---        II 810001-812000 |       518\n##    [6038]        II 806001-808000 ---        II 806001-808000 |       519\n##    [6039]        II 806001-808000 ---        II 808001-810000 |       519\n##    [6040]        II 806001-808000 ---        II 810001-812000 |       519\n##    [6041]        II 808001-810000 ---        II 808001-810000 |       520\n##             bin_id2     count    balanced\n##           &lt;numeric&gt; &lt;numeric&gt;   &lt;numeric&gt;\n##       [1]       467         1 0.000590999\n##       [2]       468         1 0.000686799\n##       [3]       500         1 0.000728215\n##       [4]       508         1 0.000923092\n##       [5]       486         1 0.000382222\n##       ...       ...       ...         ...\n##    [6037]       521         1         NaN\n##    [6038]       519        15   0.0560633\n##    [6039]       520        25         NaN\n##    [6040]       521         1         NaN\n##    [6041]       520        10         NaN\n##    -------\n##    regions: 257 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\n\n3.1.2.2 &lt;HiCExperiment&gt;[\"...\"]\n\nThe square bracket operator [ allows for more advanced textual queries, similarly to focus arguments that can be used when importing contact matrices in memory.\nUnlike focus, [ operates on data that has already been imported in memory, 
so it does not reduce memory load or import time.\nThe following string types can be used to subset a HiCExperiment object with the [ notation:\n\n#   \"II\"                                  --&gt; subset contacts over an entire chromosome\n#   \"II:300001-800000\"                    --&gt; subset on-diagonal contacts within a chromosome\n#   \"II:300001-400000|II:600001-700000\"   --&gt; subset off-diagonal contacts within a chromosome\n#   \"II|III\"                              --&gt; subset contacts between two chromosomes\n#   \"II:300001-800000|V:1-500000\"         --&gt; subset contacts between segments of two chromosomes\n#   c(\"II\", \"III\", \"IV\")                  --&gt; subset contacts within and between several chromosomes\n\n\n\n\n\n\n\nImportantMore examples for subsetting with [ 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nhic[\"II:800001-813184\"]\n##  `HiCExperiment` object with 1,040 contacts over 6 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:800,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 19 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nhic[\"II:300001-320000|II:800001-813184\"]\n##  `HiCExperiment` object with 3 contacts over 6 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300001-320000|II:800001-813184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 3 \n##  scores(2): count balanced \n##  topologicalFeatures: 
compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single chromosome:\n\n\nhic[\"II\"]\n##  `HiCExperiment` object with 306,212 contacts over 257 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 18513 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nhic[\"II|IV\"]\n##  `HiCExperiment` object with 0 contacts over 0 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:1-813184|IV:1-1531933\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 0 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those between segments of two chromosomes:\n\n\nhic[\"II:300001-320000|IV:1-100000\"]\n##  `HiCExperiment` object with 0 contacts over 0 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300001-320000|IV:1-100000\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 0 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\nSubsetting interactions to retain those constrained within several chromosomes:\n\n\nhic[c('II', 'III', 'IV')]\n##  `HiCExperiment` object with 306,212 contacts over 257 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n## 
 focus: \"II, III, IV\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 18513 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nSome notes:\n\nThis last example (subsetting for a vector of several chromosomes) is the only scenario that requires [-based in-memory subsetting of pre-imported data, as such subsetting is not possible with focus from disk-stored data.\nAll the other [ subsetting scenarios illustrated above can be achieved more efficiently using the focus argument when importing data into a HiCExperiment object.\nHowever, keep in mind that subsetting preserves extra data, e.g. added scores, topologicalFeatures, metadata or pairsFile, whereas this information is lost using focus with import.\n\n\n\n\n\n3.1.3 Zooming on a HiCExperiment\n\n“Zooming” refers to dynamically changing the resolution of a HiCExperiment. By zooming a HiCExperiment, one can refine or coarsen the contact matrix. This operation takes the ContactFile and focus from an existing HiCExperiment and re-generates a new HiCExperiment with updated resolution, interactions and scores. 
Note that zoom will preserve existing metadata, topologicalFeatures and pairsFile information.\n\nhic\n##  `HiCExperiment` object with 306,212 contacts over 257 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 18513 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nzoom(hic, 4000)\n##  `HiCExperiment` object with 306,212 contacts over 129 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 4000 \n##  interactions: 6800 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nzoom(hic, 1000)\n##  `HiCExperiment` object with 306,212 contacts over 514 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 1000 \n##  interactions: 44363 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nThe sum of raw counts does not change after zooming; however, the number of individual interactions and regions does.\n\nlength(hic)\n##  [1] 18513\nlength(zoom(hic, 1000))\n##  [1] 44363\nlength(zoom(hic, 4000))\n##  [1] 6800\nsum(scores(hic, \"count\"))\n##  [1] 306212\nsum(scores(zoom(hic, 1000), \"count\"))\n##  [1] 306212\nsum(scores(zoom(hic, 4000), \"count\"))\n##  [1] 306212\n\n\n\n\n\n\n\n\n\nImportant\n\n\n\n\n\nzoom does not change the focus! 
It only affects the resolution (and consequently, the interactions).\n\nzoom will only work for multi-resolution contact matrices, e.g. .mcool or .hic.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Manipulating Hi-C data in R</span>"
    ]
  },
  {
    "objectID": "pages/parsing.html#updating-an-hicexperiment-object",
    "href": "pages/parsing.html#updating-an-hicexperiment-object",
    "title": "\n3  Manipulating Hi-C data in R\n",
    "section": "\n3.2 Updating an HiCExperiment object",
    "text": "3.2 Updating an HiCExperiment object\n\n\n\n\n\n\nTipTL;DR: Which HiCExperiment slots are mutable (✅) / immutable (⛔️)?\n\n\n\n\n\nfileName(hic): ⛔️ (obtained from disk-stored file)\n\nfocus(hic): 🤔 (see subsetting section)\n\nresolutions(hic): ⛔️ (obtained from disk-stored file)\n\nresolution(hic): 🤔 (see zooming section)\n\ninteractions(hic): ⛔️ (obtained from disk-stored file)\n\nscores(hic): ✅\n\ntopologicalFeatures(hic): ✅\n\npairsFile(hic): ✅\n\nmetadata(hic): ✅\n\n\n\n\n3.2.1 Immutable slots\nAn HiCExperiment object acts as an interface exposing disk-stored data. This implies that the fileName slot itself is immutable (i.e. cannot be changed). This should be obvious, as a HiCExperiment has to be associated with a disk-stored contact matrix to properly function (except in some advanced cases covered in later chapters).\nFor this reason, methods to manually modify interactions and resolutions slots are also not exposed in the HiCExperiment package.\nA corollary of this is that the associated regions and anchors of an HiCExperiment should not be modified by hand either, since they are directly linked to interactions.\n\n3.2.2 Mutable slots\nThat being said, HiCExperiment objects are flexible and can be partially modified in memory without having to change/overwrite the original, disk-stored contact matrix.\nSeveral slots can be modified in memory: scores, topologicalFeatures, pairsFile and metadata.\n\n3.2.2.1 scores\n\nWe have seen in the previous chapter that scores are stored in a list and are available using the scores function.\n\nscores(hic)\n##  List of length 2\n##  names(2): count balanced\n\nhead(scores(hic, \"count\"))\n##  [1]  7 92 75 61 38 43\n\nhead(scores(hic, \"balanced\"))\n##  [1] 0.009657438 0.076622340 0.054101992 0.042940512 0.040905212 0.029293930\n\nExtra scores can be added to this list (e.g. to describe the “expected” interaction frequency for each interaction stored in the HiCExperiment object). 
This can be achieved using the scores()&lt;- function.\n\nscores(hic, \"random\") &lt;- runif(length(hic))\n\nscores(hic)\n##  List of length 3\n##  names(3): count balanced random\n\nhead(scores(hic, \"random\"))\n##  [1] 0.91677155 0.33187048 0.41033088 0.51380464 0.60195827 0.02179668\n\n\n3.2.2.2 topologicalFeatures\n\nThe end-user can create additional topologicalFeatures or modify the existing ones using the topologicalFeatures()&lt;- function.\n\ntopologicalFeatures(hic, 'CTCF') &lt;- GRanges(c(\n    \"II:340-352\", \n    \"II:3520-3532\", \n    \"II:7980-7992\", \n    \"II:9240-9252\" \n))\ntopologicalFeatures(hic, 'CTCF')\n##  GRanges object with 4 ranges and 0 metadata columns:\n##        seqnames    ranges strand\n##           &lt;Rle&gt; &lt;IRanges&gt;  &lt;Rle&gt;\n##    [1]       II   340-352      *\n##    [2]       II 3520-3532      *\n##    [3]       II 7980-7992      *\n##    [4]       II 9240-9252      *\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\ntopologicalFeatures(hic, 'loops') &lt;- GInteractions(\n    topologicalFeatures(hic, 'CTCF')[rep(1:3, each = 3)],\n    topologicalFeatures(hic, 'CTCF')[rep(1:3, 3)]\n)\ntopologicalFeatures(hic, 'loops')\n##  GInteractions object with 9 interactions and 0 metadata columns:\n##        seqnames1   ranges1     seqnames2   ranges2\n##            &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt;\n##    [1]        II   340-352 ---        II   340-352\n##    [2]        II   340-352 ---        II 3520-3532\n##    [3]        II   340-352 ---        II 7980-7992\n##    [4]        II 3520-3532 ---        II   340-352\n##    [5]        II 3520-3532 ---        II 3520-3532\n##    [6]        II 3520-3532 ---        II 7980-7992\n##    [7]        II 7980-7992 ---        II   340-352\n##    [8]        II 7980-7992 ---        II 3520-3532\n##    [9]        II 7980-7992 ---        II 7980-7992\n##    -------\n##    regions: 3 ranges and 0 metadata columns\n##    
seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nhic\n##  `HiCExperiment` object with 306,212 contacts over 257 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 18513 \n##  scores(3): count balanced random \n##  topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n##  pairsFile: N/A \n##  metadata(0):\n\nAll these objects can be used in *Overlap methods, as they all extend the GRanges class of objects.\n\n# ---- This counts the number of times `CTCF` anchors are being used in the \n#      `loops` `GInteractions` object\ncountOverlaps(\n    query = topologicalFeatures(hic, 'CTCF'), \n    subject = topologicalFeatures(hic, 'loops')\n)\n##  [1] 5 5 5 0\n\n\n3.2.2.3 pairsFile\n\nIf pairsFile is not specified when importing the ContactFile into a HiCExperiment object, one can add it later.\n\npairsf &lt;- HiContactsData('yeast_wt', 'pairs.gz')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\n\npairsFile(hic) &lt;- pairsf\nhic\n##  `HiCExperiment` object with 306,212 contacts over 257 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:300,001-813,184\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 18513 \n##  scores(3): count balanced random \n##  topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n##  pairsFile: /home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753 \n##  metadata(0):\n\n\n3.2.2.4 metadata\n\nMetadata associated with a HiCExperiment can be updated at any point.\n\nmetadata(hic) &lt;- list(\n    info = \"HiCExperiment created from an example .mcool file from `HiContactsData`\", \n    date = date()\n)\nmetadata(hic)\n##  
$info\n##  [1] \"HiCExperiment created from an example .mcool file from `HiContactsData`\"\n##  \n##  $date\n##  [1] \"Mon Mar 16 10:21:01 2026\"",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Manipulating Hi-C data in R</span>"
    ]
  },
  {
    "objectID": "pages/parsing.html#coercing-hicexperiment-objects",
    "href": "pages/parsing.html#coercing-hicexperiment-objects",
    "title": "\n3  Manipulating Hi-C data in R\n",
    "section": "\n3.3 Coercing HiCExperiment objects",
    "text": "3.3 Coercing HiCExperiment objects\nConvenient coercing functions exist to transform data stored as a HiCExperiment into another class.\n\n\nas.matrix(): allows to coerce the HiCExperiment into a sparse or dense matrix (using the sparse logical argument, TRUE by default) and choosing specific scores of interest (using the use.scores argument, \"balanced\" by default).\n\n\n# ----- `as.matrix` coerces a `HiCExperiment` into a dense `matrix` by default \nas.matrix(hic) |&gt; class()\n##  [1] \"matrix\" \"array\"\n\nas.matrix(hic) |&gt; dim()\n##  [1] 257 257\n\n# ----- One can specify which scores should be used when coercing into a matrix\nas.matrix(hic, use.scores = \"balanced\")[1:5, 1:5]\n##              [,1]       [,2]       [,3]       [,4]       [,5]\n##  [1,] 0.009657438 0.07662234 0.05410199 0.04294051 0.04090521\n##  [2,] 0.076622340 0.05128277 0.09841564 0.06926737 0.05263611\n##  [3,] 0.054101992 0.09841564 0.05657589 0.08723160 0.07316890\n##  [4,] 0.042940512 0.06926737 0.08723160 0.03699543 0.08403496\n##  [5,] 0.040905212 0.05263611 0.07316890 0.08403496 0.04787415\n\nas.matrix(hic, use.scores = \"count\")[1:5, 1:5]\n##       [,1] [,2] [,3] [,4] [,5]\n##  [1,]    7   92   75   61   38\n##  [2,]   92  102  226  163   81\n##  [3,]   75  226  150  237  130\n##  [4,]   61  163  237  103  153\n##  [5,]   38   81  130  153   57\n\n# ----- If needed, one can coerce a HiCExperiment into a sparse matrix\nas.matrix(hic, use.scores = \"count\", sparse = TRUE)[1:5, 1:5]\n##  5 x 5 sparse Matrix of class \"dgTMatrix\"\n##                         \n##  [1,]  7  92  75  61  38\n##  [2,] 92 102 226 163  81\n##  [3,] 75 226 150 237 130\n##  [4,] 61 163 237 103 153\n##  [5,] 38  81 130 153  57\n\n\n\nas.data.frame(): simply coercing interactions into a rectangular data frame\n\n\nas.data.frame(hic) |&gt; head()\n##    seqnames1 start1   end1 width1 strand1 bin_id1    weight1 center1\n##  1        II 300001 302000   2000       *     266 0.03714342  
301000\n##  2        II 300001 302000   2000       *     266 0.03714342  301000\n##  3        II 300001 302000   2000       *     266 0.03714342  301000\n##  4        II 300001 302000   2000       *     266 0.03714342  301000\n##  5        II 300001 302000   2000       *     266 0.03714342  301000\n##  6        II 300001 302000   2000       *     266 0.03714342  301000\n##    seqnames2 start2   end2 width2 strand2 bin_id2    weight2 center2 count\n##  1        II 300001 302000   2000       *     266 0.03714342  301000     7\n##  2        II 302001 304000   2000       *     267 0.02242258  303000    92\n##  3        II 304001 306000   2000       *     268 0.01942093  305000    75\n##  4        II 306001 308000   2000       *     269 0.01895202  307000    61\n##  5        II 308001 310000   2000       *     270 0.02898098  309000    38\n##  6        II 310001 312000   2000       *     271 0.01834118  311000    43\n##       balanced     random\n##  1 0.009657438 0.91677155\n##  2 0.076622340 0.33187048\n##  3 0.054101992 0.41033088\n##  4 0.042940512 0.51380464\n##  5 0.040905212 0.60195827\n##  6 0.029293930 0.02179668\n\n\n\n\n\n\n\nWarning\n\n\n\nThese coercing methods only operate on interactions and scores, and discard all other information, e.g. regarding genomic regions, available resolutions, associated metadata, pairsFile or topologicalFeatures.",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>3</span>  <span class='chapter-title'>Manipulating Hi-C data in R</span>"
    ]
  },
  {
    "objectID": "pages/visualization.html",
    "href": "pages/visualization.html",
    "title": "\n4  Hi-C data visualization\n",
    "section": "",
    "text": "4.1 Visualizing Hi-C contact maps\nVisualizing Hi-C contact maps is often a necessary step in exploratory data analysis. A Hi-C contact map is usually displayed as a heatmap, in which:",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Hi-C data visualization</span>"
    ]
  },
  {
    "objectID": "pages/visualization.html#visualizing-hi-c-contact-maps",
    "href": "pages/visualization.html#visualizing-hi-c-contact-maps",
    "title": "\n4  Hi-C data visualization\n",
    "section": "",
    "text": "Each axis represents a section of the genome of interest (either a segment of a chromosome, or several chromosomes, …).\nThe color code aims to represent “interaction frequency”, which can be expressed in “raw” counts or normalized (balanced).\nOther metrics can also be displayed in Hi-C heatmaps, e.g. ratios of interaction frequency between two Hi-C experiments, p-values of differential interaction analysis, …\nAxes are often identical, representing interactions constrained within a single genomic window, a.k.a. on-diagonal matrices.\nHowever, axes can be different: this is the case when off-diagonal matrices are displayed.\n\n\n4.1.1 Single map\nSimple visualization of disk-stored Hi-C contact matrices can be done by:\n\nImporting the interactions over the genomic location of interest into a HiCExperiment object;\nUsing plotMatrix function (provided by HiContacts) to generate a plot.\n\n\nlibrary(HiContacts)\nplotMatrix(hic)\n\n\n\n\n\n\n\n\n\n\n\n\n\nNoteNote\n\n\n\nA caption summarizing the plotting parameters is added below the heatmap. This can be removed with caption = FALSE.\n\n\n\n4.1.2 Horizontal map\nHi-C maps are sometimes visualized in a “horizontal” style, where a square on-diagonal heatmap is tilted by 45˚ and truncated to only show interactions up to a certain distance from the main diagonal.\nWhen a maxDistance argument is provided to plotMatrix, it automatically generates a horizontal-style heatmap.\n\nplotMatrix(hic, maxDistance = 200000)\n\n\n\n\n\n\n\n\n4.1.3 Side-by-side maps\nSometimes, one may want to visually plot 2 Hi-C samples side by side to compare the interaction landscapes over the same genomic locus. 
This can be done by adding a second HiCExperiment (imported with the same focus) with the compare.to argument.\nHere, we are importing a second .mcool file corresponding to a Hi-C experiment performed in an eco1 yeast mutant:\n\nhic2 &lt;- import(\n    CoolFile(HiContactsData('yeast_eco1', 'mcool')), \n    focus = 'V', \n    resolution = 2000\n)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\nWe then plot the two matrices side by side. The first will be displayed in the top right corner and the second (provided with compare.to) will be in the bottom left corner.\n\nplotMatrix(hic, compare.to = hic2)\n##  [1] \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752 | /home/biocbuild/.cache/R/ExperimentHub/b9a3f464a3cfc_7754\"\n\n\n\n\n\n\n\n\n4.1.4 Plotting multiple chromosomes\nInteractions from multiple chromosomes can be visualized in a Hi-C heatmap. One needs to (1) first parse the entire contact matrix in R, (2) then subset interactions over chromosomes of interest with [ and (3) use plotMatrix to generate the multi-chromosome plot.\n\nfull_hic &lt;- import(cf, resolution = 4000)\nplotMatrix(full_hic)\n\n\n\n\n\n\nhic_subset &lt;- full_hic[c(\"II\", \"III\", \"IV\")]\nplotMatrix(hic_subset)",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Hi-C data visualization</span>"
    ]
  },
  {
    "objectID": "pages/visualization.html#hi-c-maps-customization-options",
    "href": "pages/visualization.html#hi-c-maps-customization-options",
    "title": "\n4  Hi-C data visualization\n",
    "section": "\n4.2 Hi-C maps customization options",
    "text": "4.2 Hi-C maps customization options\nA number of customization options are available for the plotMatrix function. The next subsections focus on how to:\n\nPick the scores of interest to represent in a Hi-C heatmap;\nChange the numeric scale and boundaries;\nChange the color map;\nExtra customization options\n\n\n4.2.1 Choosing scores\nBy default, plotMatrix will attempt to plot balanced (coverage normalized) Hi-C matrices. However, extra scores may be associated with interactions in a HiCExperiment object (more on this in the next chapter)\nFor instance, we can plot the count scores, which are un-normalized raw contact counts directly obtained when binning a .pairs file:\n\nplotMatrix(hic, use.scores = 'count')\n\n\n\n\n\n\n\n\n4.2.2 Choosing scale\nThe color scale is automatically adjusted to range from the minimum to the maximum scores of the HiCExperiment being plotted. This can be adjusted using the limits argument.\n\nplotMatrix(hic, limits = c(-3.5, -1))\n\n\n\n\n\n\n\n\n4.2.3 Choosing color map\n?HiContacts::palettes returns a list of available color maps to use with plotMatrix. Any custom color map can also be used by manually specifying a vector of colors.\n\n# ----- `afmhotr` color map is shipped in the `HiContacts` package\nafmhotrColors() \n##  [1] \"#ffffff\" \"#f8f5c3\" \"#f4ee8d\" \"#f6be35\" \"#ee7d32\" \"#c44228\" \"#821d19\"\n##  [8] \"#381211\" \"#050606\"\nplotMatrix(\n    hic, \n    use.scores = 'balanced',\n    limits = c(-4, -1),\n    cmap = afmhotrColors()\n)",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Hi-C data visualization</span>"
    ]
  },
  {
    "objectID": "pages/visualization.html#advanced-visualization",
    "href": "pages/visualization.html#advanced-visualization",
    "title": "\n4  Hi-C data visualization\n",
    "section": "\n4.3 Advanced visualization",
    "text": "4.3 Advanced visualization\n\n4.3.1 Overlaying topological features\nTopological features (e.g. chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap.\nTo illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from.\n\nlibrary(rtracklayer)\nlibrary(InteractionSet)\nloops &lt;- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |&gt; \n    import() |&gt; \n    makeGInteractionsFromGRangesPairs()\nloops\n##  GInteractions object with 162 interactions and 0 metadata columns:\n##          seqnames1       ranges1     seqnames2       ranges2\n##              &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt;\n##      [1]         I     3001-4000 ---         I   29001-30000\n##      [2]         I   29001-30000 ---         I   50001-51000\n##      [3]         I   95001-96000 ---         I 128001-129000\n##      [4]         I 133001-134000 ---         I 157001-158000\n##      [5]        II     8001-9000 ---        II   46001-47000\n##      ...       ...           ... ...       ...           ...\n##    [158]       XVI 773001-774000 ---       XVI 803001-804000\n##    [159]       XVI 834001-835000 ---       XVI 859001-860000\n##    [160]       XVI 860001-861000 ---       XVI 884001-885000\n##    [161]       XVI 901001-902000 ---       XVI 940001-941000\n##    [162]       XVI 917001-918000 ---       XVI 939001-940000\n##    -------\n##    regions: 316 ranges and 0 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nSimilarly, borders have also been mapped with chromosight. 
We can also import them into R.\n\nborders &lt;- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |&gt; \n    import()\nborders\n##  GRanges object with 814 ranges and 0 metadata columns:\n##          seqnames        ranges strand\n##             &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt;\n##      [1]        I   73001-74000      *\n##      [2]        I 108001-109000      *\n##      [3]        I 181001-182000      *\n##      [4]       II   90001-91000      *\n##      [5]       II 119001-120000      *\n##      ...      ...           ...    ...\n##    [810]      XVI 777001-778000      *\n##    [811]      XVI 796001-797000      *\n##    [812]      XVI 811001-812000      *\n##    [813]      XVI 890001-891000      *\n##    [814]      XVI 933001-934000      *\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nChromatin loops are stored in GInteractions while borders are GRanges. The former will be displayed as off-diagonal circles and the latter as on-diagonal diamonds on the Hi-C heatmap.\n\nplotMatrix(hic, loops = loops, borders = borders)\n\n\n\n\n\n\n\n\n4.3.2 Aggregated Hi-C maps\nFinally, Hi-C map “snippets” (i.e. extracts) are often aggregated together to show an average signal. This analysis is sometimes referred to as APA (Aggregate Peak Analysis).\nAggregated Hi-C maps can be computed over a collection of targets using the aggregate function. These targets can be GRanges (to extract on-diagonal snippets) or GInteractions (to extract off-diagonal snippets). 
The flankingBins argument specifies how many matrix bins should be extracted on each side of the targets of interest.\nHere, we compute the aggregated Hi-C snippets of ±15 kb (15 bins at 1 kb resolution) around each chromatin loop listed in loops.\n\nhic &lt;- zoom(hic, 1000)\naggr_loops &lt;- aggregate(hic, targets = loops, flankingBins = 15)\n##  Going through preflight checklist...\n##  Parsing the entire contact matrice as a sparse matrix...\n##  Modeling distance decay...\n##  Filtering for contacts within provided targets...\naggr_loops\n##  `AggrHiCExperiment` object over 148 targets \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: 148 targets \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 1000 \n##  interactions: 961 \n##  scores(4): count balanced expected detrended \n##  slices(4): count balanced expected detrended \n##  topologicalFeatures: targets(148) compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\naggregate generates an AggrHiCExperiment object, a flavor of the HiCExperiment class.\n\n\nAggrHiCExperiment objects have an extra slices slot. This stores a list of arrays, one per score. 
Each array is of 3 dimensions, x and y representing the heatmap axes, and z representing the index of the target.\n\nAggrHiCExperiment objects also have a mandatory topologicalFeatures element named targets, storing the genomic loci provided in aggregate.\n\n\nslices(aggr_loops)\n##  List of length 4\n##  names(4): count balanced expected detrended\ndim(slices(aggr_loops, 'count'))\n##  [1]  31  31 148\ntopologicalFeatures(aggr_loops, 'targets')\n##  Pairs object with 148 pairs and 0 metadata columns:\n##                      first            second\n##                  &lt;GRanges&gt;         &lt;GRanges&gt;\n##      [1]     I:14501-44500     I:35501-65500\n##      [2]    I:80501-110500   I:113501-143500\n##      [3]   I:118501-148500   I:142501-172500\n##      [4]    II:33501-63500    II:63501-93500\n##      [5]  II:134501-164500  II:159501-189500\n##      ...               ...               ...\n##    [144] XVI:586501-616500 XVI:606501-636500\n##    [145] XVI:733501-763500 XVI:754501-784500\n##    [146] XVI:758501-788500 XVI:788501-818500\n##    [147] XVI:819501-849500 XVI:844501-874500\n##    [148] XVI:845501-875500 XVI:869501-899500\n\nThe resulting AggrHiCExperiment can be plotted using the same plotMatrix function with the arguments described above.\n\nplotMatrix(\n    aggr_loops, \n    use.scores = 'detrended', \n    scale = 'linear', \n    limits = c(-1, 1), \n    cmap = bgrColors()\n)",
    "crumbs": [
      "Fundamentals concepts",
      "<span class='chapter-number'>4</span>  <span class='chapter-title'>Hi-C data visualization</span>"
    ]
  },
  {
    "objectID": "pages/matrix-centric.html",
    "href": "pages/matrix-centric.html",
    "title": "\n5  Matrix-centric analysis\n",
    "section": "",
    "text": "5.1 Operations in an individual matrix\nIn the first part of this book, we have seen how to query parts or all of the data contained in Hi-C contact matrices using the HiCExperiment object (Chapter 2), how to manipulate HiCExperiment objects (Chapter 3) and how to visualize Hi-C contact matrices as heatmaps (Chapter 4).\nThe HiContacts package directly operates on HiCExperiment objects and extends its usability by providing a comprehensive toolkit to analyze Hi-C data, focusing on four main topics:\nMatrix-centric analyses consider a HiCExperiment object from the “matrix” perspective to perform a range of matrix-based operations. This encompasses:",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>5</span>  <span class='chapter-title'>Matrix-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/matrix-centric.html#operations-in-an-individual-matrix",
    "href": "pages/matrix-centric.html#operations-in-an-individual-matrix",
    "title": "\n5  Matrix-centric analysis\n",
    "section": "",
    "text": "5.1.1 Balancing a raw interaction count map\nHi-C sequencing coverage is systematically affected by multiple confounding factors, e.g.  density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices.\nTo correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance &lt;.cool&gt;). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified).\n\nnormalized_hic &lt;- normalize(hic)\nnormalized_hic\n##  `HiCExperiment` object with 471,364 contacts over 407 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 34063 \n##  scores(3): count balanced ICE \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nIt is possible to plot the different scores of the resulting object to visualize the newly computed scores. 
In this example, ICE scores should be nearly identical to balanced scores, which were originally imported from the disk-stored contact matrix.\n\n\ncowplot::plot_grid(\n    plotMatrix(normalized_hic, use.scores = 'count', caption = FALSE),\n    plotMatrix(normalized_hic, use.scores = 'balanced', caption = FALSE),\n    plotMatrix(normalized_hic, use.scores = 'ICE', caption = FALSE), \n    nrow = 1\n)\n\n\n\n\n\n\n\n\n\n5.1.2 Computing observed/expected (O/E) map\nThe most prominent feature of a balanced Hi-C matrix is the strong main diagonal. This main diagonal is observed because interactions between immediate adjacent genomic loci are more prone to happen than interactions spanning longer genomic distances. This “expected” behavior is due to the polymer nature of the chromosomes being studied, and can be locally estimated using the distance-dependent interaction frequency (a.k.a. the “distance law”, or P(s)). It can be used to compute an expected matrix on interactions.\nWhen it is desirable to “mask” this polymer behavior to emphasize topological structures formed by chromosomes, one can divide a given balanced matrix by its expected matrix, i.e. calculate the observed/expected (O/E) map. This is sometimes called “detrending”, as it effectively removes the average polymer behavior from the balanced matrix.\nThe detrend function performs this operation on a given HiCExperiment object. 
It adds two extra elements to the scores list: expected and detrended metrics (while the interactions themselves are unmodified).\n\ndetrended_hic &lt;- detrend(hic)\ndetrended_hic\n##  `HiCExperiment` object with 471,364 contacts over 407 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 34063 \n##  scores(4): count balanced expected detrended \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nTopological features will be visually more prominent in the O/E detrended Hi-C map.\n\n\ncowplot::plot_grid(\n    plotMatrix(detrended_hic, use.scores = 'balanced', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n    plotMatrix(detrended_hic, use.scores = 'expected', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n    plotMatrix(detrended_hic, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), cmap = bwrColors(), caption = FALSE), \n    nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote: Scale for detrended scores\n\n\n\n\n\nexpected scores are in linear scale and have approximately the same amplitude as balanced scores;\n\ndetrended scores are in log2 scale, in general approximately centered around 0. When plotting detrended scores, scale = 'linear' should be set to prevent the default log10 scaling.\n\n\n\n\n5.1.3 Computing autocorrelated map\nCorrelation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)).\nThe autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome.\n\nautocorr_hic &lt;- autocorrelate(hic)\nautocorr_hic\n##  `HiCExperiment` object with 471,364 contacts over 407 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 34063 \n##  scores(5): count balanced expected detrended autocorrelated \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nSince these metrics represent correlation scores, they range between -1 and 1. Two loci with an autocorrelated score close to -1 have anti-correlated interaction profiles, while two loci with an autocorrelated score close to 1 are likely to interact with shared targets.\n\nsummary(scores(autocorr_hic, 'autocorrelated'))\n##       Min.   1st Qu.    Median      Mean   3rd Qu.      Max.       NAs \n##  -0.415614  0.002486  0.050404  0.064474  0.103600  1.000000       564\n\nCorrelated and anti-correlated loci will be visually represented in the autocorrelated Hi-C map in red and blue pixels, respectively.\n\n\n\n\n\n\nNote\n\n\n\nHere we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated into two compartments but rather follows a Rabl conformation (Duan et al. (2010)). 
An example of an autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated into A/B compartments) is shown in Chapter 10.\n\n\n\nplotMatrix(\n    autocorr_hic, \n    use.scores = 'autocorrelated', \n    scale = 'linear', \n    limits = c(-0.4, 0.4), \n    cmap = bgrColors()\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote: Scale for autocorrelated scores\n\n\n\n\n\nautocorrelated scores are in linear scale, in general approximately centered around 0. When plotting autocorrelated scores, scale = 'linear' should be set to prevent the default log10 scaling.\n\nlimits should be manually set to c(-x, x) (0 &lt; x &lt;= 1) to ensure that the color range is effectively centered on 0.\n\n\n\n\n5.1.4 Despeckling (smoothing out) a contact map\nShallow-sequenced Hi-C libraries or matrices binned with an overly small bin size sometimes produce “grainy” Hi-C maps with noisy backgrounds. A grainy map may also be obtained when dividing two matrices, e.g. when computing the O/E ratio with detrend. This is particularly true for sparser long-range interactions. To overcome such limitations, HiCExperiment objects can be “despeckled” to smooth out focal speckles.\n\nhic2 &lt;- detrend(hic['II:400000-700000'])\nhic2 &lt;- despeckle(hic2, use.scores = 'detrended', focal.size = 2)\nhic2\n##  `HiCExperiment` object with 168,785 contacts over 150 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II:400,000-700,000\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 2000 \n##  interactions: 11325 \n##  scores(5): count balanced expected detrended detrended.despeckled \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nThe added &lt;use.scores&gt;.despeckled scores correspond to scores averaged using a window, whose width is provided with the focal.size argument. 
This results in a smoother Hi-C heatmap, effectively removing the “speckles” observed at longer range.\n\n\nlibrary(InteractionSet)\nloops &lt;- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |&gt; \n    import() |&gt; \n    makeGInteractionsFromGRangesPairs()\nborders &lt;- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |&gt; \n    import()\ncowplot::plot_grid(\n    plotMatrix(hic2, caption = FALSE),\n    plotMatrix(hic2, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), caption = FALSE),\n    plotMatrix(\n        hic2, \n        use.scores = 'detrended.despeckled', \n        scale = 'linear', \n        limits = c(-1, 1), \n        caption = FALSE, \n        loops = loops, \n        borders = borders\n    ),\n    nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote: Scale for despeckled scores\n\n\n\ndespeckled scores are on the same scale as the scores they were computed from.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>5</span>  <span class='chapter-title'>Matrix-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/matrix-centric.html#operations-between-multiple-matrices",
    "href": "pages/matrix-centric.html#operations-between-multiple-matrices",
    "title": "\n5  Matrix-centric analysis\n",
    "section": "\n5.2 Operations between multiple matrices",
    "text": "5.2 Operations between multiple matrices\n\n5.2.1 Merging maps\nHi-C libraries are often sequenced in multiple rounds, for example when high genome coverage is required. This results in multiple contact matrix files being generated. The merge function can be used to bind several HiCExperiment objects into a single one.\nThe different HiCExperiment objects do not need to all have identical regions, as shown in the following example.\n\nhic_sub1 &lt;- subsetByOverlaps(hic, GRanges(\"II:100001-200000\"))\nhic_sub2 &lt;- subsetByOverlaps(hic, GRanges(\"II:300001-400000\"))\nbound_hic &lt;- merge(hic_sub1, hic_sub2)\nplotMatrix(bound_hic)\n\n\n\n\n\n\n\n\n5.2.2 Computing ratio between two maps\nComparing two Hi-C maps can be useful to infer which genomic loci are differentially interacting between experimental conditions. Comparing two HiCExperiment objects can be done in R using the divide function.\nFor example, we can divide the eco1 mutant Hi-C data by wild-type Hi-C dataset using the divide function.\n\nhic_eco1 &lt;- import(\n    CoolFile(HiContactsData('yeast_eco1', 'mcool')), \n    focus = 'II', \n    resolution = 2000\n)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n\n\ndiv_contacts &lt;- divide(hic_eco1, by = hic) \ndiv_contacts\n##  `HiCExperiment` object with 996,154 contacts over 407 regions \n##  -------\n##  fileName: N/A \n##  focus: \"II\" \n##  resolutions(1): 2000\n##  active resolution: 2000 \n##  interactions: 60894 \n##  scores(6): count.x balanced.x count.by balanced.by balanced.fc balanced.l2fc \n##  topologicalFeatures: () \n##  pairsFile: N/A \n##  metadata(2): hce_list operation\n\nWe can visually compare wild-type and eco1 maps side by side (left) and their ratio map (right). 
This highlights the depletion of short-range and increase of long-range interactions in the eco1 dataset.\n\ncowplot::plot_grid(\n    plotMatrix(hic_eco1, compare.to = hic, limits = c(-4, -1)), \n    plotMatrix(\n        div_contacts, \n        use.scores = 'balanced.fc', \n        scale = 'log2', \n        limits = c(-1, 1),\n        cmap = bwrColors()\n    )\n)\n##  [1] \"/home/biocbuild/.cache/R/ExperimentHub/b9a3f464a3cfc_7754 | /home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\"",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>5</span>  <span class='chapter-title'>Matrix-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/interactions-centric.html",
    "href": "pages/interactions-centric.html",
    "title": "\n6  Interactions-centric analysis\n",
    "section": "",
    "text": "6.1 Distance law(s)\nInteraction-centric analyses consider a HiCExperiment object from the “interactions” perspective to perform a range of operations on genomic interactions.\nThis encompasses:",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Interactions-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/interactions-centric.html#distance-laws",
    "href": "pages/interactions-centric.html#distance-laws",
    "title": "\n6  Interactions-centric analysis\n",
    "section": "",
    "text": "6.1.1 P(s) from a single .pairs file\nDistance laws are generally computed directly from .pairs files. This is because the .pairs files are at 1-bp resolution whereas the contact matrices (for example from .cool files) are binned at a minimum resolution.\nAn example .pairs file can be fetched from the ExperimentHub database using the HiContactsData package.\n\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\npairsf &lt;- HiContactsData('yeast_wt', 'pairs.gz')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\npf &lt;- PairsFile(pairsf)\n\n\npf\n##  PairsFile object\n##  resource: /home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753\n\nIf needed, PairsFile connections can be imported directly into a GInteractions object with import().\n\nimport(pf)\n##  GInteractions object with 471364 interactions and 3 metadata columns:\n##             seqnames1   ranges1     seqnames2   ranges2 |     frag1     frag2\n##                 &lt;Rle&gt; &lt;IRanges&gt;         &lt;Rle&gt; &lt;IRanges&gt; | &lt;numeric&gt; &lt;numeric&gt;\n##         [1]        II       105 ---        II     48548 |      1358      1681\n##         [2]        II       113 ---        II     45003 |      1358      1658\n##         [3]        II       119 ---        II    687251 |      1358      5550\n##         [4]        II       160 ---        II     26124 |      1358      1510\n##         [5]        II       169 ---        II     39052 |      1358      1613\n##         ...       ...       ... ...       ...       ... .       ...       
...\n##    [471360]        II    808605 ---        II    809683 |      6316      6320\n##    [471361]        II    808609 ---        II    809917 |      6316      6324\n##    [471362]        II    808617 ---        II    809506 |      6316      6319\n##    [471363]        II    809447 ---        II    809685 |      6319      6321\n##    [471364]        II    809472 ---        II    809675 |      6319      6320\n##              distance\n##             &lt;integer&gt;\n##         [1]     48443\n##         [2]     44890\n##         [3]    687132\n##         [4]     25964\n##         [5]     38883\n##         ...       ...\n##    [471360]      1078\n##    [471361]      1308\n##    [471362]       889\n##    [471363]       238\n##    [471364]       203\n##    -------\n##    regions: 549331 ranges and 0 metadata columns\n##    seqinfo: 17 sequences from an unspecified genome\n\nWe can compute a P(s) per chromosome from this .pairs file using the distanceLaw function.\n\nlibrary(HiContacts)\nps &lt;- distanceLaw(pf, by_chr = TRUE) \n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753 in memory. 
This may take a while...\nps\n##  # A tibble: 115 × 6\n##    chr   binned_distance          p     norm_p norm_p_unity slope\n##    &lt;chr&gt;           &lt;dbl&gt;      &lt;dbl&gt;      &lt;dbl&gt;        &lt;dbl&gt; &lt;dbl&gt;\n##  1 II                 14 0.00000212 0.00000106         2.27  0   \n##  2 II                 16 0.0000170  0.0000170         36.4   1.56\n##  3 II                 17 0.0000361  0.0000180         38.6   1.55\n##  4 II                 19 0.0000424  0.0000212         45.5   1.55\n##  5 II                 21 0.0000467  0.0000233         50.0   1.54\n##  6 II                 23 0.0000870  0.0000290         62.1   1.53\n##  # ℹ 109 more rows\n\nThe plotPs() and plotPsSlope() functions are convenient ggplot2-based functions with pre-configured settings optimized for P(s) visualization.\n\nlibrary(ggplot2)\nplotPs(ps, aes(x = binned_distance, y = norm_p, color = chr))\n##  Warning: Removed 67 rows containing missing values or values outside the scale range\n##  (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps, aes(x = binned_distance, y = slope, color = chr))\n##  Warning: Removed 67 rows containing missing values or values outside the scale range\n##  (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.2 P(s) for multiple .pairs files\nLet’s first import a second example dataset. We’ll import pairs identified in an eco1 yeast mutant.\n\neco1_pairsf &lt;- HiContactsData('yeast_eco1', 'pairs.gz')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\neco1_pf &lt;- PairsFile(eco1_pairsf)\n\n\neco1_ps &lt;- distanceLaw(eco1_pf, by_chr = TRUE) \n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/b9a3f450e2818_7755 in memory. 
This may take a while...\neco1_ps\n##  # A tibble: 115 × 6\n##    chr   binned_distance          p     norm_p norm_p_unity slope\n##    &lt;chr&gt;           &lt;dbl&gt;      &lt;dbl&gt;      &lt;dbl&gt;        &lt;dbl&gt; &lt;dbl&gt;\n##  1 II                 14 0.00000201 0.00000100        0.660  0   \n##  2 II                 16 0.0000221  0.0000221        14.5    1.46\n##  3 II                 17 0.0000492  0.0000246        16.2    1.46\n##  4 II                 19 0.0000412  0.0000206        13.5    1.45\n##  5 II                 21 0.0000653  0.0000326        21.5    1.45\n##  6 II                 23 0.0000803  0.0000268        17.6    1.44\n##  # ℹ 109 more rows\n\nA little data wrangling can help plotting the distance laws for 2 different samples in the same plot.\n\nlibrary(dplyr)\nmerged_ps &lt;- rbind(\n    ps |&gt; mutate(sample = 'WT'), \n    eco1_ps |&gt; mutate(sample = 'eco1')\n)\nplotPs(merged_ps, aes(x = binned_distance, y = norm_p, color = sample, linetype = chr)) + \n    scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n##  Warning: Removed 134 rows containing missing values or values outside the scale range\n##  (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(merged_ps, aes(x = binned_distance, y = slope, color = sample, linetype = chr)) + \n    scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n##  Warning: Removed 135 rows containing missing values or values outside the scale range\n##  (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.3 P(s) from HiCExperiment objects\nAlternatively, distance laws can be computed from binned matrices directly by providing HiCExperiment objects. For deeply sequenced datasets, this can be significantly faster than when using original .pairs files, but the smoothness of the resulting curves will be greatly impacted, notably at short distances.\n\nps_from_hic &lt;- distanceLaw(hic, by_chr = TRUE) \n##  pairsFile not specified. 
The P(s) curve will be an approximation.\nplotPs(ps_from_hic, aes(x = binned_distance, y = norm_p))\n##  Warning: Removed 9 rows containing missing values or values outside the scale range\n##  (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps_from_hic, aes(x = binned_distance, y = slope))\n##  Warning: Removed 8 rows containing missing values or values outside the scale range\n##  (`geom_line()`).",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Interactions-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/interactions-centric.html#cistrans-ratios",
    "href": "pages/interactions-centric.html#cistrans-ratios",
    "title": "\n6  Interactions-centric analysis\n",
    "section": "\n6.2 Cis/trans ratios",
    "text": "6.2 Cis/trans ratios\nThe ratio between cis interactions and trans interactions is often used to assess the overall quality of a Hi-C dataset. It can be computed per chromosome using the cisTransRatio() function. You will need to provide a genome-wide HiCExperiment to estimate cis/trans ratios!\n\nfull_hic &lt;- import(cf, resolution = 2000)\nct &lt;- cisTransRatio(full_hic) \nct\n##  # A tibble: 16 × 6\n##  # Groups:   chr [16]\n##    chr       cis  trans n_total cis_pct trans_pct\n##    &lt;fct&gt;   &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;     &lt;dbl&gt;\n##  1 I      186326  96738  283064   0.658     0.342\n##  2 II     942728 273966 1216694   0.775     0.225\n##  3 III    303980 127087  431067   0.705     0.295\n##  4 IV    1858062 418218 2276280   0.816     0.184\n##  5 V      607090 220873  827963   0.733     0.267\n##  6 VI     280282 127771  408053   0.687     0.313\n##  # ℹ 10 more rows\n\nIt can be plotted using ggplot2-based visualization functions.\n\nggplot(ct, aes(x = chr, y = cis_pct)) + \n    geom_col(position = position_stack()) + \n    theme_bw() + \n    guides(x=guide_axis(angle = 90)) + \n    scale_y_continuous(labels = scales::percent) + \n    labs(x = 'Chromosomes', y = '% of cis contacts')\n\n\n\n\n\n\n\nCis/trans contact ratios will greatly vary depending on the cell cycle phase the sample is in! For instance, chromosomes during the mitosis phase of the cell cycle have very little trans contacts, due to their structural organization and individualization.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Interactions-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/interactions-centric.html#virtual-4c-profiles",
    "href": "pages/interactions-centric.html#virtual-4c-profiles",
    "title": "\n6  Interactions-centric analysis\n",
    "section": "\n6.3 Virtual 4C profiles",
    "text": "6.3 Virtual 4C profiles\nInteraction profile of a genomic locus of interest with its surrounding environment or the rest of the genome is frequently generated. In some cases, this can help in identifying and/or comparing regulatory or structural interactions.\nFor instance, we can compute the genome-wide virtual 4C profile of interactions anchored at the centromere in chromosome II (located at ~ 238kb).\n\nlibrary(GenomicRanges)\nv4C &lt;- virtual4C(full_hic, viewpoint = GRanges(\"II:230001-240000\"))\nv4C\n##  GRanges object with 6045 ranges and 4 metadata columns:\n##           seqnames        ranges strand |       score    center in_viewpoint\n##              &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt; |   &lt;numeric&gt; &lt;numeric&gt;    &lt;logical&gt;\n##       [1]        I        1-2000      * |  0.00000000    1000.5        FALSE\n##       [2]        I     2001-4000      * |  0.00000000    3000.5        FALSE\n##       [3]        I     4001-6000      * |  0.00129049    5000.5        FALSE\n##       [4]        I     6001-8000      * |  0.00000000    7000.5        FALSE\n##       [5]        I    8001-10000      * |  0.00000000    9000.5        FALSE\n##       ...      ...           ...    ... .         ...       ...          ...\n##    [6041]      XVI 940001-942000      * | 0.000775721    941000        FALSE\n##    [6042]      XVI 942001-944000      * | 0.000000000    943000        FALSE\n##    [6043]      XVI 944001-946000      * | 0.000000000    945000        FALSE\n##    [6044]      XVI 946001-948000      * | 0.000000000    947000        FALSE\n##    [6045]      XVI 948001-948066      * | 0.000000000    948034        FALSE\n##                  viewpoint\n##                &lt;character&gt;\n##       [1] II:230001-240000\n##       [2] II:230001-240000\n##       [3] II:230001-240000\n##       [4] II:230001-240000\n##       [5] II:230001-240000\n##       ...              
...\n##    [6041] II:230001-240000\n##    [6042] II:230001-240000\n##    [6043] II:230001-240000\n##    [6044] II:230001-240000\n##    [6045] II:230001-240000\n##    -------\n##    seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nggplot2 can be used to visualize the 4C-like profile over multiple chromosomes.\n\n\ndf &lt;- as_tibble(v4C)\nggplot(df, aes(x = center, y = score)) + \n    geom_area(position = \"identity\", alpha = 0.5) + \n    theme_bw() + \n    labs(x = \"Position\", y = \"Contacts with viewpoint\") +\n    scale_x_continuous(labels = scales::unit_format(unit = \"M\", scale = 1e-06)) + \n    facet_wrap(~seqnames, scales = 'free_y')\n\n\n\n\n\n\n\n\nThis clearly highlights trans interactions of the chromosome II centromere with the centromeres from other chromosomes.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Interactions-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/interactions-centric.html#scalograms",
    "href": "pages/interactions-centric.html#scalograms",
    "title": "\n6  Interactions-centric analysis\n",
    "section": "\n6.4 Scalograms",
    "text": "6.4 Scalograms\nScalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes.\nTo generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile.\n\npairsFile(hic) &lt;- pairsf\nscalo &lt;- scalogram(hic) \n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/38d2025bce7760_7753 in memory. This may take a while...\nplotScalogram(scalo |&gt; filter(chr == 'II'), ylim = c(1e3, 1e5))\n\n\n\n\n\n\n\nSeveral scalograms can be plotted together to compare distance-dependent contact frequencies along a given chromosome in different samples.\n\n\neco1_hic &lt;- import(\n    CoolFile(HiContactsData('yeast_eco1', 'mcool')), \n    focus = 'II', \n    resolution = 2000\n)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  Error while performing HEAD request.\n##     Proceeding without cache information.\n##  loading from cache\neco1_pairsf &lt;- HiContactsData('yeast_eco1', 'pairs.gz')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\npairsFile(eco1_hic) &lt;- eco1_pairsf\neco1_scalo &lt;- scalogram(eco1_hic) \n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/b9a3f450e2818_7755 in memory. This may take a while...\nmerged_scalo &lt;- rbind(\n    scalo |&gt; mutate(sample = 'WT'), \n    eco1_scalo |&gt; mutate(sample = 'eco1')\n)\nplotScalogram(merged_scalo |&gt; filter(chr == 'II'), ylim = c(1e3, 1e5)) + \n    facet_grid(~sample)\n\n\n\n\n\n\n\n\nThis example points out the overall longer interactions within the long arm of the chromosome II in an eco1 mutant.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>6</span>  <span class='chapter-title'>Interactions-centric analysis</span>"
    ]
  },
  {
    "objectID": "pages/topological-features.html",
    "href": "pages/topological-features.html",
    "title": "\n7  Finding topological features in Hi-C\n",
    "section": "",
    "text": "7.1 Chromosome compartments\nChromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment).",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Finding topological features in Hi-C</span>"
    ]
  },
  {
    "objectID": "pages/topological-features.html#chromosome-compartments",
    "href": "pages/topological-features.html#chromosome-compartments",
    "title": "\n7  Finding topological features in Hi-C\n",
    "section": "",
    "text": "7.1.1 Importing Hi-C data\nTo investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the HiContactsData package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp.\n\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\ncf &lt;- CoolFile(HiContactsData('microC', 'mcool'))\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nmicroC &lt;- import(cf, resolution = 250000)\nmicroC\n##  `HiCExperiment` object with 10,086,710 contacts over 334 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/34889a75064968_8601\" \n##  focus: \"whole genome\" \n##  resolutions(3): 5000 100000 250000\n##  active resolution: 250000 \n##  interactions: 52755 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nseqinfo(microC)\n##  Seqinfo object with 1 sequence from an unspecified genome:\n##    seqnames seqlengths isCircular genome\n##    chr17      83257441         NA   &lt;NA&gt;\n\n\n7.1.2 Annotating A/B compartments\nThe consensus approach to annotate A/B compartments is to compute the eigenvectors of a Hi-C contact matrix and identify the eigenvector representing the chromosome-wide bi-partite segmentation of the genome.\nThe getCompartments() function performs several internal operations to achieve this:\n\nObtains cis interactions per chromosome\nComputes O/E contact matrix scores\nComputes 3 first eigenvectors of this Hi-C contact matrix\nNormalizes eigenvectors\nPicks the eigenvector that has the greatest absolute correlation with a phasing track (e.g. 
a GC% track automatically computed from a genome reference sequence, or a gene density track)\nSigns this eigenvector so that positive values represent the A compartment\n\n\nphasing_track &lt;- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38\nmicroC_compts &lt;- getCompartments(microC, genome = phasing_track)\n##  Going through preflight checklist...\n##  Parsing intra-chromosomal contacts for each chromosome...\n##  Computing eigenvectors for each chromosome...\n\nmicroC_compts\n##  `HiCExperiment` object with 10,086,710 contacts over 334 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/34889a75064968_8601\" \n##  focus: \"whole genome\" \n##  resolutions(3): 5000 100000 250000\n##  active resolution: 250000 \n##  interactions: 52755 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(41) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(1): eigens\n\ngetCompartments() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA compartments topologicalFeatures:\n\n\ntopologicalFeatures(microC_compts, \"compartments\")\n##  GRanges object with 41 ranges and 1 metadata column:\n##         seqnames            ranges strand | compartment\n##            &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt;\n##     [1]    chr17    250001-3000000      * |           A\n##     [2]    chr17   3000001-3500000      * |           B\n##     [3]    chr17   3500001-5500000      * |           A\n##     [4]    chr17   5500001-6500000      * |           B\n##     [5]    chr17   6500001-8500000      * |           A\n##     ...      ...               ...    ... .         
...\n##    [37]    chr17 72750001-73250000      * |           A\n##    [38]    chr17 73250001-74750000      * |           B\n##    [39]    chr17 74750001-79250000      * |           A\n##    [40]    chr17 79250001-79750000      * |           B\n##    [41]    chr17 79750001-83250000      * |           A\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated eigenvectors stored in metadata:\n\n\nmetadata(microC_compts)$eigens\n##  GRanges object with 334 ranges and 9 metadata columns:\n##                                  seqnames            ranges strand |\n##                                     &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; |\n##             chr17.chr17_1_250000    chr17          1-250000      * |\n##        chr17.chr17_250001_500000    chr17     250001-500000      * |\n##        chr17.chr17_500001_750000    chr17     500001-750000      * |\n##       chr17.chr17_750001_1000000    chr17    750001-1000000      * |\n##      chr17.chr17_1000001_1250000    chr17   1000001-1250000      * |\n##                              ...      ...               ...    ... 
.\n##    chr17.chr17_82250001_82500000    chr17 82250001-82500000      * |\n##    chr17.chr17_82500001_82750000    chr17 82500001-82750000      * |\n##    chr17.chr17_82750001_83000000    chr17 82750001-83000000      * |\n##    chr17.chr17_83000001_83250000    chr17 83000001-83250000      * |\n##    chr17.chr17_83250001_83257441    chr17 83250001-83257441      * |\n##                                     bin_id     weight   chr    center\n##                                  &lt;numeric&gt;  &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt;\n##             chr17.chr17_1_250000         0        NaN chr17    125000\n##        chr17.chr17_250001_500000         1 0.00626903 chr17    375000\n##        chr17.chr17_500001_750000         2 0.00567190 chr17    625000\n##       chr17.chr17_750001_1000000         3 0.00528588 chr17    875000\n##      chr17.chr17_1000001_1250000         4 0.00464628 chr17   1125000\n##                              ...       ...        ...   ...       ...\n##    chr17.chr17_82250001_82500000       329 0.00463044 chr17  82375000\n##    chr17.chr17_82500001_82750000       330 0.00486910 chr17  82625000\n##    chr17.chr17_82750001_83000000       331 0.00561269 chr17  82875000\n##    chr17.chr17_83000001_83250000       332 0.00546433 chr17  83125000\n##    chr17.chr17_83250001_83257441       333        NaN chr17  83253721\n##                                         E1        E2        E3   phasing\n##                                  &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##             chr17.chr17_1_250000  0.000000  0.000000  0.000000  0.383084\n##        chr17.chr17_250001_500000  0.450991  0.653287  0.615300  0.433972\n##        chr17.chr17_500001_750000  0.716784  0.707461  0.845033  0.465556\n##       chr17.chr17_750001_1000000  0.904423  0.414952  0.864288  0.503592\n##      chr17.chr17_1000001_1250000  0.913023  0.266287  0.759016  0.547712\n##                              ...       ...       ...       ...       
...\n##    chr17.chr17_82250001_82500000  1.147060  0.239112  1.133498  0.550872\n##    chr17.chr17_82500001_82750000  1.106937  0.419647  1.169464  0.513212\n##    chr17.chr17_82750001_83000000  0.818990  0.591955  0.850340  0.522432\n##    chr17.chr17_83000001_83250000  0.874038  0.503175  0.847926  0.528448\n##    chr17.chr17_83250001_83257441  0.000000  0.000000  0.000000  0.000000\n##                                      eigen\n##                                  &lt;numeric&gt;\n##             chr17.chr17_1_250000  0.000000\n##        chr17.chr17_250001_500000  0.450991\n##        chr17.chr17_500001_750000  0.716784\n##       chr17.chr17_750001_1000000  0.904423\n##      chr17.chr17_1000001_1250000  0.913023\n##                              ...       ...\n##    chr17.chr17_82250001_82500000  1.147060\n##    chr17.chr17_82500001_82750000  1.106937\n##    chr17.chr17_82750001_83000000  0.818990\n##    chr17.chr17_83000001_83250000  0.874038\n##    chr17.chr17_83250001_83257441  0.000000\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome\n\n\n7.1.3 Exporting compartment tracks\nTo save the eigenvector (as a bigwig file) and the compartments (as a gff file), the export function can be used:\n\nlibrary(GenomicRanges)\nlibrary(rtracklayer)\ncoverage(metadata(microC_compts)$eigens, weight = 'eigen') |&gt; export('microC_eigen.bw')\ntopologicalFeatures(microC_compts, \"compartments\") |&gt; export('microC_compartments.gff3')\n\n\n7.1.4 Visualizing compartment tracks\nCompartment tracks should be visualized in a dedicated genome browser, with the phasing track loaded as well, to ensure they are phased correctly.\nThat being said, it is possible to visualize a genome track in R alongside the matching Hi-C contact matrix.\n\nlibrary(ggplot2)\nlibrary(patchwork)\nmicroC &lt;- autocorrelate(microC)\np1 &lt;- plotMatrix(microC, use.scores = 'autocorrelated', scale = 'linear', limits = c(-1, 1), caption = FALSE)\neigen &lt;- 
coverage(metadata(microC_compts)$eigens, weight = 'eigen')[[1]]\neigen_df &lt;- tibble(pos = cumsum(runLength(eigen)), eigen = runValue(eigen))\np2 &lt;- ggplot(eigen_df, aes(x = pos, y = eigen)) + \n    geom_area() + \n    theme_void() + \n    coord_cartesian(expand = FALSE) + \n    labs(x = \"Genomic position\", y = \"Eigenvector value\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 1))\n\n\n\n\n\n\n\nHere, we clearly note the concordance between the Hi-C correlation matrix, highlighting correlated interactions between pairs of genomic segments, and the eigenvector representing chromosome segmentation into 2 compartments: A (for positive values) and B (for negative values).\n\n7.1.5 Saddle plots\nSaddle plots are typically used to measure the observed vs. expected interaction scores within or between genomic loci belonging to A and B compartments.\nNon-overlapping genomic windows are grouped in nbins quantiles (typically between 10 and 50 quantiles) according to their A/B compartment eigenvector value, from lowest eigenvector values (i.e. strongest B compartments) to highest eigenvector values (i.e. strongest A compartments). The average observed vs. expected interaction scores are then computed for pairwise eigenvector quantiles and plotted in a 2D heatmap.\n\nlibrary(BiocParallel)\nplotSaddle(microC_compts, nbins = 25, BPPARAM = SerialParam(progressbar = FALSE))\n\n\n\n\n\n\n\nHere, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Finding topological features in Hi-C</span>"
    ]
  },
  {
    "objectID": "pages/topological-features.html#topological-domains",
    "href": "pages/topological-features.html#topological-domains",
    "title": "\n7  Finding topological features in Hi-C\n",
    "section": "\n7.2 Topological domains",
    "text": "7.2 Topological domains\nTopological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.\n\n\n\n\nThey are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)).\n\n7.2.1 Computing diamond insulation score\nSeveral approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare.\nHiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution.\n\n# - Compute insulation score\nbpparam &lt;- SerialParam(progressbar = FALSE)\nhic &lt;- zoom(microC, 5000) |&gt; \n    refocus('chr17:60000001-83257441') |&gt;\n    getDiamondInsulation(window_size = 100000, BPPARAM = bpparam) |&gt; \n    getBorders()\n##  Going through preflight checklist...\n##  Scan each window and compute diamond insulation score...\n##  Annotating diamond score prominence for each window...\n\nhic\n##  `HiCExperiment` object with 2,156,222 contacts over 4,652 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/34889a75064968_8601\" \n##  focus: \"chr17:60,000,001-83,257,441\" \n##  resolutions(3): 5000 100000 250000\n##  active resolution: 5000 \n##  interactions: 2156044 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(21) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(1): insulation\n\ngetDiamondInsulation() is an endomorphism: 
it returns the original object, enriched with two new pieces of information:\n\nA borders topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"borders\")\n##  GRanges object with 21 ranges and 1 metadata column:\n##           seqnames            ranges strand |     score\n##              &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;\n##    strong    chr17 60105001-60110000      * |  0.574760\n##      weak    chr17 60210001-60215000      * |  0.414425\n##      weak    chr17 61415001-61420000      * |  0.346668\n##    strong    chr17 61500001-61505000      * |  0.544336\n##      weak    chr17 62930001-62935000      * |  0.399794\n##       ...      ...               ...    ... .       ...\n##      weak    chr17 78395001-78400000      * |  0.235613\n##      weak    chr17 79065001-79070000      * |  0.236535\n##      weak    chr17 80155001-80160000      * |  0.284855\n##      weak    chr17 81735001-81740000      * |  0.497478\n##    strong    chr17 81840001-81845000      * |  1.395949\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated insulation scores stored in metadata:\n\n\nmetadata(hic)$insulation\n##  GRanges object with 4611 ranges and 8 metadata columns:\n##                            seqnames            ranges strand |    bin_id\n##                               &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;\n##    chr17_60100001_60105000    chr17 60100001-60105000      * |     12020\n##    chr17_60105001_60110000    chr17 60105001-60110000      * |     12021\n##    chr17_60110001_60115000    chr17 60110001-60115000      * |     12022\n##    chr17_60115001_60120000    chr17 60115001-60120000      * |     12023\n##    chr17_60120001_60125000    chr17 60120001-60125000      * |     12024\n##                        ...      ...               ...    ... .       
...\n##    chr17_83130001_83135000    chr17 83130001-83135000      * |     16626\n##    chr17_83135001_83140000    chr17 83135001-83140000      * |     16627\n##    chr17_83140001_83145000    chr17 83140001-83145000      * |     16628\n##    chr17_83145001_83150000    chr17 83145001-83150000      * |     16629\n##    chr17_83150001_83155000    chr17 83150001-83155000      * |     16630\n##                               weight   chr    center     score insulation\n##                            &lt;numeric&gt; &lt;Rle&gt; &lt;integer&gt; &lt;numeric&gt;  &lt;numeric&gt;\n##    chr17_60100001_60105000 0.0406489 chr17  60102500  0.188061  -0.750142\n##    chr17_60105001_60110000 0.0255539 chr17  60107500  0.180860  -0.806466\n##    chr17_60110001_60115000       NaN chr17  60112500  0.196579  -0.686232\n##    chr17_60115001_60120000       NaN chr17  60117500  0.216039  -0.550046\n##    chr17_60120001_60125000       NaN chr17  60122500  0.230035  -0.459489\n##                        ...       ...   ...       ...       ...        ...\n##    chr17_83130001_83135000 0.0314684 chr17  83132500  0.262191  -0.270723\n##    chr17_83135001_83140000 0.0307197 chr17  83137500  0.240779  -0.393632\n##    chr17_83140001_83145000 0.0322810 chr17  83142500  0.219113  -0.529664\n##    chr17_83145001_83150000 0.0280840 chr17  83147500  0.199645  -0.663900\n##    chr17_83150001_83155000 0.0272775 chr17  83152500  0.180434  -0.809873\n##                                  min prominence\n##                            &lt;logical&gt;  &lt;numeric&gt;\n##    chr17_60100001_60105000     FALSE         NA\n##    chr17_60105001_60110000      TRUE    0.57476\n##    chr17_60110001_60115000     FALSE         NA\n##    chr17_60115001_60120000     FALSE         NA\n##    chr17_60120001_60125000     FALSE         NA\n##                        ...       ...        
...\n##    chr17_83130001_83135000     FALSE         NA\n##    chr17_83135001_83140000     FALSE         NA\n##    chr17_83140001_83145000     FALSE         NA\n##    chr17_83145001_83150000     FALSE         NA\n##    chr17_83150001_83155000     FALSE         NA\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome\n\n\n\n\n\n\n\nNote\n\n\n\nThe getDiamondInsulation function can be parallelized over multiple threads by specifying the Bioconductor generic BPPARAM argument.\n\n\n\n7.2.2 Exporting insulation scores tracks\nTo save the diamond insulation scores (as a bigwig file) and the borders (as a bed file), the export function can be used:\n\ncoverage(metadata(hic)$insulation, weight = 'insulation') |&gt; export('microC_insulation.bw')\ntopologicalFeatures(hic, \"borders\") |&gt; export('microC_borders.bed')\n\n\n7.2.3 Visualizing chromatin domains\nInsulation tracks should be visualized in a dedicated genome browser.\nThat being said, it is possible to visualize a genome track in R alongside the matching Hi-C contact matrix.\n\nhic &lt;- zoom(hic, 100000)\np1 &lt;- plotMatrix(\n    hic, \n    use.scores = 'balanced', \n    limits = c(-3.5, -1),\n    borders = topologicalFeatures(hic, \"borders\"),\n    caption = FALSE\n)\ninsulation &lt;- coverage(metadata(hic)$insulation, weight = 'insulation')[[1]]\ninsulation_df &lt;- tibble(pos = cumsum(runLength(insulation)), insulation = runValue(insulation))\np2 &lt;- ggplot(insulation_df, aes(x = pos, y = insulation)) + \n    geom_area() + \n    theme_void() + \n    coord_cartesian(expand = FALSE) + \n    labs(x = \"Genomic position\", y = \"Diamond insulation score\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 1))\n\n\n\n\n\n\n\nLocal minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. 
These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Finding topological features in Hi-C</span>"
    ]
  },
  {
    "objectID": "pages/topological-features.html#chromatin-loops",
    "href": "pages/topological-features.html#chromatin-loops",
    "title": "\n7  Finding topological features in Hi-C\n",
    "section": "\n7.3 Chromatin loops",
    "text": "7.3 Chromatin loops\n\n7.3.1 chromosight\n\nChromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function.\n\n\n\n\n\n\nImportantImportant note:\n\n\n\nHiCool relies on basilisk R package to set up an underlying, self-managed python environment. Packages from this environment, including chromosight, are not yet available for ARM chips (e.g. M1/2/3 in newer on macbooks) or Windows. For this reason, HiCool-supported features are not available on these machines.\n\n\n\n7.3.1.1 Identifying loops\n\n## Due to HiCool limitations when rendering the book, this code is not executed here\nhic &lt;- HiCool::getLoops(microC, resolution = 5000)\n\n\n## Instead we load pre-computed data from a backed-up object\nhic_rds &lt;- system.file('extdata', 'microC_with-loops.rds', package = 'OHCA')\nhic &lt;- readRDS(hic_rds)\n\n\nhic\n##  `HiCExperiment` object with 2,103,634 contacts over 200 regions \n##  -------\n##  fileName: \"../4d434d8538a0_4DNFI9FVHJZQ_subset.mcool\" \n##  focus: \"chr17:62,500,001-63,500,000\" \n##  resolutions(1): 5000\n##  active resolution: 5000 \n##  interactions: 19667 \n##  scores(2): count balanced \n##  topologicalFeatures: loops(2419) \n##  pairsFile: N/A \n##  metadata(1): chromosight_args\n\ngetLoops() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA loops topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"loops\")\n##  GInteractions object with 2419 interactions and 7 metadata columns:\n##           seqnames1           ranges1     seqnames2           ranges2 |\n##               &lt;Rle&gt;      
   &lt;IRanges&gt;         &lt;Rle&gt;         &lt;IRanges&gt; |\n##       [1]     chr17     150000-155000 ---     chr17     390000-395000 |\n##       [2]     chr17     145000-150000 ---     chr17     755000-760000 |\n##       [3]     chr17     145000-150000 ---     chr17   1050000-1055000 |\n##       [4]     chr17     145000-150000 ---     chr17     510000-515000 |\n##       [5]     chr17     150000-155000 ---     chr17     990000-995000 |\n##       ...       ...               ... ...       ...               ... .\n##    [2415]     chr17 82870000-82875000 ---     chr17 83075000-83080000 |\n##    [2416]     chr17 82880000-82885000 ---     chr17 82925000-82930000 |\n##    [2417]     chr17 82960000-82965000 ---     chr17 83080000-83085000 |\n##    [2418]     chr17 82975000-82980000 ---     chr17 83000000-83005000 |\n##    [2419]     chr17 83100000-83105000 ---     chr17 83200000-83205000 |\n##                bin1      bin2 kernel_id iteration     score    pvalue\n##           &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##       [1]    498194    498242         0         0  0.666651         0\n##       [2]    498193    498315         0         0  0.452903         0\n##       [3]    498193    498374         0         0  0.518936         0\n##       [4]    498193    498266         0         0  0.536020         0\n##       [5]    498194    498362         0         0  0.573763         0\n##       ...       ...       ...       ...       ...       ...       
...\n##    [2415]    514738    514779         0         0  0.478653   0.0e+00\n##    [2416]    514740    514749         0         0  0.369344   5.0e-10\n##    [2417]    514756    514780         0         0  0.690669   0.0e+00\n##    [2418]    514759    514764         0         0  0.374722   5.1e-09\n##    [2419]    514784    514804         0         0  0.768593   0.0e+00\n##              qvalue\n##           &lt;numeric&gt;\n##       [1]         0\n##       [2]         0\n##       [3]         0\n##       [4]         0\n##       [5]         0\n##       ...       ...\n##    [2415]     0e+00\n##    [2416]     6e-10\n##    [2417]     0e+00\n##    [2418]     6e-09\n##    [2419]     0e+00\n##    -------\n##    regions: 3169 ranges and 0 metadata columns\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nThe arguments used by chromosight, stored in metadata:\n\n\nmetadata(hic)$chromosight_args\n##  $`--pattern`\n##  [1] \"loops\"\n##  \n##  $`--dump`\n##  [1] \"/tmp/RtmpUN0slk\"\n##  \n##  $`--inter`\n##  [1] FALSE\n##  \n##  $`--iterations`\n##  [1] \"auto\"\n##  \n##  $`--kernel-config`\n##  NULL\n##  \n##  $`--perc-zero`\n##  [1] \"auto\"\n##  \n##  $`--perc-undetected`\n##  [1] \"auto\"\n##  \n##  $`--tsvd`\n##  [1] FALSE\n##  \n##  $`--win-fmt`\n##  [1] \"json\"\n##  \n##  $`--win-size`\n##  [1] \"auto\"\n##  \n##  $`--no-plotting`\n##  [1] TRUE\n##  \n##  $`--smooth-trend`\n##  [1] FALSE\n##  \n##  $`--norm`\n##  [1] \"auto\"\n##  \n##  $`&lt;contact_map&gt;`\n##  [1] \"/root/.cache/R/fourDNData/913914662_4DNFI9FVHJZQ.mcool::/resolutions/5000\"\n##  \n##  $`--max-dist`\n##  [1] \"auto\"\n##  \n##  $`--min-dist`\n##  [1] \"auto\"\n##  \n##  $`--min-separation`\n##  [1] \"auto\"\n##  \n##  $`--n-mads`\n##  [1] 5\n##  \n##  $`&lt;prefix&gt;`\n##  [1] \"chromosight/chromo\"\n##  \n##  $`--pearson`\n##  [1] \"auto\"\n##  \n##  $`--subsample`\n##  [1] \"no\"\n##  \n##  $`--threads`\n##  [1] 1\n\n\n7.3.1.2 Importing loops from files\nIf you are using 
chromosight directly from the terminal (i.e. outside R), you can import the annotated loops in R as follows:\n\n## Change the `.tsv` file to the local output file from chromosight\nloops &lt;- system.file('extdata', 'chromo.tsv', package = 'OHCA') |&gt; \n    readr::read_tsv() |&gt; \n    plyinteractions::as_ginteractions(seqnames1 = chrom1, seqnames2 = chrom2)\n##  Rows: 2419 Columns: 13\n##  ── Column specification ─────────────────────────────────────────────────────\n##  Delimiter: \"\\t\"\n##  chr  (2): chrom1, chrom2\n##  dbl (11): start1, end1, start2, end2, bin1, bin2, kernel_id, iteration, s...\n##  \n##  ℹ Use `spec()` to retrieve the full column specification for this data.\n##  ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n\nloops\n##  GInteractions object with 2419 interactions and 7 metadata columns:\n##           seqnames1           ranges1 strand1     seqnames2           ranges2\n##               &lt;Rle&gt;         &lt;IRanges&gt;   &lt;Rle&gt;         &lt;Rle&gt;         &lt;IRanges&gt;\n##       [1]     chr17     150000-155000       * ---     chr17     390000-395000\n##       [2]     chr17     145000-150000       * ---     chr17     755000-760000\n##       [3]     chr17     145000-150000       * ---     chr17   1050000-1055000\n##       [4]     chr17     145000-150000       * ---     chr17     510000-515000\n##       [5]     chr17     150000-155000       * ---     chr17     990000-995000\n##       ...       ...               ...     ... ...       ...               
...\n##    [2415]     chr17 82870000-82875000       * ---     chr17 83075000-83080000\n##    [2416]     chr17 82880000-82885000       * ---     chr17 82925000-82930000\n##    [2417]     chr17 82960000-82965000       * ---     chr17 83080000-83085000\n##    [2418]     chr17 82975000-82980000       * ---     chr17 83000000-83005000\n##    [2419]     chr17 83100000-83105000       * ---     chr17 83200000-83205000\n##           strand2 |      bin1      bin2 kernel_id iteration     score\n##             &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;\n##       [1]       * |    498194    498242         0         0  0.666651\n##       [2]       * |    498193    498315         0         0  0.452903\n##       [3]       * |    498193    498374         0         0  0.518936\n##       [4]       * |    498193    498266         0         0  0.536020\n##       [5]       * |    498194    498362         0         0  0.573763\n##       ...     ... .       ...       ...       ...       ...       ...\n##    [2415]       * |    514738    514779         0         0  0.478653\n##    [2416]       * |    514740    514749         0         0  0.369344\n##    [2417]       * |    514756    514780         0         0  0.690669\n##    [2418]       * |    514759    514764         0         0  0.374722\n##    [2419]       * |    514784    514804         0         0  0.768593\n##              pvalue    qvalue\n##           &lt;numeric&gt; &lt;numeric&gt;\n##       [1]         0         0\n##       [2]         0         0\n##       [3]         0         0\n##       [4]         0         0\n##       [5]         0         0\n##       ...       ...       
...\n##    [2415]   0.0e+00     0e+00\n##    [2416]   5.0e-10     6e-10\n##    [2417]   0.0e+00     0e+00\n##    [2418]   5.1e-09     6e-09\n##    [2419]   0.0e+00     0e+00\n##    -------\n##    regions: 3169 ranges and 0 metadata columns\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n7.3.1.3 Exporting chromatin loops\n\nloops &lt;- topologicalFeatures(hic, \"loops\")\nloops &lt;- loops[loops$score &gt;= 0.4 & loops$qvalue &lt;= 1e-6]\nGenomicInteractions::export.bedpe(loops, 'loops.bedpe')\n##  Warning in interactionCounts(x): 'counts' not in mcols of object; will\n##  return NULL\n\n\n7.3.1.4 Visualizing chromatin loops\n\nplotMatrix(\n    hic, \n    loops = loops,\n    limits = c(-4, -1.2),\n    caption = FALSE\n)\n\n\n\n\n\n\n\n\n7.3.2 Other R packages\nA number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications.",
    "crumbs": [
      "In-depth Hi-C analysis",
      "<span class='chapter-number'>7</span>  <span class='chapter-title'>Finding topological features in Hi-C</span>"
    ]
  },
  {
    "objectID": "pages/disseminating.html",
    "href": "pages/disseminating.html",
    "title": "\n8  Data gateways: accessing public Hi-C data portals\n",
    "section": "",
    "text": "8.1 4DN data portal\nThe Hi-C experimental approach has gained significant traction across multiple fields related to genome biology, and several consortia developed large-scale programs based on this technique. The fourDNData and DNAZooData R packages were designed to accelerate the investigation of chromatin structure using these public resources.\nThe 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogeneously processed, yielding more than 350 sets of processed files.\nfourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files.\nThe fourDNData() function provides a gateway to 4DN-hosted Hi-C files, including contact matrices (in .hic or .mcool) and other Hi-C derived files such as annotated compartments, domains, insulation scores, or .pairs files.\nlibrary(fourDNData)\nhead(fourDNData())\n##    experimentSetAccession     fileType     size organism experimentType\n##  1           4DNES18BMU79        pairs 10151.53    mouse   in situ Hi-C\n##  3           4DNES18BMU79          hic  5285.82    mouse   in situ Hi-C\n##  4           4DNES18BMU79        mcool  6110.75    mouse   in situ Hi-C\n##  5           4DNES18BMU79   boundaries     0.12    mouse   in situ Hi-C\n##  6           4DNES18BMU79   insulation     7.18    mouse   in situ Hi-C\n##  7           4DNES18BMU79 compartments     0.18    mouse   in situ Hi-C\n##    details                              dataset\n##  1   DpnII Hi-C on Mouse Olfactory System cells\n##  3   DpnII Hi-C on Mouse Olfactory System cells\n##  4   DpnII Hi-C on Mouse Olfactory System cells\n##  5   DpnII Hi-C on Mouse Olfactory 
System cells\n##  6   DpnII Hi-C on Mouse Olfactory System cells\n##  7   DpnII Hi-C on Mouse Olfactory System cells\n##                                                          condition\n##  1 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##  3 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##  4 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##  5 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##  6 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##  7 Mature olfactory sensory neurons with conditional Ldb1 knockout\n##                  biosource biosourceType             publication\n##  1 olfactory receptor cell  primary cell Monahan K et al. (2019)\n##  3 olfactory receptor cell  primary cell Monahan K et al. (2019)\n##  4 olfactory receptor cell  primary cell Monahan K et al. (2019)\n##  5 olfactory receptor cell  primary cell Monahan K et al. (2019)\n##  6 olfactory receptor cell  primary cell Monahan K et al. (2019)\n##  7 olfactory receptor cell  primary cell Monahan K et al. 
(2019)\n##                                                                                                                                    URL\n##  1 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/49504f97-904e-48c1-8c20-1033680b66da/4DNFIC5AHBPV.pairs.gz\n##  3      https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/6cd4378a-8f51-4e65-99eb-15f5c80abf8d/4DNFIT4I5C6Z.hic\n##  4    https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/01fb704f-2fd7-48c6-91af-c5f4584529ed/4DNFIVPAXJO8.mcool\n##  5   https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/5c07cdee-53e2-43e0-8853-cfe5f057b3f1/4DNFIR3XCIMA.bed.gz\n##  6       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw\n##  7       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Data gateways: accessing public Hi-C data portals</span>"
    ]
  },
  {
    "objectID": "pages/disseminating.html#dn-data-portal",
    "href": "pages/disseminating.html#dn-data-portal",
    "title": "\n8  Data gateways: accessing public Hi-C data portals\n",
    "section": "",
    "text": "8.1.1 Querying individual files\nThe fourDNData() function can be used to directly fetch specific files from the 4DN data portal:\n\ncf &lt;- fourDNData(experimentSetAccession = '4DNESJNPEKZD', type = 'mcool')\n\nThis effectively downloads and caches the queried file locally.\n\ncf\n##  [1] \"/home/biocbuild/.cache/R/fourDNData/348caa4cf7f1d3_4DNFIZL8OZE1.mcool\"\n\navailableChromosomes(cf)\n##  Seqinfo object with 24 sequences from an unspecified genome:\n##    seqnames seqlengths isCircular genome\n##    chr1      248956422       &lt;NA&gt;   &lt;NA&gt;\n##    chr2      242193529       &lt;NA&gt;   &lt;NA&gt;\n##    chr3      198295559       &lt;NA&gt;   &lt;NA&gt;\n##    chr4      190214555       &lt;NA&gt;   &lt;NA&gt;\n##    chr5      181538259       &lt;NA&gt;   &lt;NA&gt;\n##    ...             ...        ...    ...\n##    chr20      64444167       &lt;NA&gt;   &lt;NA&gt;\n##    chr21      46709983       &lt;NA&gt;   &lt;NA&gt;\n##    chr22      50818468       &lt;NA&gt;   &lt;NA&gt;\n##    chrX      156040895       &lt;NA&gt;   &lt;NA&gt;\n##    chrY       57227415       &lt;NA&gt;   &lt;NA&gt;\n\navailableResolutions(cf)\n##  resolutions(13): 1000 2000 ... 5e+06 1e+07\n##  \n\nimport(cf, focus = \"chr4:10000001-20000000\", resolution = 5000)\n##  `HiCExperiment` object with 656 contacts over 2,000 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/fourDNData/348caa4cf7f1d3_4DNFIZL8OZE1.mcool\" \n##  focus: \"chr4:10,000,001-20,000,000\" \n##  resolutions(13): 1000 2000 ... 5000000 10000000\n##  active resolution: 5000 \n##  interactions: 614 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n##  pairsFile: N/A \n##  metadata(0):\n\nDifferent Hi-C related genomic files are provided by the 4DN consortium. 
The type of file to fetch can be specified with the type argument:\n\n\ntype = 'pairs' will fetch the pairs file which was generated by the 4DN consortium and binned into a contact matrix. Once fetched from the 4DN data portal, the local file can be imported in R using the import function, which will generate a GInteractions object.\n\n\n## Not evaluated for now\npairs_f &lt;- fourDNData(experimentSetAccession = '4DNESJNPEKZD', type = 'pairs') \nprint(pairs_f)\nimport(pairs_f)\n\n\n\n\n\n\n\nWatch out\n\n\n\n.pairs files can be particularly large: they take a long time to download and have a large storage footprint.\n\n\n\n\ntype = 'insulation' will fetch a .bigwig track file precomputed by the 4DN consortium. This track corresponds to the genome-wide insulation score computed by cooltools as described in Crane et al. (2015). To know more about this, read the excerpt from 4DN data portal. Once fetched from the 4DN data portal, the local file can be imported in R using the import function, which will generate a RleList object.\n\n\nlibrary(rtracklayer)\nfourDNData(experimentSetAccession = '4DNES25ABNZ1', type = 'insulation') |&gt; \n    import(as = 'Rle')\n##  RleList of length 21\n##  $chr1\n##  numeric-Rle of length 195471971 with 38145 runs\n##    Lengths:      3065000         5000 ...         5000       171971\n##    Values :  0.00000e+00  1.01441e-01 ...     0.807009     0.000000\n##  \n##  $chr10\n##  numeric-Rle of length 130694993 with 25100 runs\n##    Lengths:     3175000        5000        5000 ...        5000      169993\n##    Values :  0.00000000  0.37584546  0.33597839 ...    0.628601    0.000000\n##  \n##  $chr11\n##  numeric-Rle of length 122082543 with 23536 runs\n##    Lengths:    3165000       5000       5000 ...       5000     162543\n##    Values :  0.0000000 -0.7906257 -0.7930040 ...   
0.515919   0.000000\n##  \n##  $chr12\n##  numeric-Rle of length 120129022 with 22578 runs\n##    Lengths:   3075000      5000      5000 ...      5000      5000    164022\n##    Values :  0.000000  0.411216  0.400357 ... 0.1650951 0.2175749 0.0000000\n##  \n##  $chr13\n##  numeric-Rle of length 120421639 with 22807 runs\n##    Lengths:     3080000        5000        5000 ...        5000      171639\n##    Values :  0.00000000  0.17005745  0.10652249 ...  1.14856148  0.00000000\n##  \n##  ...\n##  &lt;16 more elements&gt;\n\n\n\ntype = 'boundaries' will fetch a .bed file precomputed by the 4DN consortium, listing the annotated borders between topological domains. These borders correspond to local minima identified from the genome-wide insulation track. It can also be imported in R using the import function, which will generate a GRanges object.\n\n\nfourDNData(experimentSetAccession = '4DNES25ABNZ1', type = 'boundaries') |&gt; \n    import()\n##  GRanges object with 6103 ranges and 2 metadata columns:\n##           seqnames            ranges strand |        name     score\n##              &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; | &lt;character&gt; &lt;numeric&gt;\n##       [1]     chr1   4380001-4385000      * |      Strong  0.695274\n##       [2]     chr1   4760001-4765000      * |        Weak  0.444476\n##       [3]     chr1   4910001-4915000      * |        Weak  0.353184\n##       [4]     chr1   5180001-5185000      * |      Strong  0.565763\n##       [5]     chr1   6170001-6175000      * |      Strong  1.644911\n##       ...      ...               ...    ... .         ...       
...\n##    [6099]     chrY 89725001-89730000      * |        Weak  0.258094\n##    [6100]     chrY 89790001-89795000      * |        Weak  0.442186\n##    [6101]     chrY 89895001-89900000      * |        Weak  0.279879\n##    [6102]     chrY 90025001-90030000      * |      Strong  0.660699\n##    [6103]     chrY 90410001-90415000      * |      Strong  1.160018\n##    -------\n##    seqinfo: 21 sequences from an unspecified genome; no seqlengths\n\n\n\ntype = 'compartments' will fetch a .bigwig track file precomputed by the 4DN consortium. This track corresponds to the selected genome-wide eigenvector computed by cooltools and representing A/B compartments. To know more about this, read the excerpt from 4DN data portal. Once fetched from the 4DN data portal, the local file can be imported in R using the import function, which will generate a GRanges object by default (or a RleList object with as = 'Rle'). The score represents the eigenvector values, and by convention a genomic bin with a positive score is associated with the A compartment whereas a genomic bin with a negative score is associated with the B compartment.\n\n\nfourDNData(experimentSetAccession = '4DNES25ABNZ1', type = 'compartments') |&gt; \n    import()\n##  GRanges object with 10911 ranges and 1 metadata column:\n##            seqnames            ranges strand |     score\n##               &lt;Rle&gt;         &lt;IRanges&gt;  &lt;Rle&gt; | &lt;numeric&gt;\n##        [1]     chr1          1-250000      * |       NaN\n##        [2]     chr1     250001-500000      * |       NaN\n##        [3]     chr1     500001-750000      * |       NaN\n##        [4]     chr1    750001-1000000      * |       NaN\n##        [5]     chr1   1000001-1250000      * |       NaN\n##        ...      ...               ...    ... .       
...\n##    [10907]     chrY 90500001-90750000      * | 0.0237907\n##    [10908]     chrY 90750001-91000000      * |       NaN\n##    [10909]     chrY 91000001-91250000      * |       NaN\n##    [10910]     chrY 91250001-91500000      * |       NaN\n##    [10911]     chrY 91500001-91744698      * |       NaN\n##    -------\n##    seqinfo: 21 sequences from an unspecified genome\n\n\n8.1.2 Querying a complete experiment dataset\nRather than importing the files of a single experimentSet accession ID one by one, one can import all the available files associated with an experimentSet accession ID into a HiCExperiment object by using the fourDNHiCExperiment() function.\n\nhic &lt;- fourDNHiCExperiment('4DNESJNPEKZD')\n##  Fetching local Hi-C contact map from Bioc cache\n##  Fetching local compartments bigwig file from Bioc cache\n##  Insulation not found for the provided experimentSet accession.\n##  Borders not found for the provided experimentSet accession.\n##  Importing contacts in memory\n\nThis is a more efficient way to import datasets, as it aggregates the different files into a single HiCExperiment object with populated topologicalFeatures and metadata slots.\n\nhic\n##  `HiCExperiment` object with 453,301 contacts over 12,366 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/fourDNData/348caa4cf7f1d3_4DNFIZL8OZE1.mcool\" \n##  focus: \"whole genome\" \n##  resolutions(13): 1000 2000 ... 
5000000 10000000\n##  active resolution: 250000 \n##  interactions: 289086 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(5437) borders(0) \n##  pairsFile: N/A \n##  metadata(2): 4DN_info eigens\n\n\nmetadata(hic)\n##  $`4DN_info`\n##       experimentSetAccession     fileType   size organism experimentType\n##  1376           4DNESJNPEKZD        pairs   6.67    human   in situ Hi-C\n##  1378           4DNESJNPEKZD          hic 179.51    human   in situ Hi-C\n##  1379           4DNESJNPEKZD        mcool  30.17    human   in situ Hi-C\n##  1380           4DNESJNPEKZD compartments   0.21    human   in situ Hi-C\n##       details                                     dataset\n##  1376    MboI Hi-C on GM12878 cells - protocol variations\n##  1378    MboI Hi-C on GM12878 cells - protocol variations\n##  1379    MboI Hi-C on GM12878 cells - protocol variations\n##  1380    MboI Hi-C on GM12878 cells - protocol variations\n##                                                       condition biosource\n##  1376 in situ Hi-C on GM12878 with MboI and bio-dUTP (Tri-Link)   GM12878\n##  1378 in situ Hi-C on GM12878 with MboI and bio-dUTP (Tri-Link)   GM12878\n##  1379 in situ Hi-C on GM12878 with MboI and bio-dUTP (Tri-Link)   GM12878\n##  1380 in situ Hi-C on GM12878 with MboI and bio-dUTP (Tri-Link)   GM12878\n##                biosourceType          publication\n##  1376 immortalized cell line Rao SS et al. (2014)\n##  1378 immortalized cell line Rao SS et al. (2014)\n##  1379 immortalized cell line Rao SS et al. (2014)\n##  1380 immortalized cell line Rao SS et al. 
(2014)\n##                                                                                                                                       URL\n##  1376 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/0bdd4745-7203-49d0-adf6-291cef1a96b7/4DNFIOZ7D1OQ.pairs.gz\n##  1378      https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/1201682a-a223-482d-913d-3c3972b8eb65/4DNFIIRIHBR2.hic\n##  1379    https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/356fab42-5562-4cfd-a3f8-592aa060b992/4DNFIZL8OZE1.mcool\n##  1380       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/333aabfd-b747-447c-b93a-8138f9488fad/4DNFIO9V5G93.bw\n##  \n##  $eigens\n##  GRanges object with 11280 ranges and 2 metadata columns:\n##            seqnames              ranges strand |       score       eigen\n##               &lt;Rle&gt;           &lt;IRanges&gt;  &lt;Rle&gt; |   &lt;numeric&gt;   &lt;numeric&gt;\n##        [1]     chr1      750001-1000000      * |   1.6911879   1.6911879\n##        [2]     chr1     1000001-1250000      * |   0.0809129   0.0809129\n##        [3]     chr1     1250001-1500000      * |   0.0690173   0.0690173\n##        [4]     chr1     1500001-1750000      * |  -0.1903324  -0.1903324\n##        [5]     chr1     1750001-2000000      * |   0.3283633   0.3283633\n##        ...      ...                 ...    ... .         ...         ...\n##    [11276]     chrX 154750001-155000000      * | -0.10909061 -0.10909061\n##    [11277]     chrX 155000001-155250000      * | -1.39655280 -1.39655280\n##    [11278]     chrX 155250001-155500000      * |  0.00264734  0.00264734\n##    [11279]     chrX 155500001-155750000      * | -0.15279847 -0.15279847\n##    [11280]     chrX 155750001-156000000      * | -1.41699576 -1.41699576\n##    -------\n##    seqinfo: 24 sequences from an unspecified genome",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Data gateways: accessing public Hi-C data portals</span>"
    ]
  },
  {
    "objectID": "pages/disseminating.html#dna-zoo",
    "href": "pages/disseminating.html#dna-zoo",
    "title": "\n8  Data gateways: accessing public Hi-C data portals\n",
    "section": "\n8.2 DNA Zoo",
    "text": "8.2 DNA Zoo\nThe DNA Zoo Consortium is a collaborative group whose aim is to correct and refine genome assemblies across the tree of life using Hi-C approaches. As of 2023, they have performed Hi-C across more than 300 animal, plant and fungi species.\nDNAZooData is a package giving programmatic access to these uniformly processed Hi-C contact files, as well as the refined genome assemblies.\nThe DNAZooData() function provides a gateway to DNA Zoo-hosted Hi-C files, fetching and caching relevant contact matrices in .hic format It returns a HicFile object, which can then be imported in memory using import().\n\nlibrary(DNAZooData)\nhead(DNAZooData())\n##                    species                              readme\n##  1        Acinonyx_jubatus        Acinonyx_jubatus/README.json\n##  2      Acropora_millepora      Acropora_millepora/README.json\n##  3     Addax_nasomaculatus     Addax_nasomaculatus/README.json\n##  4           Aedes_aegypti           Aedes_aegypti/README.json\n##  5   Aedes_aegypti__AaegL4   Aedes_aegypti__AaegL4/README.json\n##  6 Aedes_aegypti__AaegL5.0 Aedes_aegypti__AaegL5.0/README.json\n##                                                            readme_link\n##  1        https://dnazoo.s3.wasabisys.com/Acinonyx_jubatus/README.json\n##  2      https://dnazoo.s3.wasabisys.com/Acropora_millepora/README.json\n##  3     https://dnazoo.s3.wasabisys.com/Addax_nasomaculatus/README.json\n##  4           https://dnazoo.s3.wasabisys.com/Aedes_aegypti/README.json\n##  5   https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL4/README.json\n##  6 https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL5.0/README.json\n##    original_assembly     new_assembly\n##  1           aciJub1      aciJub1_HiC\n##  2       amil_sf_1.1  amil_sf_1.1_HiC\n##  3      ASM1959352v1 ASM1959352v1_HiC\n##  4        AGWG.draft         AaegL5.0\n##  5            AaegL3           AaegL4\n##  6        AGWG.draft         AaegL5.0\n##                                      
                          new_assembly_link\n##  1         https://dnazoo.s3.wasabisys.com/Acinonyx_jubatus/aciJub1_HiC.fasta.gz\n##  2   https://dnazoo.s3.wasabisys.com/Acropora_millepora/amil_sf_1.1_HiC.fasta.gz\n##  3 https://dnazoo.s3.wasabisys.com/Addax_nasomaculatus/ASM1959352v1_HiC.fasta.gz\n##  4               https://dnazoo.s3.wasabisys.com/Aedes_aegypti/AaegL5.0.fasta.gz\n##  5         https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL4/AaegL4.fasta.gz\n##  6     https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL5.0/AaegL5.0.fasta.gz\n##    new_assembly_link_status\n##  1                      200\n##  2                      200\n##  3                      200\n##  4                      404\n##  5                      200\n##  6                      200\n##                                                                    hic_link\n##  1    https://dnazoo.s3.wasabisys.com/Acinonyx_jubatus/aciJub1.rawchrom.hic\n##  2   https://dnazoo.s3.wasabisys.com/Acropora_millepora/amil_sf_1.1_HiC.hic\n##  3 https://dnazoo.s3.wasabisys.com/Addax_nasomaculatus/ASM1959352v1_HiC.hic\n##  4                                                                     &lt;NA&gt;\n##  5         https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL4/AaegL4.hic\n##  6     https://dnazoo.s3.wasabisys.com/Aedes_aegypti__AaegL5.0/AaegL5.0.hic\n\nFor example, we can directly fetch a Hi-C dataset generated from a tardigrade sample by specifying the right species argument.\n\nhicfile &lt;- DNAZooData(species = 'Hypsibius_dujardini')\n\nhicfile\n##  HicFile object\n##  .hic file: /home/biocbuild/.cache/R/DNAZooData/418e61e03783d_nHd_3.1_HiC.hic \n##  resolution: 5000 \n##  pairs file: \n##  metadata(6): organism draftAssembly ... 
credits assemblyURL\n\nHere again, the resulting HicFile is populated with metadata parsed from the DNA Zoo data portal.\n\nmetadata(hicfile)$organism\n##  $vernacular\n##  [1] \"Tardigrade\"\n##  \n##  $binomial\n##  [1] \"Hypsibius dujardini\"\n##  \n##  $funFact\n##  [1] \"&lt;i&gt;Hypsibius dujardini&lt;/i&gt; is a species of tardigrade, a tiny microscopic organism. They are also commonly called water bears. This species is found world-wide!\"\n##  \n##  $extraInfo\n##  [1] \"on BioKIDS website\"\n##  \n##  $extraInfoLink\n##  [1] \"http://www.biokids.umich.edu/critters/Hypsibius_dujardini/\"\n##  \n##  $image\n##  [1] \"https://static.wixstatic.com/media/2b9330_82db39c219f24b20a75cb38943aad1fb~mv2.jpg\"\n##  \n##  $imageCredit\n##  [1] \"By Willow Gabriel, Goldstein Lab - https://www.flickr.com/photos/waterbears/1614095719/ Template:Uploader Transferred from en.wikipedia to Commons., CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=2261992\"\n##  \n##  $isChromognomes\n##  [1] \"FALSE\"\n##  \n##  $taxonomy\n##  [1] \"Species:202423-914154-914155-914158-155166-155362-710171-710179-710192-155390-155420\"\n\nThe HicFile metadata also contains a URL from which the genome assembly corrected by the DNA Zoo consortium can be fetched directly.\n\nmetadata(hicfile)$assemblyURL\n##  [1] \"https://dnazoo.s3.wasabisys.com/Hypsibius_dujardini/nHd_3.1_HiC.fasta.gz\"",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>8</span>  <span class='chapter-title'>Data gateways: accessing public Hi-C data portals</span>"
    ]
  },
  {
    "objectID": "pages/interoperability.html",
    "href": "pages/interoperability.html",
    "title": "\n9  Interoperability: using HiCExperiment with other R packages\n",
    "section": "",
    "text": "9.1 diffHic\ndiffHic is the first R package dedicated to Hi-C processing and analysis (Lun & Smyth (2015)). It is packed with useful functions to generate a contact matrix from read pairs and to perform downstream investigation, including normalization, 2D “peak” (i.e. loops) finding and aggregation, differential interaction between samples, etc. It works seamlessly with the InteractionSet class of object, which can be easily obtained from a HiCExperiment object.\nTo do so, we first need to extract GInteractions from one or several HiCExperiment objects and create a single InteractionSet object.\nlibrary(InteractionSet)\nlibrary(GenomicRanges)\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\n\n# ---- This downloads an example `.mcool` file and caches it locally \ncoolf &lt;- HiContactsData('yeast_wt', 'mcool')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\ncool &lt;- import(coolf, format = 'cool')\ngi &lt;- cool |&gt; \n    interactions() |&gt; \n    as(\"ReverseStrictGInteractions\")\niset &lt;- InteractionSet(\n    assays = list(\n        counts = matrix(gi$count, ncol = 1), \n        balanced = matrix(gi$balanced, ncol = 1)\n    ), \n    interactions = gi, \n    colData = data.frame(lib = c(\"WT\"), totals = sum(gi$count))\n)\nFrom there, we can filter interactions to only retain those with significant enrichment over background.\nlibrary(diffHic)\nset.seed(1234)\n\n# --- Filter to find aggregated interactions\nenrichments &lt;- enrichedPairs(iset)\nfilter &lt;- filterPeaks(enrichments, min.enrich = log2(1.2), min.diag = 5)\nfiltered_iset &lt;- iset[filter]\nfiltered_iset\n##  class: InteractionSet \n##  dim: 41872 1 \n##  metadata(0):\n##  assays(2): counts balanced\n##  rownames: NULL\n##  rowData names(4): bin_id1 bin_id2 count balanced\n##  colnames: NULL\n##  colData names(2): lib totals\n##  type: ReverseStrictGInteractions\n##  regions: 12079\n\n# --- Visualize filtered interactions 
\nlibrary(plyinteractions)\nlibrary(HiContacts)\n##  Registered S3 methods overwritten by 'readr':\n##    method                    from \n##    as.data.frame.spec_tbl_df vroom\n##    as_tibble.spec_tbl_df     vroom\n##    format.col_spec           vroom\n##    print.col_spec            vroom\n##    print.collector           vroom\n##    print.date_names          vroom\n##    print.locale              vroom\n##    str.col_spec              vroom\ninteractions(filtered_iset) |&gt; \n    filter(seqnames2 == 'II', seqnames1 == seqnames2) |&gt; \n    plotMatrix(use.scores = 'count')\nNext, we can cluster filtered interactions that are next to each other.\n# --- Cluster interactions to find loops\nclustered_iset &lt;- clusterPairs(filtered_iset, tol = 5000)\nclustered_iset$interactions \n##  ReverseStrictGInteractions object with 1644 interactions and 0 metadata columns:\n##           seqnames1       ranges1 strand1     seqnames2       ranges2 strand2\n##               &lt;Rle&gt;     &lt;IRanges&gt;   &lt;Rle&gt;         &lt;Rle&gt;     &lt;IRanges&gt;   &lt;Rle&gt;\n##       [1]         I  15001-149000       * ---         I      1-122000       *\n##       [2]         I 133001-148000       * ---         I 127001-139000       *\n##       [3]         I 154001-160000       * ---         I 128001-149000       *\n##       [4]         I 168001-173000       * ---         I 138001-148000       *\n##       [5]         I 184001-196000       * ---         I   15001-23000       *\n##       ...       ...           ...     ... ...       ...           ...     
...\n##    [1640]       XVI 897001-898000       * ---       XVI 831001-832000       *\n##    [1641]       XVI 907001-910000       * ---       XVI 840001-843000       *\n##    [1642]       XVI 926001-934000       * ---       XVI 872001-878000       *\n##    [1643]       XVI 933001-934000       * ---       XVI 858001-859000       *\n##    [1644]       XVI 933001-942000       * ---       XVI 928001-934000       *\n##    -------\n##    regions: 2822 ranges and 0 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome\n\n# --- Visualize clustered interactions \ninteractions(filtered_iset) |&gt; \n    mutate(cluster = clustered_iset$indices[[1]]) |&gt; \n    filter(seqnames2 == 'II', seqnames1 == seqnames2) |&gt; \n    plotMatrix(use.scores = 'cluster')\nFinally, we can use HiContacts to visualize individual interaction clusters identified with diffHic.\n# --- Plot matrix at a clustered loop\ncgi &lt;- clustered_iset$interactions[554]\nseqn &lt;- seqnames(anchors(cgi, type=\"second\"))\nstart &lt;- start(anchors(cgi, type=\"second\")) - 50000\nend &lt;- end(anchors(cgi, type=\"first\")) + 50000\ninteractions_peak &lt;- GRanges(seqn, IRanges(start, end))\np &lt;- plotMatrix(cool[interactions_peak])\n\nlibrary(ggplot2)\nan &lt;- anchors(cgi)\np + geom_rect(\n    data = data.frame(xmin = start(an[[2]]), xmax = end(an[[2]]), ymin = start(an[[1]]), ymax = end(an[[1]])), \n    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), \n    inherit.aes = FALSE, \n    fill = NA, \n    colour = 'cyan'\n)",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Interoperability: using HiCExperiment with other R packages</span>"
    ]
  },
  {
    "objectID": "pages/interoperability.html#multihiccompare",
    "href": "pages/interoperability.html#multihiccompare",
    "title": "\n9  Interoperability: using HiCExperiment with other R packages\n",
    "section": "\n9.2 multiHiCcompare",
    "text": "9.2 multiHiCcompare\nThe multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its excerpt, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count).\nManipulate a HiCExperiment object to coerce it into such structure is straightforward.\n\nlibrary(dplyr)\nlibrary(tidyr)\nlibrary(purrr)\ncoolf_wt &lt;- HiContactsData('yeast_wt', 'mcool')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\ncoolf_eco1 &lt;- HiContactsData('yeast_eco1', 'mcool')\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nhics &lt;- list(\n    \"wt\" = import(coolf_wt, format = 'cool'),\n    \"eco1\" = import(coolf_eco1, format = 'cool')\n)\nhics_list &lt;- map(hics, ~ .x['XI'] |&gt; \n    as.data.frame() |&gt;\n    mutate(chr = 1) |&gt; \n    relocate(chr) |&gt;\n    select(chr, start1, start2, count)\n)\nhead(hics_list[[1]])\n##    chr start1 start2 count\n##  1   1      1      1     2\n##  2   1      1   1001     3\n##  3   1      1   2001     3\n##  4   1      1   3001    13\n##  5   1      1   4001     9\n##  6   1      1   5001    13\n\nOnce this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results():\n\nDI &lt;- hics_list |&gt; \n    make_hicexp(\n        data_list = hics_list, \n        groups = factor(c(1, 2))\n    ) |&gt; \n    cyclic_loess() |&gt; \n    hic_exactTest() |&gt; \n    results()\nDI\n##           chr region1 region2     D      logFC    logCPM    p.value     p.adj\n##         &lt;num&gt;   &lt;int&gt;   &lt;int&gt; &lt;num&gt;      &lt;num&gt;     &lt;num&gt;      &lt;num&gt;     &lt;num&gt;\n##      1:     1       
1    1001     1  0.4279414  6.382927 0.78960192 1.0000000\n##      2:     1       1    3001     3  1.0325237  8.339327 0.06035705 0.9501367\n##      3:     1       1    4001     4  0.6862141  7.597689 0.34723639 1.0000000\n##      4:     1       1    5001     5  0.5124878  7.960339 0.43133791 1.0000000\n##      5:     1       1    6001     6 -0.3568672  8.563374 0.52289982 1.0000000\n##     ---                                                                      \n##  22637:     1  663001  666001     3 -1.1680738  7.158551 0.17500113 1.0000000\n##  22638:     1  664001  664001     0  1.4530501  8.536212 0.16535151 1.0000000\n##  22639:     1  664001  665001     1 -0.1014769  8.166275 1.00000000 1.0000000\n##  22640:     1  665001  665001     0 -0.3110054 10.013750 0.60075706 1.0000000\n##  22641:     1  665001  666001     1 -0.4989794  7.750157 0.41481212 1.0000000",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Interoperability: using HiCExperiment with other R packages</span>"
    ]
  },
  {
    "objectID": "pages/interoperability.html#topdom",
    "href": "pages/interoperability.html#topdom",
    "title": "\n9  Interoperability: using HiCExperiment with other R packages\n",
    "section": "\n9.3 TopDom",
    "text": "9.3 TopDom\nThe TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)).\nUnfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object.\n\nlibrary(TopDom)\nhic &lt;- import(coolf_wt, format = 'cool')\nHiCExperiment2TopDom &lt;- function(hic, chr) {\n    data &lt;- list()\n    cm &lt;- as(hic[chr], 'ContactMatrix')\n    data$counts &lt;- as.matrix(cm) |&gt; base::as.matrix()\n    data$counts[is.na(data$counts)] &lt;- 0\n    data$bins &lt;- regions(cm) |&gt; \n        as.data.frame() |&gt; \n        select(seqnames, start, end) |&gt;\n        mutate(seqnames = as.character(seqnames)) |&gt;\n        mutate(id = 1:n(), start = start - 1) |&gt; \n        relocate(id) |&gt; \n        dplyr::rename(chr = seqnames, from.coord = start, to.coord = end)\n    class(data) &lt;- 'TopDomData'\n    return(data)\n}\nhic_topdom &lt;- HiCExperiment2TopDom(hic, \"II\")\nhic_topdom\n##  TopDomData:\n##  bins:\n##  'data.frame':   813 obs. of  4 variables:\n##   $ id        : int  1 2 3 4 5 6 7 8 9 10 ...\n##   $ chr       : chr  \"II\" \"II\" \"II\" \"II\" ...\n##   $ from.coord: num  0 1000 2000 3000 4000 5000 6000 7000 8000 9000 ...\n##   $ to.coord  : int  1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 ...\n##  counts:\n##   num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ...\n\nNow that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains.\n\ndomains &lt;- TopDom::TopDom(hic_topdom, window.size = 5)\ndomains\n##  TopDom:\n##  Parameters:\n##  - window.size: 5\n##  - statFilter: TRUE\n##  binSignal:\n##  'data.frame':   813 obs. 
of  7 variables:\n##   $ id        : int  1 2 3 4 5 6 7 8 9 10 ...\n##   $ chr       : chr  \"II\" \"II\" \"II\" \"II\" ...\n##   $ from.coord: num  0 1000 2000 3000 4000 5000 6000 7000 8000 9000 ...\n##   $ to.coord  : int  1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 ...\n##   $ local.ext : num  -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 0 0 ...\n##   $ mean.cf   : num  0 0 0 0 0 ...\n##   $ pvalue    : num  1 1 1 1 1 ...\n##  domain:\n##  'data.frame':   61 obs. of  7 variables:\n##   $ chr       : chr  \"II\" \"II\" \"II\" \"II\" ...\n##   $ from.id   : int  1 9 31 36 47 61 76 82 91 102 ...\n##   $ from.coord: num  0 8000 30000 35000 46000 60000 75000 81000 90000 101000 ...\n##   $ to.id     : int  8 30 35 46 60 75 81 90 101 136 ...\n##   $ to.coord  : num  8000 30000 35000 46000 60000 75000 81000 90000 101000 136000 ...\n##   $ tag       : chr  \"gap\" \"domain\" \"gap\" \"domain\" ...\n##   $ size      : num  8000 22000 5000 11000 14000 15000 6000 9000 11000 35000 ...\n##  bed:\n##  'data.frame':   61 obs. 
of  4 variables:\n##   $ chrom     : chr  \"II\" \"II\" \"II\" \"II\" ...\n##   $ chromStart: num  0 8000 30000 35000 46000 60000 75000 81000 90000 101000 ...\n##   $ chromEnd  : num  8000 30000 35000 46000 60000 75000 81000 90000 101000 136000 ...\n##   $ name      : chr  \"gap\" \"domain\" \"gap\" \"domain\" ...\n\nThe resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text.\n\ntopologicalFeatures(hic, 'domain') &lt;- domains$bed |&gt; \n    mutate(chromStart = chromStart + 1) |&gt; \n    filter(name == 'domain') |&gt; \n    makeGRangesFromDataFrame()\ntopologicalFeatures(hic, 'domain')\n##  GRanges object with 52 ranges and 0 metadata columns:\n##         seqnames        ranges strand\n##            &lt;Rle&gt;     &lt;IRanges&gt;  &lt;Rle&gt;\n##     [1]       II    8001-30000      *\n##     [2]       II   35001-46000      *\n##     [3]       II   46001-60000      *\n##     [4]       II   60001-75000      *\n##     [5]       II   75001-81000      *\n##     ...      ...           ...    ...\n##    [48]       II 664001-681000      *\n##    [49]       II 681001-707000      *\n##    [50]       II 707001-714000      *\n##    [51]       II 714001-761000      *\n##    [52]       II 761001-806000      *\n##    -------\n##    seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nrtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed')",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Interoperability: using HiCExperiment with other R packages</span>"
    ]
  },
  {
    "objectID": "pages/interoperability.html#gothic",
    "href": "pages/interoperability.html#gothic",
    "title": "\n9  Interoperability: using HiCExperiment with other R packages\n",
    "section": "\n9.4 GOTHiC",
    "text": "9.4 GOTHiC\nGOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)).\n\n\n\n\n\n\nImportantUsing the GOTHiC function\n\n\n\nUnfortunately, the main GOTHiC function require two .bam files as input. These files are often deleted due to their larger size, while the filtered pairs file itself is retained.\nMoreover, the internal nuts and bolts of the main GOTHiC function perform several operations that are not required in modern workflows:\n\n\nFiltering pairs from same restriction fragment; this step is now usually taken care of automatically, e.g. with HiCool Hi-C processing package.\n\nFiltering short-range pairs; the GOTHiC package hard-codes a 10kb lower threshold for minimum pair distance. More advanced optimized filtering approaches have been implemented since then, to circumvent the need for such hard-coded threshold.\n\nBinning pairs; this step is also already taken care of, when working with Hi-C matrices in modern formats, e.g. 
with .(m)cool files.\n\n\n\nBased on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R.\n\nGOTHiC_binomial &lt;- function(x) {\n\n    if (length(trans(x)) != 0) stop(\"Only `cis` interactions can be used here.\")\n    ints &lt;- interactions(x) |&gt;\n        as.data.frame() |&gt; \n        select(seqnames1, start1, seqnames2, start2, count) |&gt;\n        dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |&gt;\n        mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |&gt;\n        mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2))\n    \n    numberOfReadPairs &lt;- sum(ints$frequencies)\n    all_bins &lt;- unique(c(unique(ints$int1), unique(ints$int2)))\n    all_bins &lt;- sort(all_bins)\n    upperhalfBinNumber &lt;- (length(all_bins)^2 - length(all_bins))/2\n\n    cov &lt;- ints |&gt; \n        group_by(int1) |&gt; \n        tally(frequencies) |&gt; \n        full_join(ints |&gt; \n            group_by(int2) |&gt; \n            tally(frequencies), \n            by = c('int1' = 'int2')\n        ) |&gt; \n        rowwise() |&gt; \n        mutate(coverage = sum(n.x, n.y, na.rm = TRUE)) |&gt; \n        ungroup() |&gt;\n        mutate(relative_coverage = coverage/sum(coverage))\n    \n    results &lt;- mutate(ints,\n        cov1 = left_join(ints, select(cov, int1, relative_coverage), by = c('int1' = 'int1'))$relative_coverage, \n        cov2 = left_join(ints, select(cov, int1, relative_coverage), by = c('int2' = 'int1'))$relative_coverage,\n        probability = cov1 * cov2 * 2 * 1/(1 - sum(cov$relative_coverage^2)),\n        predicted = probability * numberOfReadPairs\n    ) |&gt; \n    rowwise() |&gt;\n    mutate(\n        pvalue = binom.test(\n            frequencies, \n            numberOfReadPairs, \n            probability,\n      
      alternative = \"greater\"\n        )$p.value\n    ) |&gt; \n    ungroup() |&gt; \n    mutate(\n        logFoldChange = log2(frequencies / predicted), \n        qvalue = stats::p.adjust(pvalue, method = \"BH\", n = upperhalfBinNumber)\n    )\n\n    scores(x, \"probability\") &lt;- results$probability\n    scores(x, \"predicted\") &lt;- results$predicted\n    scores(x, \"pvalue\") &lt;- results$pvalue\n    scores(x, \"qvalue\") &lt;- results$qvalue\n    scores(x, \"logFoldChange\") &lt;- results$logFoldChange\n\n    return(x)\n\n} \n\n\n\nres &lt;- GOTHiC_binomial(hic[\"II\"])\nres\n##  `HiCExperiment` object with 471,364 contacts over 802 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/ExperimentHub/7748621cf42b4_7752\" \n##  focus: \"II\" \n##  resolutions(5): 1000 2000 4000 8000 16000\n##  active resolution: 1000 \n##  interactions: 74360 \n##  scores(7): count balanced probability predicted pvalue qvalue logFoldChange \n##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) domain(52) \n##  pairsFile: N/A \n##  metadata(0):\n\ninteractions(res)\n##  GInteractions object with 74360 interactions and 9 metadata columns:\n##            seqnames1       ranges1 strand1     seqnames2       ranges2\n##                &lt;Rle&gt;     &lt;IRanges&gt;   &lt;Rle&gt;         &lt;Rle&gt;     &lt;IRanges&gt;\n##        [1]        II        1-1000       * ---        II     1001-2000\n##        [2]        II        1-1000       * ---        II     5001-6000\n##        [3]        II        1-1000       * ---        II     6001-7000\n##        [4]        II        1-1000       * ---        II     8001-9000\n##        [5]        II        1-1000       * ---        II    9001-10000\n##        ...       ...           ...     ... ...       ...           
...\n##    [74356]        II 807001-808000       * ---        II 809001-810000\n##    [74357]        II 807001-808000       * ---        II 810001-811000\n##    [74358]        II 808001-809000       * ---        II 808001-809000\n##    [74359]        II 808001-809000       * ---        II 809001-810000\n##    [74360]        II 809001-810000       * ---        II 809001-810000\n##            strand2 |   bin_id1   bin_id2     count  balanced probability\n##              &lt;Rle&gt; | &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt; &lt;numeric&gt;   &lt;numeric&gt;\n##        [1]       * |       231       232         1       NaN 7.83580e-09\n##        [2]       * |       231       236         2       NaN 2.81318e-08\n##        [3]       * |       231       237         1       NaN 2.02960e-08\n##        [4]       * |       231       239         2       NaN 6.73108e-08\n##        [5]       * |       231       240         3       NaN 7.37336e-08\n##        ...     ... .       ...       ...       ...       ...         ...\n##    [74356]       * |      1038      1040         8 0.0472023 3.85638e-07\n##    [74357]       * |      1038      1041         1       NaN 5.03006e-08\n##    [74358]       * |      1039      1039         1       NaN 8.74604e-08\n##    [74359]       * |      1039      1040         7       NaN 1.02111e-07\n##    [74360]       * |      1040      1040         2 0.0411355 1.19216e-07\n##             predicted      pvalue      qvalue logFoldChange\n##             &lt;numeric&gt;   &lt;numeric&gt;   &lt;numeric&gt;     &lt;numeric&gt;\n##        [1] 0.00369352 3.68670e-03 0.063385760       8.08079\n##        [2] 0.01326033 8.71446e-05 0.001926954       7.23674\n##        [3] 0.00956681 9.52120e-03 0.150288341       6.70775\n##        [4] 0.03172791 4.92808e-04 0.009806734       5.97810\n##        [5] 0.03475538 6.81713e-06 0.000173165       6.43158\n##        ...        ...         ...         ...           
...\n##    [74356]  0.1817758 2.51560e-11 1.07966e-09       5.45977\n##    [74357]  0.0237099 2.34310e-02 3.38098e-01       5.39837\n##    [74358]  0.0412257 4.03875e-02 5.49519e-01       4.60031\n##    [74359]  0.0481315 1.13834e-13 5.77259e-12       7.18423\n##    [74360]  0.0561941 1.52097e-03 2.79707e-02       5.15344\n##    -------\n##    regions: 802 ranges and 4 metadata columns\n##    seqinfo: 16 sequences from an unspecified genome",
    "crumbs": [
      "Advanced Hi-C topics",
      "<span class='chapter-number'>9</span>  <span class='chapter-title'>Interoperability: using HiCExperiment with other R packages</span>"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html",
    "href": "pages/workflow-yeast.html",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "",
    "text": "Recovering data from SRA\nThe easiest for this is to directly fetch files from SRA from their FTP server. We can do so using the base download.file function.\n# !! This code is not actually executed !!\ndir.create('data')\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/004/SRR8769554/SRR8769554_1.fastq.gz\", \"data/WT_G1_WT_rep1_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/004/SRR8769554/SRR8769554_2.fastq.gz\", \"data/WT_G1_WT_rep1_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/076/SRR10687276/SRR10687276_1.fastq.gz\", \"data/WT_G1_WT_rep2_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/076/SRR10687276/SRR10687276_2.fastq.gz\", \"data/WT_G1_WT_rep2_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/009/SRR8769549/SRR8769549_1.fastq.gz\", \"data/WT_G2M_WT_rep1_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/009/SRR8769549/SRR8769549_2.fastq.gz\", \"data/WT_G2M_WT_rep1_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/081/SRR10687281/SRR10687281_1.fastq.gz\", \"data/WT_G2M_WT_rep2_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/081/SRR10687281/SRR10687281_2.fastq.gz\", \"data/WT_G2M_WT_rep2_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/001/SRR8769551/SRR8769551_1.fastq.gz\", \"data/wpl1_G2M_rep1_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/001/SRR8769551/SRR8769551_2.fastq.gz\", \"data/wpl1_G2M_rep1_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/078/SRR10687278/SRR10687278_1.fastq.gz\", \"data/wpl1_G2M_rep2_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/078/SRR10687278/SRR10687278_2.fastq.gz\", \"data/wpl1_G2M_rep2_R2.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/005/SRR8769555/SRR8769555_1.fastq.gz\", 
\"data/wpl1eco1_G2M_R1.fastq.gz\")\ndownload.file(\"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR876/005/SRR8769555/SRR8769555_2.fastq.gz\", \"data/wpl1eco1_G2M_R2.fastq.gz\")",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html#processing-reads-with-hicool",
    "href": "pages/workflow-yeast.html#processing-reads-with-hicool",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "Processing reads with HiCool",
    "text": "Processing reads with HiCool\nWe will map each pair of fastqs on the yeast genome reference (R64-1-1) using HiCool.\n\n# !! This code is not actually executed !!\nlibrary(HiCool)\nsamples &lt;- c(\n    'WT_G1_rep1', \n    'WT_G1_rep2', \n    'WT_G2M_rep1', \n    'WT_G2M_rep2', \n    'wpl1_G2M_rep1', \n    'wpl1_G2M_rep2', \n    'wpl1eco1_G2M' \n)\npurrr::map(samples, ~ HiCool(\n    r1 = paste0('data/', .x, '_R1.fastq.gz'), \n    r2 = paste0('data/', .x, '_R2.fastq.gz'), \n    genome = 'R64-1-1', \n    restriction = 'DpnII', \n    iterative = FALSE, \n    threads = 15, \n    output = 'data/HiCool/', \n    scratch = '/data/scratch/'\n))\n\nProcessed samples are put in data/HiCool directory. CoolFile objects are pointers to individual contact matrices. We can create such objects by using the importHiCoolFolder utility function.\n\ncfs &lt;- list(\n    WT_G1_rep1 = importHiCoolFolder('data/HiCool', 'GK8ISZ'), \n    WT_G1_rep2 = importHiCoolFolder('data/HiCool', 'SWZTO0'), \n    WT_G2M_rep1 = importHiCoolFolder('data/HiCool', '3KHHUE'), \n    WT_G2M_rep2 = importHiCoolFolder('data/HiCool', 'UVNG7M'), \n    wpl1_G2M_rep1 = importHiCoolFolder('data/HiCool', 'Q4KX6Z'), \n    wpl1_G2M_rep2 = importHiCoolFolder('data/HiCool', '3N0L25'), \n    wpl1eco1_G2M = importHiCoolFolder('data/HiCool', 'LHMXWE')\n)\ncfs\n\nNow that these pointers have been defined, Hi-C contact matrices can be seamlessly imported in R with import.\n\nlibrary(purrr)\nlibrary(HiCExperiment)\nhics &lt;- map(cfs, import)\nhics\n## $WT_G1_rep1\n## `HiCExperiment` object with 5,454,145 contacts over 12,079 regions\n## -------\n## fileName: \"../OHCA-data/HiCool/matrices/W303_G1_WT_rep1^mapped-S288c^GK8ISZ.mcool\"\n## focus: \"whole genome\"\n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000\n## interactions: 3347524\n## scores(2): count balanced\n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0)\n## pairsFile: 
../OHCA-data/HiCool/pairs/W303_G1_WT_rep1^mapped-S288c^GK8ISZ.pairs\n## metadata(3): log args stats\n## \n## $WT_G1_rep2\n## `HiCExperiment` object with 12,068,214 contacts over 12,079 regions\n## -------\n## fileName: \"../OHCA-data/HiCool/matrices/W303_G1_WT_rep2^mapped-S288c^SWZTO0.mcool\"\n## focus: \"whole genome\"\n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000\n## interactions: 6756099\n## scores(2): count balanced\n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0)\n## pairsFile: ../OHCA-data/HiCool/pairs/W303_G1_WT_rep2^mapped-S288c^SWZTO0.pairs\n## metadata(3): log args stats\n## \n## ...",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html#plotting-chromosome-wide-matrices-of-merged-replicates",
    "href": "pages/workflow-yeast.html#plotting-chromosome-wide-matrices-of-merged-replicates",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "Plotting chromosome-wide matrices of merged replicates",
    "text": "Plotting chromosome-wide matrices of merged replicates\nWe can merge replicates with the merge function, and map the plotMatrix function over the resulting list of HiCExperiments.\n\nlibrary(HiContacts)\nchr &lt;- 'X'\nmerged_replicates &lt;- list(\n    WT_G1 = merge(hics[[1]][chr], hics[[2]][chr]), \n    WT_G2M = merge(hics[[3]][chr], hics[[4]][chr]), \n    wpl1_G2M = merge(hics[[5]][chr], hics[[6]][chr]), \n    wpl1eco1_G2M = hics[[7]][chr]\n)\nlibrary(dplyr)\nlibrary(ggplot2)\nmaps &lt;- imap(merged_replicates, ~ plotMatrix(\n    .x, use.scores = 'balanced', limits = c(-3.5, -1.5), caption = FALSE\n) + ggtitle(.y))\ncowplot::plot_grid(plotlist = maps, nrow = 1)\n\n\nWe can already note that long-range contacts seem to increase in frequency, in G2/M vs G1, in wpl1 vs WT and in wpl1/eco1 vs wpl1.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html#compute-ps-per-replicate-and-plot-it",
    "href": "pages/workflow-yeast.html#compute-ps-per-replicate-and-plot-it",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "Compute P(s) per replicate and plot it",
    "text": "Compute P(s) per replicate and plot it\nStill using the map function, we can compute average P(s) for each replicate.\nComputation of the P(s) will take some time, as millions of pairs have to be imported in memory, but it will be accurate at the base resolution, rather than bin resolution from matrices.\n\n\n\n\n\n\nNoteNote\n\n\n\nSince matrices were imported after HiCool processing with the importHiCoolFolder, the associated .pairs file has been automatically added to each HiCExperiment object!\n\n\nThe computed P(s) is stored for each sample as a tibble.\n\npairsFile(hics[[1]])\nps &lt;- imap(hics, ~ distanceLaw(.x) |&gt; mutate(sample = .y))\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G1_WT_rep1^mapped-S288c^GK8ISZ.pairs in memory. This may take a while...\n## |===============================================================| 100% 318 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G1_WT_rep2^mapped-S288c^SWZTO0.pairs in memory. This may take a while...\n## |===============================================================| 100% 674 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G2M_WT_rep1^mapped-S288c^3KHHUE.pairs in memory. This may take a while...\n## |===============================================================| 100% 709 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G2M_WT_rep2^mapped-S288c^UVNG7M.pairs in memory. This may take a while...\n## |==============================================================| 100% 1683 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G2M_wpl1_rep1^mapped-S288c^Q4KX6Z.pairs in memory. This may take a while...\n## |==============================================================| 100% 1269 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G2M_wpl1_rep2^mapped-S288c^3N0L25.pairs in memory. 
This may take a while...\n## |==============================================================| 100% 1529 MB\n## Importing pairs file ../OHCA-data/HiCool/pairs/W303_G2M_wpl1-eco1^mapped-S288c^LHMXWE.pairs in memory. This may take a while...\n## |==============================================================| 100% 1036 MB\nps[[1]]\n## # A tibble: 133 x 6\n##   binned_distance          p     norm_p norm_p_unity slope sample\n##             &lt;dbl&gt;      &lt;dbl&gt;      &lt;dbl&gt;        &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;\n## 1               1 0.000154   0.000154         249.   0     WT_G1_rep1\n## 2               2 0.0000563  0.0000563         91.2  0.702 WT_G1_rep1\n## 3               3 0.0000417  0.0000417         67.5  0.699 WT_G1_rep1\n## 4               4 0.00000835 0.00000835        13.5  0.696 WT_G1_rep1\n## 5               5 0.00000501 0.00000501         8.10 0.693 WT_G1_rep1\n## 6               6 0.00000250 0.00000250         4.05 0.690 WT_G1_rep1\n## # ... with 127 more rows\n\nWe can bind all tibbles together and plot P(s) and their slope for each sample.\n\ndf &lt;- list_rbind(ps)\nplotPs(\n    df, aes(x = binned_distance, y = norm_p, \n    group = sample, color = sample)\n)\nplotPsSlope(\n    df, aes(x = binned_distance, y = slope, \n    group = sample, color = sample)\n)",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html#correlation-between-replicates-with-hicrep",
    "href": "pages/workflow-yeast.html#correlation-between-replicates-with-hicrep",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "Correlation between replicates with hicrep\n",
    "text": "Correlation between replicates with hicrep\n\nhicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets. “Stratum” refers to the distance from the main diagonal: with increase distance from the main diagonal, interactions of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score and computes a weighted average correlation for entire chromosomes.\nWe can check the documentation for hicrep main function, get.scc. This tells us that mat1 and mat2 n*n intrachromosomal contact maps of raw counts should be provided. Fortunately, HiCExperiment objects can easily be coerced into actual dense matrices using as.matrix() function.\n\n\n\n\n\n\nImportantImportant\n\n\n\nMake sure to use the count scores, which are required by hicrep.\n\n\nWe can calculate the overall stratum-corrected correlation score over the chromosome IV between the two G2M WT replicates.\n\nlibrary(hicrep)\nscc &lt;- get.scc(\n    hics[['WT_G2M_rep1']][\"IV\"] |&gt; as.matrix(sparse = TRUE, use.scores = 'count'), \n    hics[['WT_G2M_rep2']][\"IV\"] |&gt; as.matrix(sparse = TRUE, use.scores = 'count'), \n    resol = 1000, h = 2, lbr = 5000, ubr = 50000\n)\nnames(scc)\n## [1] \"corr\" \"wei\"  \"scc\"  \"std\"\nscc$scc\n##           [,1]\n## [1,] 0.9785691\n\nThis can be generalized to all pairwise combinations of Hi-C datasets.\n\nlibrary(purrr)\nlibrary(dplyr)\nlibrary(ggplot2)\nmats &lt;- map(hics, ~ .x[\"IV\"] |&gt; as.matrix(use.scores = 'count', sparse = TRUE))\ndf &lt;- map(1:7, function(i) {\n    map(1:7, function(j) {\n        data.frame(\n            i = names(hics)[i], \n            j = names(hics)[j], \n            scc = hicrep::get.scc(mats[[i]], mats[[j]], resol = 1000, h = 2, lbr = 5000, ubr = 200000)$scc\n        ) |&gt;\n            mutate(i = factor(i, names(cfs))) |&gt;\n            mutate(j = factor(j, names(cfs)))\n    }) |&gt; list_rbind()\n}) |&gt; list_rbind()\nggplot(df, aes(x = i, y = j, fill = scc)) + 
\n    geom_tile() + \n    scale_x_discrete(guide = guide_axis(angle = 90)) + \n    theme_bw() + \n    coord_fixed(ratio = 1) + \n    scale_fill_gradientn(colours = bgrColors())\n\n\nWe can even iterate over an extra level, to compute stratum-corrected correlation for all chromosomes. Here, we will only compute correlation scores between each sample and the WT_G2M_rep1 sample.\n\n\n\n\n\n\nTip: Parallelizing over chromosomes\n\n\n\nBiocParallel::bplapply() replaces purrr::map() here, as it allows parallelization of independent correlation computation runs over multiple CPUs.\n\n\n\n# Some chromosomes will be ignored as they are too small for this analysis \nchrs &lt;- c('II', 'IV', 'V', 'VII', 'VIII', 'IX', 'X', 'XI', 'XIII', 'XIV', 'XVI')\nbpparam &lt;- BiocParallel::MulticoreParam(workers = 6, progressbar = TRUE)\ndf &lt;- BiocParallel::bplapply(chrs, function(CHR) {\n    mats &lt;- map(hics, ~ .x[CHR] |&gt; interactions() |&gt; gi2cm('count') |&gt; cm2matrix())\n\n        map(c(1, 2, 4, 5, 6, 7), function(j) {\n            data.frame(\n                chr = CHR,\n                i = \"WT_G2M_rep1\", \n                j = names(mats)[j], \n                dist = seq(5000, 200000, 1000),\n                scc = hicrep::get.scc(mats[[\"WT_G2M_rep1\"]], mats[[j]], resol = 1000, h = 2, lbr = 5000, ubr = 200000) \n            ) |&gt; mutate(j = factor(j, names(mats)))\n        }) |&gt; list_rbind()\n\n}, BPPARAM = bpparam) |&gt; list_rbind()\n\nA tiny bit of data wrangling will allow us to plot the mean +/- confidence interval (90%) of the per-stratum correlations (the scc.corr column, i.e. the corr element returned by get.scc) across the different chromosomes.\n\nresults &lt;- group_by(df, j, dist) |&gt; \n    summarize(\n        mean = Rmisc::CI(scc.corr, ci = 0.90)[2], \n        CI_up = Rmisc::CI(scc.corr, ci = 0.90)[1], \n        CI_down = Rmisc::CI(scc.corr, ci = 0.90)[3]\n    )\nggplot(results, aes(x = dist, y = mean, ymax = CI_up, ymin = CI_down)) + \n    geom_line(aes(col = j)) + \n    geom_ribbon(aes(fill = j), alpha = 0.2, col = 
NA) + \n    theme_bw() + \n    labs(x = \"Stratum (genomic distance)\", y = 'Stratum-corrected correlation')",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-yeast.html#differential-interaction-di-analysis-with-multihiccompare",
    "href": "pages/workflow-yeast.html#differential-interaction-di-analysis-with-multihiccompare",
    "title": "Workflow 1: Distance-dependent interactions across yeast mutants",
    "section": "Differential interaction (DI) analysis with multiHiCcompare\n",
    "text": "Differential interaction (DI) analysis with multiHiCcompare\n\nWe will now focus on the chromosome XI and identify differentially interacting (DI) loci between WT and wpl1 mutant in G2/M.\nTo do this, we can use the multiHiCcompare package. The required input for the main make_hicexp() function is a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count).\nAlthough this data structure does not correspond to a standard HiC format, it is easy to manipulate a HiCExperiment object to coerce it into such structure.\n\nlibrary(multiHiCcompare)\nhics_list &lt;- map(hics, ~ .x['XI'] |&gt; \n    zoom(2000) |&gt; \n    as.data.frame() |&gt;\n    select(start1, start2, count) |&gt; \n    mutate(chr = 1) |&gt; \n    relocate(chr)\n)\nmhicc &lt;- make_hicexp(\n    data_list = hics_list[c(3, 4, 5, 6)], \n    groups = factor(c(1, 1, 2, 2)\n), A.min = 1)\n\nThe mhicc object contains data over the chromosome XI binned at 2kb for two pairs of replicates (WT or wpl1 G2/M HiC, each in duplicates):\n\nGroup1 contains WT data\nGroup2 contains wpl1 data\n\nTo identify differential interactions, the actual statistical comparison is performed with the hic_exactTest() function.\n\nresults &lt;- cyclic_loess(mhicc, span = 0.2) |&gt; hic_exactTest()\n## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s\n## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s\nresults \n## Hi-C Experiment Object\n## 2 experimental groups\n## Group 1 has 2 samples\n## Group 2 has 2 samples\n## Data has been normalized\n\nThe results() output is not very informative as it is. 
It requires a little bit of reformatting to be able to extract valuable insights from it.\n\ndf &lt;- left_join(results@hic_table, results(results)) |&gt; \n    mutate(dist = region2 - region1) |&gt; \n    mutate(group = case_when(\n        region1 &lt; 430000 & region2 &gt; 450000 ~ 'inter_arms',\n        region1 &gt;= 430000 & region2 &lt;= 450000 ~ 'at_centro',\n        TRUE ~ 'arms'\n    )) |&gt; \n    filter(group %in% c('arms', 'inter_arms')) |&gt; \n    mutate(sign = p.value &lt;= 0.05 & abs(logFC) &gt;= 1)\ndf\n## chr region1 region2 D   IF1   IF2   IF3   IF4   logFC  logCPM  p.value    p.adj dist group  sign\n##   1       1       1 0  6.16  2.09  7.96  5.43  0.5401 4.81329 5.38e-01 7.94e-01    0  arms FALSE\n##   1       1    2001 1 16.38 10.25 12.96 12.16 -0.2257 5.82484 7.00e-01 8.81e-01 2000  arms FALSE\n##   1       1    4001 2 41.41 40.72 84.41 45.14  0.5064 7.69885 5.94e-02 2.16e-01 4000  arms FALSE\n##   1       1    6001 3 22.26 30.51 73.83 48.48  1.2726 8.10243 6.48e-07 5.83e-05 6000  arms  TRUE\n##   1       1    8001 4 26.63 31.20 33.39 25.92  0.0998 7.55207 8.02e-01 9.34e-01 8000  arms FALSE\n## ...\nggplot(df, aes(x = logFC, y = -log10(p.value), col = sign)) + \n    geom_point(size = 0.2) + \n    theme_bw() + \n    facet_wrap(~group) + \n    ylim(c(0, 6)) + \n    theme(legend.position = 'none') + \n    scale_color_manual(values = c('grey', 'black'))\n\n\nIn this volcano plot, we can visually appreciate the fold-change of interaction frequency in WT or wpl1, for interactions constrained within the chromosome XI arms (left) or spanning the chr. XI centromere (right). 
This clearly highlights that interactions within arms are increased in the wpl1 mutant, while those spanning the centromere are strongly decreased.\n\nOne of the strengths of HiContacts is that it can be leveraged to visualize any quantification related to genomic interactions as a Hi-C heatmap, since plotMatrix can take a GInteractions object with any score saved in mcols as input.\n\ngis &lt;- rename(df, seqnames1 = chr, start1 = region1, start2 = region2) |&gt; \n    mutate(\n        seqnames2 = seqnames1, \n        end1 = start1 + 1999, \n        end2 = start2 + 1999\n    ) |&gt; \n    filter(abs(logFC) &gt;= 1) |&gt;\n    df2gi() \ncowplot::plot_grid(\n    plotMatrix(merged_replicates[['WT_G2M']], use.scores = 'balanced', limits = c(-3.5, -1), caption = FALSE),\n    plotMatrix(merged_replicates[['wpl1_G2M']], use.scores = 'balanced', limits = c(-3.5, -1), caption = FALSE),\n    plotMatrix(gis, use.scores = 'logFC', scale = 'linear', limits = c(-2, 2), cmap = bgrColors()), \n    align = \"hv\", axis = 'tblr', nrow = 1\n)",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 1: Distance-dependent interactions across yeast mutants"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html",
    "href": "pages/workflow-chicken.html",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "",
    "text": "Importing data\nThe 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package.\nlibrary(HiCExperiment)\nlibrary(fourDNData)\nlibrary(BiocParallel)\nsamples &lt;- list(\n    '4DNES9LEZXN7' = 'G2 block', \n    '4DNESNWWIFZU' = 'prophase (5m)', \n    '4DNESGDXKM2I' = 'prophase (10m)', \n    '4DNESIR416OW' = 'prometaphase (15m)', \n    '4DNESS8PTK6F' = 'prometaphase (30m)' \n)\nbpparam &lt;- MulticoreParam(workers = 5, progressbar = TRUE)\n##  Warning:   'IS_BIOC_BUILD_MACHINE' environment variable detected, setting\n##    BiocParallel workers to 4 (was 5)\nhics &lt;- bplapply(names(samples), fourDNHiCExperiment, BPPARAM = bpparam)\n##  \n  |                                                                         \n  |                                                                   |   0%\n  |                                                                         \n  |=============                                                      |  20%\n  |                                                                         \n  |===========================                                        |  40%\n  |                                                                         \n  |========================================                           |  60%\n  |                                                                         \n  |======================================================             |  80%\n  |                                                                         \n  |===================================================================| 100%\nnames(hics) &lt;- samples\n\nhics[[\"G2 block\"]]\n##  `HiCExperiment` object with 150,494,008 contacts over 4,109 regions \n##  -------\n##  fileName: \"/home/biocbuild/.cache/R/fourDNData/348e2b4c1f37d5_4DNFIT479GDR.mcool\" \n##  focus: \"whole genome\" \n##  resolutions(13): 1000 2000 ... 
5000000 10000000\n##  active resolution: 250000 \n##  interactions: 7262748 \n##  scores(2): count balanced \n##  topologicalFeatures: compartments(891) borders(3465) \n##  pairsFile: N/A \n##  metadata(3): 4DN_info eigens diamond_insulation",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html#importing-data",
    "href": "pages/workflow-chicken.html#importing-data",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "",
    "text": "WarningBeware\n\n\n\nThe first time the following chunk of code is executed, it will cache a large amount of data (mostly consisting of contact matrices stored in .mcool files).",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html#plotting-whole-chromosome-matrices",
    "href": "pages/workflow-chicken.html#plotting-whole-chromosome-matrices",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "Plotting whole chromosome matrices",
    "text": "Plotting whole chromosome matrices\nWe can visualize the five different Hi-C maps on the entire chromosome 3 with HiContacts by iterating over each of the HiCExperiment objects.\n\nlibrary(purrr)\nlibrary(HiContacts)\n##  Registered S3 methods overwritten by 'readr':\n##    method                    from \n##    as.data.frame.spec_tbl_df vroom\n##    as_tibble.spec_tbl_df     vroom\n##    format.col_spec           vroom\n##    print.col_spec            vroom\n##    print.collector           vroom\n##    print.date_names          vroom\n##    print.locale              vroom\n##    str.col_spec              vroom\nlibrary(ggplot2)\npl &lt;- imap(hics, ~ .x['chr3'] |&gt; \n    zoom(100000) |&gt; \n    plotMatrix(use.scores = 'balanced', limits = c(-4, -1), caption = FALSE) + \n    ggtitle(.y)\n)\nlibrary(cowplot)\nplot_grid(plotlist = pl, nrow = 1)\n\n\n\n\n\n\n\nThis highlights the progressive remodeling of chromatin into condensed chromosomes, starting as soon as 5’ after release from G2 phase.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html#zooming-on-a-chromosome-section",
    "href": "pages/workflow-chicken.html#zooming-on-a-chromosome-section",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "Zooming on a chromosome section",
    "text": "Zooming on a chromosome section\nZooming on a chromosome section, we can plot the Hi-C autocorrelation matrix for each timepoint. These matrices are generally used to highlight the overall correlation of interaction profiles between different segments of a chromosome section (see Chapter 5 for more details).\n\n## --- Format compartment positions of chr. 4 segment\n.chr &lt;- 'chr4'\n.start &lt;- 59000000L\n.stop &lt;- 75000000L\nlibrary(GenomicRanges)\n##  Loading required package: stats4\n##  Loading required package: S4Vectors\n##  \n##  Attaching package: 'S4Vectors'\n##  The following object is masked from 'package:HiCExperiment':\n##  \n##      metadata&lt;-\n##  The following object is masked from 'package:utils':\n##  \n##      findMatches\n##  The following objects are masked from 'package:base':\n##  \n##      I, expand.grid, unname\n##  Loading required package: IRanges\n##  \n##  Attaching package: 'IRanges'\n##  The following object is masked from 'package:purrr':\n##  \n##      reduce\n##  Loading required package: Seqinfo\ncoords &lt;- GRanges(paste0(.chr, ':', .start, '-', .stop))\ncompts_df &lt;- topologicalFeatures(hics[[\"G2 block\"]], \"compartments\") |&gt; \n    subsetByOverlaps(coords, type = 'within') |&gt; \n    as.data.frame()\ncompts_gg &lt;- geom_rect(\n    data = compts_df, \n    mapping = aes(xmin = start, xmax = end, ymin = -500000, ymax = 0, alpha = compartment), \n    col = 'black', inherit.aes = FALSE\n)\n\n## --- Subset contact matrices to chr. 
4 segment and compute autocorrelation scores\ng2 &lt;- hics[[\"G2 block\"]] |&gt; \n    zoom(100000) |&gt; \n    subsetByOverlaps(coords) |&gt;\n    autocorrelate()\npro5 &lt;- hics[[\"prophase (5m)\"]] |&gt; \n    zoom(100000) |&gt; \n    subsetByOverlaps(coords) |&gt;\n    autocorrelate()\npro30 &lt;- hics[[\"prometaphase (30m)\"]] |&gt; \n    zoom(100000) |&gt; \n    subsetByOverlaps(coords) |&gt;\n    autocorrelate()\n\n## --- Plot autocorrelation matrices\nplot_grid(\n    plotMatrix(\n        subsetByOverlaps(g2, coords),\n        use.scores = 'autocorrelated', \n        scale = 'linear', \n        limits = c(-1, 1), \n        cmap = bwrColors(), \n        maxDistance = 10000000, \n        caption = FALSE\n    ) + ggtitle('G2') + compts_gg,\n    plotMatrix(\n        subsetByOverlaps(pro5, coords),\n        use.scores = 'autocorrelated', \n        scale = 'linear', \n        limits = c(-1, 1), \n        cmap = bwrColors(), \n        maxDistance = 10000000, \n        caption = FALSE\n    ) + ggtitle('Prophase 5min') + compts_gg,\n    plotMatrix(\n        subsetByOverlaps(pro30, coords),\n        use.scores = 'autocorrelated', \n        scale = 'linear', \n        limits = c(-1, 1), \n        cmap = bwrColors(), \n        maxDistance = 10000000, \n        caption = FALSE\n    ) + ggtitle('Prometaphase 30min') + compts_gg,\n    nrow = 1\n)\n##  Warning: Using alpha for a discrete variable is not advised.\n##  Using alpha for a discrete variable is not advised.\n##  Using alpha for a discrete variable is not advised.\n\n\n\n\n\n\n\nThese correlation matrices suggest that there are two different regimes of chromatin compartment remodeling in this chromosome section:\n\nCorrelation scores between genomic bins within compartment A remain positive 5 min after G2 release (albeit reduced compared to G2 block) and eventually become null 30 min after G2 release.\nCorrelation scores between genomic bins within compartment B are overall null as soon as 5 min after G2 release.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html#generating-saddle-plots",
    "href": "pages/workflow-chicken.html#generating-saddle-plots",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "Generating saddle plots",
    "text": "Generating saddle plots\nSaddle plots are typically used to measure the observed vs. expected interaction scores within or between genomic loci belonging to A and B compartments. Here, they can be used to check whether the two regimes of chromatin compartment remodeling are observed genome-wide.\nNon-overlapping genomic windows are grouped by nbins quantiles (typically between 10 and 50 bins) according to their A/B compartment eigenvector value, from lowest eigenvector values (i.e. strongest B compartments) to highest eigenvector values (i.e. strongest A compartments). The average observed vs. expected interaction scores are computed for pairwise eigenvector quantiles and plotted in a 2D heatmap.\n\npl &lt;- imap(hics, ~ plotSaddle(.x, nbins = 38, BPPARAM = bpparam) + ggtitle(.y)) \n##  \n  |                                                                         \n  |                                                                   |   0%\n  |                                                                         \n  |==                                                                 |   3%\n  |                                                                         \n  |=====                                                              |   7%\n  |                                                                         \n  |=======                                                            |  10%\n  |                                                                         \n  |=========                                                          |  14%\n  |                                                                         \n  |============                                                       |  17%\n  |                                                                         \n  |==============                                                     |  21%\n  |                                                                         \n  
|================                                                   |  24%\n  |                                                                         \n  |==================                                                 |  28%\n  |                                                                         \n  |=====================                                              |  31%\n  |                                                                         \n  |=======================                                            |  34%\n  |                                                                         \n  |=========================                                          |  38%\n  |                                                                         \n  |============================                                       |  41%\n  |                                                                         \n  |==============================                                     |  45%\n  |                                                                         \n  |================================                                   |  48%\n  |                                                                         \n  |===================================                                |  52%\n  |                                                                         \n  |=====================================                              |  55%\n  |                                                                         \n  |=======================================                            |  59%\n  |                                                                         \n  |==========================================                         |  62%\n  |                                                                         \n  |============================================                       |  66%\n  |                                                 
                        \n  |==============================================                     |  69%\n  |                                                                         \n  |=================================================                  |  72%\n  |                                                                         \n  |===================================================                |  76%\n  |                                                                         \n  |=====================================================              |  79%\n  |                                                                         \n  |=======================================================            |  83%\n  |                                                                         \n  |==========================================================         |  86%\n  |                                                                         \n  |============================================================       |  90%\n  |                                                                         \n  |==============================================================     |  93%\n  |                                                                         \n  |=================================================================  |  97%\n  |                                                                         \n  |===================================================================| 100%\n##  \n##  \n  |                                                                         \n  |                                                                   |   0%\n  |                                                                         \n  |==                                                                 |   3%\n  |                                                                         \n  |=====                                                              |   7%\n  |         
                                                                \n  |=======                                                            |  10%\n  |                                                                         \n  |=========                                                          |  14%\n  |                                                                         \n  |============                                                       |  17%\n  |                                                                         \n  |==============                                                     |  21%\n  |                                                                         \n  |================                                                   |  24%\n  |                                                                         \n  |==================                                                 |  28%\n  |                                                                         \n  |=====================                                              |  31%\n  |                                                                         \n  |=======================                                            |  34%\n  |                                                                         \n  |=========================                                          |  38%\n  |                                                                         \n  |============================                                       |  41%\n  |                                                                         \n  |==============================                                     |  45%\n  |                                                                         \n  |================================                                   |  48%\n  |                                                                         \n  |===================================                        
        |  52%\n  |                                                                         \n  |=====================================                              |  55%\n  |                                                                         \n  |=======================================                            |  59%\n  |                                                                         \n  |==========================================                         |  62%\n  |                                                                         \n  |============================================                       |  66%\n  |                                                                         \n  |==============================================                     |  69%\n  |                                                                         \n  |=================================================                  |  72%\n  |                                                                         \n  |===================================================                |  76%\n  |                                                                         \n  |=====================================================              |  79%\n  |                                                                         \n  |=======================================================            |  83%\n  |                                                                         \n  |==========================================================         |  86%\n  |                                                                         \n  |============================================================       |  90%\n  |                                                                         \n  |==============================================================     |  93%\n  |                                                                         \n  
|=================================================================  |  97%\n  |                                                                         \n  |===================================================================| 100%\n##  \n##  \n  |                                                                         \n  |                                                                   |   0%\n  |                                                                         \n  |==                                                                 |   3%\n  |                                                                         \n  |=====                                                              |   7%\n  |                                                                         \n  |=======                                                            |  10%\n  |                                                                         \n  |=========                                                          |  14%\n  |                                                                         \n  |============                                                       |  17%\n  |                                                                         \n  |==============                                                     |  21%\n  |                                                                         \n  |================                                                   |  24%\n  |                                                                         \n  |==================                                                 |  28%\n  |                                                                         \n  |=====================                                              |  31%\n  |                                                                         \n  |=======================                                            |  34%\n  |                                     
\nplot_grid(plotlist = pl, nrow = 1)\n\nThese plots confirm the previous observation made on chr. 4 and reveal that intra-B compartment interactions are generally lost 5 min after G2 release, while intra-A interactions take up to 15 min after G2 release to disappear.\n\nBeware\n\nThe plotSaddle() function requires an eigenvector corresponding to A/B compartments. In this example, this eigenvector is recovered from the 4DN data portal. If it is not already available, it can be computed from the contact matrix using the getCompartments() function.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-chicken.html#quantifying-interactions-within-and-between-compartments",
    "href": "pages/workflow-chicken.html#quantifying-interactions-within-and-between-compartments",
    "title": "Workflow 2: Chromosome compartment cohesion upon mitosis entry",
    "section": "Quantifying interactions within and between compartments",
    "text": "Quantifying interactions within and between compartments\nWe can leverage the replicate-merged contact matrices to quantify interaction frequencies within the A or B compartments, or between A and B compartments, at different timepoints.\nTo do so, we use the A/B compartment annotations obtained at the G2 block timepoint and extract O/E (observed vs. expected) scores for each of these interaction types at each timepoint.\n\n## --- Extract the A/B compartments identified in G2 block\ncompts &lt;- topologicalFeatures(hics[[\"G2 block\"]], \"compartments\")\ncompts$ID &lt;- paste0(compts$compartment, seq_along(compts))\n\n## --- Iterate over timepoints to extract `detrended` (O/E) scores and \n##     compartment annotations\nlibrary(tibble)\nlibrary(plyranges)\n##  Loading required package: dplyr\n##  \n##  Attaching package: 'dplyr'\n##  The following objects are masked from 'package:GenomicRanges':\n##  \n##      intersect, setdiff, union\n##  The following object is masked from 'package:Seqinfo':\n##  \n##      intersect\n##  The following objects are masked from 'package:IRanges':\n##  \n##      collapse, desc, intersect, setdiff, slice, union\n##  The following objects are masked from 'package:S4Vectors':\n##  \n##      first, intersect, rename, setdiff, setequal, union\n##  The following objects are masked from 'package:dbplyr':\n##  \n##      ident, sql\n##  The following objects are masked from 'package:BiocGenerics':\n##  \n##      combine, intersect, setdiff, setequal, union\n##  The following object is masked from 'package:generics':\n##  \n##      explain\n##  The following objects are masked from 'package:stats':\n##  \n##      filter, lag\n##  The following objects are masked from 'package:base':\n##  \n##      intersect, setdiff, setequal, union\n##  \n##  Attaching package: 'plyranges'\n##  The following objects are masked from 'package:dplyr':\n##  \n##      between, n, n_distinct\ndf &lt;- 
imap(hics[c(1, 2, 5)], ~ {\n    ints &lt;- cis(.x) |&gt; ## Filter out trans interactions\n        detrend() |&gt; ## Compute O/E scores\n        interactions() ## Recover interactions \n    ints$comp_first &lt;- join_overlap_left(anchors(ints, \"first\"), compts)$ID\n    ints$comp_second &lt;- join_overlap_left(anchors(ints, \"second\"), compts)$ID\n    tibble(\n        sample = .y, \n        bin1 = ints$comp_first, \n        bin2 = ints$comp_second, \n        dist = InteractionSet::pairdist(ints), \n        OE = ints$detrended \n    ) |&gt; \n        filter(dist &gt; 5e6) |&gt;\n        mutate(type = dplyr::case_when(\n            grepl('A', bin1) & grepl('A', bin2) ~ 'AA',\n            grepl('B', bin1) & grepl('B', bin2) ~ 'BB',\n            grepl('A', bin1) & grepl('B', bin2) ~ 'AB',\n            grepl('B', bin1) & grepl('A', bin2) ~ 'BA'\n        )) |&gt; \n        filter(bin1 != bin2)\n}) |&gt; list_rbind() |&gt; mutate(\n    sample = factor(sample, names(hics)[c(1, 2, 5)])\n)\n\nWe can now plot the changes in O/E scores for intra-A, intra-B, A-B or B-A interactions, splitting boxplots by timepoint.\n\nggplot(df, aes(x = type, y = OE, group = type, fill = type)) + \n    geom_boxplot(outlier.shape = NA) + \n    facet_grid(~sample) + \n    theme_bw() + \n    ylim(c(-2, 2))\n##  Warning: Removed 66307 rows containing non-finite outside the scale range\n##  (`stat_boxplot()`).\n\n\n\n\n\n\n\nThis visualization suggests that interactions between genomic loci belonging to the B compartment are lost more rapidly than those between genomic loci belonging to the A compartment, when cells are released from G2 to enter mitosis.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 2: Chromosome compartment cohesion upon mitosis entry"
    ]
  },
  {
    "objectID": "pages/workflow-centros.html",
    "href": "pages/workflow-centros.html",
    "title": "Workflow 3: Inter-centromere interactions in yeast",
    "section": "",
    "text": "Importing Hi-C data and plotting contact matrices\nlibrary(HiContactsData)\nlibrary(HiContacts)\n##  Registered S3 methods overwritten by 'readr':\n##    method                    from \n##    as.data.frame.spec_tbl_df vroom\n##    as_tibble.spec_tbl_df     vroom\n##    format.col_spec           vroom\n##    print.col_spec            vroom\n##    print.collector           vroom\n##    print.date_names          vroom\n##    print.locale              vroom\n##    str.col_spec              vroom\nlibrary(purrr)\nlibrary(ggplot2)\nhics &lt;- list(\n    'G1' = import(HiContactsData('yeast_g1', 'mcool'), format = 'cool', resolution = 4000),\n    'G2M' = import(HiContactsData('yeast_g2m', 'mcool'), format = 'cool', resolution = 4000)\n)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nimap(hics, ~ plotMatrix(\n    .x, use.scores = 'balanced', limits = c(-4, -1), caption = FALSE\n) + ggtitle(.y))\n##  $G1\n\n\n\n\n\n\n##  \n##  $G2M\nWe can visually appreciate that inter-chromosomal interactions, notably between centromeres, are less prominent in G2/M.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 3: Inter-centromere interactions in yeast"
    ]
  },
  {
    "objectID": "pages/workflow-centros.html#checking-ps-and-cistrans-interactions-ratio",
    "href": "pages/workflow-centros.html#checking-ps-and-cistrans-interactions-ratio",
    "title": "Workflow 3: Inter-centromere interactions in yeast",
    "section": "Checking P(s) and cis/trans interactions ratio",
    "text": "Checking P(s) and cis/trans interactions ratio\n\nlibrary(dplyr)\n##  \n##  Attaching package: 'dplyr'\n##  The following objects are masked from 'package:dbplyr':\n##  \n##      ident, sql\n##  The following object is masked from 'package:Biobase':\n##  \n##      combine\n##  The following object is masked from 'package:matrixStats':\n##  \n##      count\n##  The following objects are masked from 'package:GenomicRanges':\n##  \n##      intersect, setdiff, union\n##  The following object is masked from 'package:Seqinfo':\n##  \n##      intersect\n##  The following objects are masked from 'package:IRanges':\n##  \n##      collapse, desc, intersect, setdiff, slice, union\n##  The following objects are masked from 'package:S4Vectors':\n##  \n##      first, intersect, rename, setdiff, setequal, union\n##  The following objects are masked from 'package:BiocGenerics':\n##  \n##      combine, intersect, setdiff, setequal, union\n##  The following object is masked from 'package:generics':\n##  \n##      explain\n##  The following objects are masked from 'package:stats':\n##  \n##      filter, lag\n##  The following objects are masked from 'package:base':\n##  \n##      intersect, setdiff, setequal, union\npairs &lt;- list(\n    'G1' = PairsFile(HiContactsData('yeast_g1', 'pairs')),\n    'G2M' = PairsFile(HiContactsData('yeast_g2m', 'pairs')) \n)\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\n##  see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n##  loading from cache\nps &lt;- imap_dfr(pairs, ~ distanceLaw(.x, by_chr = TRUE) |&gt; \n    mutate(sample = .y) \n)\n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/348fc5481fa073_8630 in memory. This may take a while...\n##  Importing pairs file /home/biocbuild/.cache/R/ExperimentHub/348fc53818f0f4_8631 in memory. 
This may take a while...\nplotPs(ps, aes(x = binned_distance, y = norm_p, group = interaction(sample, chr), color = sample)) + \n    scale_color_manual(values = c('black', 'red'))\n##  Warning: Removed 2133 rows containing missing values or values outside the scale\n##  range (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps, ggplot2::aes(x = binned_distance, y = slope, group = interaction(sample, chr), color = sample)) + \n    scale_color_manual(values = c('black', 'red'))\n##  Warning: Removed 2183 rows containing missing values or values outside the scale\n##  range (`geom_line()`).\n\n\n\n\n\n\n\nThis confirms that cells synchronized in G2/M are enriched for interactions spanning 10-30 kb.\n\nratios &lt;- imap_dfr(hics, ~ cisTransRatio(.x) |&gt; mutate(sample = .y))\nggplot(ratios, aes(x = chr, y = trans_pct, fill = sample)) + \n    geom_col() + \n    labs(x = 'Chromosomes', y = \"% of trans interactions\") + \n    scale_y_continuous(labels = scales::percent) + \n    facet_grid(~sample)\n\n\n\n\n\n\n\nWe can also see that the proportion of trans (inter-chromosomal) interactions decreases in G2/M-synchronized cells.",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 3: Inter-centromere interactions in yeast"
    ]
  },
  {
    "objectID": "pages/workflow-centros.html#centromere-virtual-4c-profiles",
    "href": "pages/workflow-centros.html#centromere-virtual-4c-profiles",
    "title": "Workflow 3: Inter-centromere interactions in yeast",
    "section": "Centromere virtual 4C profiles",
    "text": "Centromere virtual 4C profiles\n\ndata(centros_yeast)\nv4c_centro &lt;- imap_dfr(hics, ~ virtual4C(.x, GenomicRanges::resize(centros_yeast[2], 8000)) |&gt; \n    as_tibble() |&gt; \n    mutate(sample = .y) |&gt; \n    filter(seqnames == 'IV')\n) \nggplot(v4c_centro, aes(x = start, y = score, fill = sample)) +\n    geom_area() +\n    theme_bw() +\n    labs(\n        x = \"chrIV position\", \n        y = \"Contacts with chrII centromere\", \n        title = \"Interaction profile of chrII centromere\"\n    ) + \n    coord_cartesian(ylim = c(0, 0.015))",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 3: Inter-centromere interactions in yeast"
    ]
  },
  {
    "objectID": "pages/workflow-centros.html#aggregated-2d-signal-over-all-pairs-of-centromeres",
    "href": "pages/workflow-centros.html#aggregated-2d-signal-over-all-pairs-of-centromeres",
    "title": "Workflow 3: Inter-centromere interactions in yeast",
    "section": "Aggregated 2D signal over all pairs of centromeres",
    "text": "Aggregated 2D signal over all pairs of centromeres\nWe can start by computing all possible pairs of centromeres.\n\ncentros_pairs &lt;- lapply(1:length(centros_yeast), function(i) {\n    lapply(1:length(centros_yeast), function(j) {\n        S4Vectors::Pairs(centros_yeast[i], centros_yeast[j])\n    })\n}) |&gt; \n    do.call(c, args = _) |&gt;\n    do.call(c, args = _) |&gt; \n    InteractionSet::makeGInteractionsFromGRangesPairs()\ncentros_pairs &lt;- centros_pairs[anchors(centros_pairs, 'first') != anchors(centros_pairs, 'second')]\n\ncentros_pairs\n##  GInteractions object with 240 interactions and 0 metadata columns:\n##          seqnames1       ranges1     seqnames2       ranges2\n##              &lt;Rle&gt;     &lt;IRanges&gt;         &lt;Rle&gt;     &lt;IRanges&gt;\n##      [1]         I 151583-151641 ---        II 238361-238419\n##      [2]         I 151583-151641 ---       III 114322-114380\n##      [3]         I 151583-151641 ---        IV 449879-449937\n##      [4]         I 151583-151641 ---         V 152522-152580\n##      [5]         I 151583-151641 ---        VI 147981-148039\n##      ...       ...           ... ...       ...           
...\n##    [236]       XVI 556255-556313 ---        XI 440229-440287\n##    [237]       XVI 556255-556313 ---       XII 151366-151424\n##    [238]       XVI 556255-556313 ---      XIII 268222-268280\n##    [239]       XVI 556255-556313 ---       XIV 628588-628646\n##    [240]       XVI 556255-556313 ---        XV 326897-326955\n##    -------\n##    regions: 16 ranges and 0 metadata columns\n##    seqinfo: 17 sequences (1 circular) from R64-1-1 genome\n\nThen we can aggregate the Hi-C signal over each pair of centromeres.\n\naggr_maps &lt;- purrr::imap(hics, ~ {\n    aggr &lt;- aggregate(.x, centros_pairs, maxDistance = 1e999)\n    plotMatrix(\n        aggr, use.scores = 'balanced', limits = c(-5, -1), \n        cmap = HiContacts::rainbowColors(), \n        caption = FALSE\n    ) + ggtitle(.y)\n})\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 120 out-of-bound ranges located on sequences I,\n##    III, V, VI, VIII, IX, XII, and XIV. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. See ?`trim,GenomicRanges-method` for more information.\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 120 out-of-bound ranges located on sequences III,\n##    V, VI, VIII, IX, XII, XIV, and I. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. 
See ?`trim,GenomicRanges-method` for more information.\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 240 out-of-bound ranges located on sequences I,\n##    III, V, VI, VIII, IX, XII, and XIV. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. See ?`trim,GenomicRanges-method` for more information.\n##  Going through preflight checklist...\n##  Parsing the entire contact matrice as a sparse matrix...\n##  Modeling distance decay...\n##  Filtering for contacts within provided targets...\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 120 out-of-bound ranges located on sequences I,\n##    III, V, VI, VIII, IX, XII, and XIV. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. See ?`trim,GenomicRanges-method` for more information.\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 120 out-of-bound ranges located on sequences III,\n##    V, VI, VIII, IX, XII, XIV, and I. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. 
See ?`trim,GenomicRanges-method` for more information.\n##  Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 240 out-of-bound ranges located on sequences I,\n##    III, V, VI, VIII, IX, XII, and XIV. Note that ranges located on a sequence\n##    whose length is unknown (NA) or on a circular sequence are not considered\n##    out-of-bound (use seqlengths() and isCircular() to get the lengths and\n##    circularity flags of the underlying sequences). You can use trim() to trim\n##    these ranges. See ?`trim,GenomicRanges-method` for more information.\n##  Going through preflight checklist...\n##  Parsing the entire contact matrice as a sparse matrix...\n##  Modeling distance decay...\n##  Filtering for contacts within provided targets...\n\ncowplot::plot_grid(plotlist = aggr_maps, nrow = 1)",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 3: Inter-centromere interactions in yeast"
    ]
  },
  {
    "objectID": "pages/workflow-centros.html#aggregated-1d-interaction-profile-of-centromeres",
    "href": "pages/workflow-centros.html#aggregated-1d-interaction-profile-of-centromeres",
    "title": "Workflow 3: Inter-centromere interactions in yeast",
    "section": "Aggregated 1D interaction profile of centromeres",
    "text": "Aggregated 1D interaction profile of centromeres\nOne can generalize the previous virtual 4C plot, by extracting the interaction profile between all possible pairs of centromeres in each dataset.\n\ndf &lt;- map_dfr(1:{length(centros_yeast)-1}, function(i) {\n    centro1 &lt;- GenomicRanges::resize(centros_yeast[i], fix = 'center', 8000)\n    map_dfr({i+1}:length(centros_yeast), function(j) {\n        centro2 &lt;- GenomicRanges::resize(centros_yeast[j], fix = 'center', 80000)\n        gi &lt;- InteractionSet::GInteractions(centro1, centro2)\n        imap_dfr(hics, ~ .x[gi] |&gt; \n            interactions() |&gt; \n            as_tibble() |&gt;\n            mutate(\n                sample = .y, \n                center = center2 - start(GenomicRanges::resize(centro2, fix = 'center', 1))\n            ) |&gt; \n            select(sample, seqnames1, seqnames2, center, balanced)\n        )\n    })\n}) \nggplot(df, aes(x = center/1e3, y = balanced)) + \n    geom_line(aes(group = interaction(seqnames1, seqnames2)), alpha = 0.03, col = \"black\") + \n    geom_smooth(col = \"red\", fill = \"red\") + \n    theme_bw() + \n    theme(legend.position = 'none') + \n    labs(\n        x = \"Distance from centromere (kb)\", y = \"Normalized interaction frequency\", \n        title = \"Centromere pairwise interaction profiles\"\n    ) +\n    facet_grid(~sample)\n##  `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = \"cs\")'\n##  Warning: Removed 25 rows containing non-finite outside the scale range\n##  (`stat_smooth()`).\n##  Warning: Removed 8 rows containing missing values or values outside the scale range\n##  (`geom_line()`).",
    "crumbs": [
      "Advanced Hi-C topics",
      "Workflow 3: Inter-centromere interactions in yeast"
    ]
  }
]