Help for package immundata

Title:

A Unified Data Layer for Single-Cell, Spatial and Bulk Immunomics

Version:

0.0.1

Description:

Provides a unified data layer for single-cell, spatial and bulk T-cell and B-cell immune receptor repertoire data, integrating diverse data formats such as AIRR and raw sequencing files. Includes utilities for receptor aggregation, metadata normalization, and clonotype filtering.

License:

Apache License (≥ 2)

Encoding:

UTF-8

RoxygenNote:

7.3.2

Depends:

R (≥ 4.1.0)

Imports:

cli, R6, checkmate, dplyr, duckplyr, glue, lifecycle, readr, rlang

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-04-07 23:22:15 UTC; vdn

Author:

Vadim I. Nazarov [aut, cre]

Maintainer:

Vadim I. Nazarov <support@immunomind.com>

Repository:

CRAN

Date/Publication:

2025-04-08 15:40:07 UTC

immundata: A Unified Data Layer for Single-Cell, Spatial and Bulk Immunomics

Description

Author(s)

Maintainer: Vadim I. Nazarov <support@immunomind.com> (ORCID)

Internal Immundata Global Configuration

Description

IMD_GLOBALS is an internal list that stores globally used constants across the Immundata system. It is not intended for direct use by package users, but rather to ensure consistency in schema field names, default file names, and internal error messages.

Usage

IMD_GLOBALS

Format

An object of class list of length 3.

Components

messages: Named list of default messages and error texts (e.g., "NotImpl").
schema: Standardized column names for internal schema usage. These include:
- barcode: Column name for cell barcode IDs.
- receptor: Column name for receptor unique identifiers.
- repertoire: Column name for repertoire group IDs.
- metadata_filename: Column name for metadata files (internal).
- count: Column name for receptor count per group.
- filename: Original column name used in user metadata.
files: Default file names used to store structured Immundata:
- receptors: File name for receptor-level data (receptors.parquet).
- annotations: File name for annotation-level data (annotations.parquet).

ImmunData: A Unified Structure for Immune Receptor Repertoire Data

Description

ImmunData is an abstract R6 class for managing and transforming immune receptor repertoire data. It supports flexible backends (e.g., Arrow, DuckDB, dbplyr) and lazy evaluation, and provides tools for filtering, aggregation, and receptor-to-repertoire mapping.

Public fields

.receptors: A receptor-level table containing immune receptor features (e.g., CDR3, V/J gene, clonotype ID, counts). This table is typically aggregated and is used for quantitative analysis of immune repertoire signatures. It can be a local tibble, Arrow Table, DuckDB table, or any other dplyr-compatible backend (including lazy data sources).
.annotations: A barcode-level table that links each barcode (i.e., cell ID) to a receptor in .receptors. It can also store cell-level metadata such as sample ID, donor, or tissue source. This table is not aggregated and typically contains one row per barcode.
schema_receptor: A named list describing how to interpret receptor-level data. This includes the fields used for aggregation (e.g., CDR3, V_gene, J_gene), and optionally unique identifiers for each receptor row. Used to ensure consistency across processing steps.
schema_repertoire: A named list defining how barcodes or annotations should be grouped into repertoires. This may include sample-level metadata (e.g., sample_id, donor_id) used to define unique repertoires.

Active bindings

receptors: Accessor for the receptor-level table (.receptors).
annotations: Accessor for the annotation-level table (.annotations).

Methods

Method `new()`

Creates a new ImmunData object. This constructor expects receptor-level and barcode-level data, along with a receptor schema defining aggregation and identity fields.

Usage

ImmunData$new(receptors, annotations, schema)

Arguments

receptors: A receptor-level dataset (e.g., grouped by CDR3/V/J).
annotations: A cell/barcode-level dataset mapping barcodes to receptor rows.
schema: A named list specifying the receptor schema (e.g., aggregate fields, ID columns).

Method `build_repertoires()`

Defines repertoires by concatenating selected annotation columns.

Usage

ImmunData$build_repertoires(schema = "repertoire_id", sep = "-")

Arguments

schema: A character vector of column names in .annotations used to define repertoire grouping.
sep: A string separator used to concatenate multiple columns into a single repertoire ID.
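
The concatenation step can be pictured with a small base R sketch. This is not the package's implementation: the `annotations` data frame, its column names, and the helper `make_repertoire_id()` are hypothetical, and only the `schema` and `sep` arguments are taken from the signature above.

```r
# Hypothetical annotation table: one row per barcode.
annotations <- data.frame(
  barcode = c("AAAC", "AAAG", "AACT", "AAGT"),
  donor   = c("D1", "D1", "D2", "D2"),
  tissue  = c("blood", "tumor", "blood", "blood"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the chosen columns together, row by row.
make_repertoire_id <- function(annotations, schema, sep = "-") {
  stopifnot(all(schema %in% names(annotations)))
  # unname() keeps column names from being read as paste() arguments.
  do.call(paste, c(unname(as.list(annotations[schema])), sep = sep))
}

annotations$repertoire_id <- make_repertoire_id(annotations, c("donor", "tissue"))
print(unique(annotations$repertoire_id))
```

Barcodes that share the same values in every selected column end up with the same repertoire ID, which is what groups them into one repertoire.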

Method `print()`

Prints class information for the ImmunData object.

Usage

ImmunData$print()

Method `filter_receptors()`

Filters the receptor-level data using tidyverse filter syntax, and then updates the annotation table to include only linked barcodes.

Usage

ImmunData$filter_receptors(...)

Arguments

...: Filtering conditions applied to the receptor-level table.

Returns

A new ImmunData object with filtered receptor and annotation tables.
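
The two-step behaviour (filter receptors, then keep only linked barcodes) can be sketched on plain data frames. The tables, column names such as `receptor_id`, and the filter condition are hypothetical; the real method operates on the object's own tables and may use a lazy backend.

```r
# Hypothetical receptor-level table: one row per unique receptor.
receptors <- data.frame(
  receptor_id = 1:3,
  cdr3   = c("CASSLGQ", "CASSPGT", "CASRDNE"),
  v_gene = c("TRBV5-1", "TRBV7-2", "TRBV5-1"),
  stringsAsFactors = FALSE
)

# Hypothetical barcode-level table pointing at receptor rows.
annotations <- data.frame(
  barcode     = c("AAAC", "AAAG", "AACT", "AAGT"),
  receptor_id = c(1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Step 1: apply the filtering condition to the receptor table.
kept_receptors <- receptors[receptors$v_gene == "TRBV5-1", , drop = FALSE]

# Step 2: keep only the barcodes linked to the surviving receptors.
kept_annotations <- annotations[
  annotations$receptor_id %in% kept_receptors$receptor_id, , drop = FALSE
]

print(kept_annotations$barcode)
```

With the package itself, a comparable call would presumably look like `idata$filter_receptors(v_gene == "TRBV5-1")`, where `idata` and the column name are placeholders.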

Method `filter_annotations()`

Filters the annotation-level table using tidyverse filter syntax, and updates the receptor table to include only matching receptor entries.

Usage

ImmunData$filter_annotations(...)

Arguments

...: Filtering conditions applied to the annotations table.

Returns

A new ImmunData object with filtered annotation and receptor tables.

Method `filter_barcodes()`

Filters the dataset by a set of barcodes (unique cell IDs). The resulting object contains only rows linked to those barcodes.

Usage

ImmunData$filter_barcodes(barcodes = c())

Arguments

barcodes: A character vector of barcodes to retain.

Returns

A new ImmunData object filtered by barcode.
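
Barcode filtering runs in the opposite direction: the barcode-level table is restricted first, and the receptor table then follows. A minimal base R sketch, assuming hypothetical tables and a hypothetical `receptor_id` link column:

```r
# Hypothetical receptor-level and barcode-level tables.
receptors <- data.frame(
  receptor_id = 1:3,
  cdr3 = c("CASSLGQ", "CASSPGT", "CASRDNE"),
  stringsAsFactors = FALSE
)
annotations <- data.frame(
  barcode     = c("AAAC", "AAAG", "AACT", "AAGT"),
  receptor_id = c(1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Barcodes to retain.
barcodes <- c("AAAC", "AACT")

# Step 1: keep only the requested barcodes.
kept_annotations <- annotations[annotations$barcode %in% barcodes, , drop = FALSE]

# Step 2: keep only the receptors those barcodes point to.
kept_receptors <- receptors[
  receptors$receptor_id %in% kept_annotations$receptor_id, , drop = FALSE
]

print(kept_receptors$receptor_id)
```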

Method `clone()`

The objects of this class are cloneable with this method.

Usage

ImmunData$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Get Immundata Default File Names

Description

Returns the standardized default filenames for storing receptor-level and annotation-level data as used in load_repertoires() and related Immundata I/O functions.

Usage

imd_files()

Value

A named list of file names (e.g., receptors.parquet, annotations.parquet).

Get Immundata Internal Schema Field Names

Description

Returns the standardized field names used across Immundata objects and processing functions, as defined in IMD_GLOBALS$schema. These include column names for barcodes, receptors, repertoires, and related metadata.

Usage

imd_schema()

Value

A named list of schema field names.

Load a Saved ImmunData Object from Disk

Description

Loads an ImmunData object from a directory created by Immundata I/O functions (e.g., load_repertoires() with file splitting or saving). It reads receptor-level and annotation-level data from standardized Parquet files and reconstructs a new ImmunData object with inferred schema.

This function expects two files inside the provided directory:

receptors.parquet: contains the receptor-level table
annotations.parquet: contains the annotation-level table

These filenames are defined in imd_files() and follow the Immundata storage convention.

Usage

load_immundata(path, verbose = TRUE)

Arguments

path

Path to the folder containing the saved ImmunData files.

verbose

Logical. If TRUE (default), prints progress messages to the console.

Value

A new ImmunData object containing receptor and annotation data.
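
Because the function expects both files to be present, a quick pre-flight check can be useful before loading. The helper `check_immundata_dir()` below is hypothetical and not part of the package; it only tests for the two documented file names.

```r
# Hypothetical helper: report which of the two expected files are present.
check_immundata_dir <- function(path,
                                files = c("receptors.parquet", "annotations.parquet")) {
  present <- file.exists(file.path(path, files))
  names(present) <- files
  present
}

# Demonstration on a fresh temporary folder with one empty placeholder file.
demo_dir <- tempfile("immundata-demo-")
dir.create(demo_dir)
file.create(file.path(demo_dir, "receptors.parquet"))

status <- check_immundata_dir(demo_dir)
print(status)
```

When both entries are `TRUE` for a real folder, `load_immundata(path)` would be the next step. The placeholder file created here is empty and is not valid Parquet.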

Load and Validate Metadata Table for Immune Repertoire Files

Description

This function loads a metadata table from either a file path or a data frame, validates the presence of a column with repertoire file paths, and converts all file paths to absolute paths. It is used to support flexible pipelines for loading bulk or single-cell immune repertoire data across samples.

If the input is a file path, the function attempts to read it with readr::read_delim. If the input is a data frame, it checks whether file paths are absolute; relative paths are only allowed when metadata is loaded from a file.

It warns the user if many of the files listed in the metadata table are missing, and stops execution if none of the files exist.

The column with file paths is normalized and renamed to match the internal filename schema.

Usage

load_metadata(metadata, filename_col = "File", delim = "\t", ...)

Arguments

metadata

A metadata table. Can be either:

a data frame with metadata,
or a path to a text/TSV/CSV file that can be read with readr::read_delim.

filename_col

A string specifying the name of the column in the metadata table that contains paths to repertoire files. Defaults to "File".

delim

Delimiter used to read the metadata file (if a path is provided). Defaults to "\t".

...

Additional arguments passed to readr::read_delim() when reading metadata from a file.

Value

A validated and updated metadata data frame with absolute file paths, and an additional column renamed according to IMD_GLOBALS$schema$filename.
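
The checks described above can be approximated in base R for the data frame case. The helper `validate_metadata()` is a hypothetical sketch, not the package's code. It assumes that a warning is raised whenever any listed file is missing (the documentation only says "many"), and it leaves out the renaming to the internal schema name.

```r
# Hypothetical helper mirroring the documented checks on a data frame input.
validate_metadata <- function(metadata, filename_col = "File") {
  if (!filename_col %in% names(metadata)) {
    stop("Column '", filename_col, "' not found in metadata.")
  }
  # Convert paths to absolute form without failing on missing files.
  paths <- normalizePath(metadata[[filename_col]], mustWork = FALSE)
  found <- file.exists(paths)
  if (!any(found)) {
    stop("None of the files listed in the metadata exist.")
  }
  if (!all(found)) {
    # Assumption: warn on any missing file; the real threshold is not documented.
    warning(sum(!found), " of ", length(paths), " listed files are missing.")
  }
  metadata[[filename_col]] <- paths
  metadata
}

# Demonstration with one temporary repertoire file.
sample_file <- tempfile(fileext = ".tsv")
writeLines("cdr3\tv_gene", sample_file)
metadata <- data.frame(File = sample_file, Donor = "D1", stringsAsFactors = FALSE)

checked <- validate_metadata(metadata)
print(checked$File)
```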

Load and Aggregate Immune Receptor Repertoire Data

Description

This function ingests a repertoire dataset (Parquet, CSV, or TSV), aggregates receptors based on a user-defined schema, and splits the result into receptor-level and annotation-level tables. The resulting data is saved to a designated output folder as two Parquet files (receptors and annotations) and then reloaded to create an ImmunData object.

Usage

load_repertoires(
  path,
  schema,
  metadata = NULL,
  barcode_col = NULL,
  count_col = NULL,
  repertoire_schema = NULL,
  output_folder = NULL,
  enforce_schema = TRUE,
  verbose = TRUE
)

Arguments

path

Path to an input file. This file may be Parquet, CSV, or TSV. The file extension is automatically detected and handled.

schema

Character vector defining which columns in the input data should be used to identify unique receptor signatures. For example, c("V_gene", "J_gene", "CDR3_nt").

metadata

An optional data frame containing additional metadata to merge into the annotation table. Default is NULL.

barcode_col

An optional character string specifying the column in the input data that represents cell barcodes or other unique identifiers. Default is NULL.

count_col

An optional character string specifying the column in the input data that stores bulk receptor counts. Default is NULL.

repertoire_schema

An optional character vector defining how annotations should be grouped into repertoires (for example, c("sample", "donor")). Currently unused in this function, but reserved for future expansions. Default is NULL.

output_folder

Character string specifying the directory to save the resulting Parquet files. If NULL, a folder named immundata-<basename_of_path> is created in the same directory as path.

enforce_schema

Logical. If TRUE, column names and types must strictly match between files. If FALSE, columns are unioned.

verbose

Logical. Currently unused.

Details

Reading – The function automatically detects whether path points to a Parquet, CSV, or TSV file, using read_parquet_duckdb or read_csv_duckdb.
Aggregation – Receptor uniqueness is determined by the columns named in schema, while barcodes or counts are handled depending on which parameters (barcode_col, count_col) are provided.
Saving – The final receptor-level and annotation-level tables are written to Parquet files in output_folder.
Reloading – The function calls load_immundata() on the newly created folder to return a fully instantiated ImmunData.
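
The aggregation step can be illustrated on a small in-memory table. This is a base R sketch under stated assumptions, not the package's DuckDB-based implementation: the input data, the `receptor_id` and `count` column names, and the use of `merge()` are all hypothetical.

```r
# Hypothetical single-cell input: one row per barcode.
raw <- data.frame(
  barcode = c("AAAC", "AAAG", "AACT", "AAGT"),
  V_gene  = c("TRBV5-1", "TRBV5-1", "TRBV7-2", "TRBV5-1"),
  J_gene  = c("TRBJ2-7", "TRBJ2-7", "TRBJ1-1", "TRBJ2-7"),
  CDR3_nt = c("TGTGCC", "TGTGCC", "TGTGCA", "TGTGCC"),
  stringsAsFactors = FALSE
)
schema <- c("V_gene", "J_gene", "CDR3_nt")

# Receptor-level table: one row per unique combination of the schema columns.
receptors <- unique(raw[schema])
rownames(receptors) <- NULL
receptors$receptor_id <- seq_len(nrow(receptors))

# Annotation-level table: each barcode linked to its receptor row.
annotations <- merge(raw, receptors, by = schema)[c("barcode", "receptor_id")]

# Number of barcodes per receptor.
per_receptor <- table(annotations$receptor_id)
receptors$count <- as.integer(per_receptor[as.character(receptors$receptor_id)])

print(receptors)
```

Based on the signature above, the corresponding package call would take the form `load_repertoires(path, schema = schema, barcode_col = "barcode")`, with `path` pointing at a real Parquet, CSV, or TSV file.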
