---
title: "Attributes In-Depth"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Attributes In-Depth}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

In HDF5, **attributes** are small pieces of metadata attached to groups or datasets. They are best used to store descriptive information (units, timestamps, descriptions, or experimental parameters) separately from the main data array.

This vignette covers how to write, read, and manage these attributes using `h5lite`, as well as important limitations regarding their structure.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## Writing Attributes

There are two ways to write attributes in `h5lite`: explicitly (targeting an object) or implicitly (saving R attributes).

### 1. Explicit Writing

You can write an attribute to any existing group or dataset using the `attr` argument in `h5_write()`. This is useful for adding metadata after the data has been saved.

```{r}
# First, write a dataset
h5_write(1:10, file, "measurements/temperature")

# Now, attach attributes to it
h5_write(I("Celsius"), file, "measurements/temperature", attr = "units")
h5_write(I("2023-10-27"), file, "measurements/temperature", attr = "date")
h5_write(I(0.1), file, "measurements/temperature", attr = "precision")
```

*Note: If the attribute already exists, it will be overwritten.*

### 2. Implicit Writing (R Attributes)

`h5lite` automatically preserves custom R attributes attached to your objects. When you write an R object, any attributes (except for standard internal ones like `dim`, `names`, or `class`) are written as HDF5 attributes.

```{r}
# Create a vector with custom R attributes
data <- rnorm(5)
attr(data, "description") <- I("Randomized control group")
attr(data, "valid") <- I(TRUE)

# Write the object
h5_write(data, file, "experiment/control")

# Check the file - the attributes are there
h5_attr_names(file, "experiment/control")
h5_str(file)
```

## Reading Attributes

### 1. Accessing Specific Attributes

If you only need a specific piece of metadata without reading the full dataset, you can use `h5_read(..., attr = "name")`.

```{r}
# Read just the 'units' attribute
units <- h5_read(file, "measurements/temperature", attr = "units")
print(units)
```

### 2. Reading with the Dataset

When you read a dataset, `h5lite` automatically reads all attached attributes and re-attaches them to the resulting R object.

```{r}
# Read the full dataset
temps <- h5_read(file, "measurements/temperature")

# The attributes are available in R
attributes(temps)
str(temps)
```

## Managing Attributes

### Listing Attributes

Use `h5_attr_names()` to list the names of all attributes attached to a specific object.

```{r}
h5_attr_names(file, "measurements/temperature")
```

### Deleting Attributes

You can remove a specific attribute using `h5_delete()`.

```{r}
# Delete the 'precision' attribute
h5_delete(file, "measurements/temperature", attr = "precision")

# Verify removal
h5_attr_names(file, "measurements/temperature")
```

## Important Limitations

While attributes are powerful for storing metadata, they are fundamentally simpler structures than HDF5 Datasets. HDF5 enforces specific constraints that affect how `h5lite` can store complex R objects as attributes.

### 1. No Dimension Scales (Loss of Names)

HDF5 **Dimension Scales** (the mechanism `h5lite` uses to store `names`, `dimnames`, and `row.names`) can only be attached to **Datasets**. They cannot be attached to attributes.

This means if you write a named vector, matrix, or array as an attribute, **the names will be lost**.

```{r}
# A vector with names
named_vec <- c(a = 1, b = 2, c = 3)

# Write as a standard Dataset -> Names are preserved
h5_write(named_vec, file, "my_dataset")
h5_names(file, "my_dataset")

# Write as an Attribute -> Names are LOST
h5_write(named_vec, file, "measurements/temperature", attr = "meta_vec")
h5_names(file, "measurements/temperature", attr = "meta_vec")
```

**Exception: Data Frames**

There is one major exception: `data.frame` objects. Because HDF5 stores data frames as **Compound Types**, the column names are baked into the type definition itself, not stored as side-loaded metadata.

Therefore, **column names are preserved** even when writing a data frame as an attribute. However, `row.names` (which rely on dimension scales) will still be lost.

```{r}
# A data frame with metadata
df <- data.frame(
  id = 1:3,
  status = c("ok", "fail", "ok")
)

# Write as attribute
h5_write(df, file, "measurements/temperature", attr = "log")

# Column names survive!
h5_names(file, "measurements/temperature", attr = "log")
```

### 2. No Attributes on Attributes (Nesting)

In HDF5, you cannot attach attributes to other attributes. This hierarchy is strictly one level deep: Groups/Datasets can have attributes, but attributes cannot.

Consequently, you cannot treat an attribute as a "Group" or folder to store other items. If you need a hierarchical structure for your metadata, you should create a Group (e.g., `/metadata`) and store your metadata as Datasets inside it, rather than attaching them as attributes to another object.

## Controlling Attribute Types

Attributes in HDF5 are typed just like datasets. `h5lite` allows you to control the storage type of attributes using the `as` argument in `h5_write()` or `h5_read()`.

To target an attribute specifically, prefix the name with `@` in the `as` vector.

### Customizing Storage Type

```{r}
# Write the control data again, but use a fixed-length string for 'description'
h5_write(data, file, "experiment/control", as = c("@description" = "ascii[]"))

# Store an attribute as a `uint8` instead of the default `int32`
h5_write(I(42), file, "measurements/temperature", attr = "sensor_id", as = "uint8")
```

### Customizing Read Type

You can also coerce attributes when reading them.

```{r}
# Force the 'valid' attribute to be read as logical, even if stored as integer
meta <- h5_read(file, "experiment/control", attr = "valid", as = "logical")
```

## Special Note: Dimensions

You might notice that standard R attributes like `dim` are not visible in `h5_attr_names()`. This is because `h5lite` handles structural attributes implicitly.

The dimensions of the attribute data itself are stored in the HDF5 Dataspace, not as a separate attribute. `h5lite` automatically restores the `dim` attribute on the R object when reading, ensuring matrices and arrays retain their shape.

```{r, include=FALSE}
unlink(file)
```