
The goal of {rocrateR} is to provide an R package for
creating, manipulating and reading RO-Crates. Latest supported version
of the specification: https://w3id.org/ro/crate/1.2/.
An RO-Crate is an integrated view through which you can see an entire Research Object; the methods, the data, the output and the outcomes of a project or a piece of work. Linking all this together enables the sharing of research outputs with their context, as a coherent whole.
RO-Crates link data and metadata no matter where they are stored – so that from a paper, you can find the data, and from the data, you can find its authors, and so on.
For more details, please visit: https://www.researchobject.org/ro-crate/about_ro_crate
You can install the released version of {rocrateR} from
CRAN with:
# install.packages("pak")
pak::pak("rocrateR")

And the development version from GitHub with:
# install.packages("pak")
pak::pak("ResearchObject/ro-crate-r@dev")

| Function | Purpose |
|---|---|
| rocrate() | Create an empty or initialized RO-Crate |
| entity() | Define a new entity (Person, Dataset, etc.) |
| add_entity() / add_entities() | Add entities to a crate |
| get_entity() | Retrieve entities by @id or @type |
| remove_entity() / remove_entities() | Remove one or more entities |
| write_rocrate() | Save RO-Crate to disk |
| bag_rocrate() / is_rocrate_bag() / unbag_rocrate() | Bagging and unbagging RO-Crates |
| validate_rocrate() (planned) | Validate RO-Crate using external Python validator |

The following command creates an RO-Crate Metadata descriptor
(ro-crate-metadata.json). This should be stored inside the
root (./) of your RO-Crate.
# library(rocrateR)
my_first_ro_crate <- rocrateR::rocrate()

This object is a list with the basic components of an RO-Crate. It
can be visualised in the console with the print
command:
print(my_first_ro_crate)
#> {
#> "@context": "https://w3id.org/ro/crate/1.2/context",
#> "@graph": [
#> {
#> "@id": "ro-crate-metadata.json",
#> "@type": "CreativeWork",
#> "about": {
#> "@id": "./"
#> },
#> "conformsTo": {
#> "@id": "https://w3id.org/ro/crate/1.2"
#> }
#> },
#> {
#> "@id": "./",
#> "@type": "Dataset",
#> "name": "",
#> "description": "",
#> "datePublished": "2025-11-04",
#> "license": {
#> "@id": "http://spdx.org/licenses/CC-BY-4.0"
#> }
#> }
#> ]
#> }

This object can be saved to disk using the following command:
my_first_ro_crate |>
rocrateR::write_rocrate("/path/to/ro-crate/ro-crate-metadata.json")

For example, using a temporary directory:
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
my_first_ro_crate |>
rocrateR::write_rocrate(tmp)
# load lines / flat file
readLines(tmp)
#> [1] "{"
#> [2] " \"@context\": \"https://w3id.org/ro/crate/1.2/context\","
#> [3] " \"@graph\": ["
#> [4] " {"
#> [5] " \"@id\": \"ro-crate-metadata.json\","
#> [6] " \"@type\": \"CreativeWork\","
#> [7] " \"about\": {"
#> [8] " \"@id\": \"./\""
#> [9] " },"
#> [10] " \"conformsTo\": {"
#> [11] " \"@id\": \"https://w3id.org/ro/crate/1.2\""
#> [12] " }"
#> [13] " },"
#> [14] " {"
#> [15] " \"@id\": \"./\","
#> [16] " \"@type\": \"Dataset\","
#> [17] " \"name\": \"\","
#> [18] " \"description\": \"\","
#> [19] " \"datePublished\": \"2025-11-04\","
#> [20] " \"license\": {"
#> [21] " \"@id\": \"http://spdx.org/licenses/CC-BY-4.0\""
#> [22] " }"
#> [23] " }"
#> [24] " ]"
#> [25] "}"
# delete temporary file
unlink(tmp)

In the previous section we created a very basic RO-Crate with the
rocrateR::rocrate() function; however, you are likely to
include additional entities in your RO-Crate. Entities must contain at
least two components, @id and @type (see https://w3id.org/ro/crate/1.2/ for details).
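As a rough illustration of the rule above, the sketch below checks that a plain R list carries both keys. `has_required_keys()` is a hypothetical helper written for this example; it is not part of {rocrateR}, and it assumes an entity can be treated as a named list.

```r
# Hypothetical helper (not part of {rocrateR}): checks that a named list
# carries the two keys every RO-Crate entity needs.
has_required_keys <- function(x) {
  all(c("@id", "@type") %in% names(x))
}

# a well-formed entity, written as a plain named list
person <- list(
  `@id` = "https://orcid.org/0000-0001-5036-8661",
  `@type` = "Person",
  name = "Roberto Villegas-Diaz"
)

# an incomplete entity: `@type` is missing
incomplete <- list(`@id` = "#proj101", name = "Project 101")

has_required_keys(person)
has_required_keys(incomplete)
```

The first call should report that both keys are present, the second that one is missing.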
For example, a contextual entity can be defined as follows:
# create entity for an organisation
organisation_uol <- rocrateR::entity(
x = "https://ror.org/04xs57h96",
type = "Organization",
name = "University of Liverpool",
url = "http://www.liv.ac.uk"
)
# create an entity for a person
person_rvd <- rocrateR::entity(
x = "https://orcid.org/0000-0001-5036-8661",
type = "Person",
name = "Roberto Villegas-Diaz"
)

These two can be linked to each other via the @id
field:
person_rvd$affiliation <- list(`@id` = organisation_uol$`@id`)

These entities can be attached to an RO-Crate using the
rocrateR::add_entity() function:
my_second_ro_crate <- rocrateR::rocrate() |>
rocrateR::add_entity(person_rvd) |>
rocrateR::add_entity_value(id = "./", key = "author", value = list(`@id` = person_rvd$`@id`)) |>
rocrateR::add_entity(organisation_uol)

Alternatively, the same result can be achieved with the following code:
my_second_ro_crate <- rocrateR::rocrate(person_rvd, organisation_uol) |>
rocrateR::add_entity_value(id = "./", key = "author", value = list(`@id` = person_rvd$`@id`))

print(my_second_ro_crate)
#> {
#> "@context": "https://w3id.org/ro/crate/1.2/context",
#> "@graph": [
#> {
#> "@id": "ro-crate-metadata.json",
#> "@type": "CreativeWork",
#> "about": {
#> "@id": "./"
#> },
#> "conformsTo": {
#> "@id": "https://w3id.org/ro/crate/1.2"
#> }
#> },
#> {
#> "@id": "./",
#> "@type": "Dataset",
#> "name": "",
#> "description": "",
#> "datePublished": "2025-11-04",
#> "license": {
#> "@id": "http://spdx.org/licenses/CC-BY-4.0"
#> },
#> "author": {
#> "@id": "https://orcid.org/0000-0001-5036-8661"
#> }
#> },
#> {
#> "@id": "https://orcid.org/0000-0001-5036-8661",
#> "@type": "Person",
#> "name": "Roberto Villegas-Diaz",
#> "affiliation": {
#> "@id": "https://ror.org/04xs57h96"
#> }
#> },
#> {
#> "@id": "https://ror.org/04xs57h96",
#> "@type": "Organization",
#> "name": "University of Liverpool",
#> "url": "http://www.liv.ac.uk"
#> }
#> ]
#> }

Previously, we covered how to include additional entities. Other
valid operations are to extract (rocrateR::get_entity())
and to remove (rocrateR::remove_entities()) entities.
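Conceptually, extracting and removing both amount to filtering the list of entities under `@graph`. The base R sketch below illustrates that idea on a toy graph built by hand. It only mimics the printed JSON structure and does not reflect how {rocrateR} implements these functions; the `data-1` identifier is made up for the example.

```r
# A toy `@graph`: a plain list of entities (named lists). This mimics the
# printed JSON structure; it is not the {rocrateR} internal representation.
graph <- list(
  list(`@id` = "./", `@type` = "Dataset", name = ""),
  list(`@id` = "#proj101", `@type` = "Project", name = "Project 101"),
  list(`@id` = "data-1", `@type` = "Dataset", name = "Data 1")
)

# extract: keep the entities whose @type matches
datasets <- Filter(\(e) identical(e[["@type"]], "Dataset"), graph)

# remove: keep everything except the entity with the given @id
without_project <- Filter(\(e) !identical(e[["@id"]], "#proj101"), graph)

# list the @id values that remain in each result
vapply(datasets, \(e) e[["@id"]], character(1))
vapply(without_project, \(e) e[["@id"]], character(1))
```

Neither operation modifies `graph` itself; each returns a new list.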
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()
# create some entities for a project and datasets
dataset_entities <- seq_len(5) |>
lapply(\(x) rocrateR::entity(x, type = "Dataset", name = paste0("Data ", x)))
project_entity <- rocrateR::entity(
"#proj101",
type = "Project",
name = "Project 101",
hasPart = dataset_entities |>
lapply(\(x) list(`@id` = x[["@id"]]))
)
# add project and entities to the RO-Crate
basic_ro_crate <- basic_ro_crate |>
rocrateR::add_entity(project_entity) |>
# note that `rocrateR::add_entity` adds a single entity, while `rocrateR::add_entities` takes a list of entities
rocrateR::add_entities(dataset_entities)
#> Adding entity with @id='1'...
#> Adding entity with @id='2'...
#> Adding entity with @id='3'...
#> Adding entity with @id='4'...
#> Adding entity with @id='5'...
print(basic_ro_crate)
#> {
#> "@context": "https://w3id.org/ro/crate/1.2/context",
#> "@graph": [
#> {
#> "@id": "ro-crate-metadata.json",
#> "@type": "CreativeWork",
#> "about": {
#> "@id": "./"
#> },
#> "conformsTo": {
#> "@id": "https://w3id.org/ro/crate/1.2"
#> }
#> },
#> {
#> "@id": "./",
#> "@type": "Dataset",
#> "name": "",
#> "description": "",
#> "datePublished": "2025-11-04",
#> "license": {
#> "@id": "http://spdx.org/licenses/CC-BY-4.0"
#> }
#> },
#> {
#> "@id": "#proj101",
#> "@type": "Project",
#> "name": "Project 101",
#> "hasPart": [
#> {
#> "@id": 1
#> },
#> {
#> "@id": 2
#> },
#> {
#> "@id": 3
#> },
#> {
#> "@id": 4
#> },
#> {
#> "@id": 5
#> }
#> ]
#> },
#> {
#> "@id": 1,
#> "@type": "Dataset",
#> "name": "Data 1"
#> },
#> {
#> "@id": 2,
#> "@type": "Dataset",
#> "name": "Data 2"
#> },
#> {
#> "@id": 3,
#> "@type": "Dataset",
#> "name": "Data 3"
#> },
#> {
#> "@id": 4,
#> "@type": "Dataset",
#> "name": "Data 4"
#> },
#> {
#> "@id": 5,
#> "@type": "Dataset",
#> "name": "Data 5"
#> }
#> ]
#> }

We can extract entities via the @id, @type
or both:

Using @id:
basic_ro_crate_project <- basic_ro_crate |>
rocrateR::get_entity(id = "#proj101")
print(basic_ro_crate_project)
#> [[1]]
#> RO-Crate entity:
#> @id = '#proj101'
#> @type = 'Project'

Using @type:
basic_ro_crate_datasets <- basic_ro_crate |>
rocrateR::get_entity(type = "Dataset")
print(basic_ro_crate_datasets)
#> [[1]]
#> RO-Crate entity:
#> @id = './'
#> @type = 'Dataset'
#>
#> [[2]]
#> RO-Crate entity:
#> @id = '1'
#> @type = 'Dataset'
#>
#> [[3]]
#> RO-Crate entity:
#> @id = '2'
#> @type = 'Dataset'
#>
#> [[4]]
#> RO-Crate entity:
#> @id = '3'
#> @type = 'Dataset'
#>
#> [[5]]
#> RO-Crate entity:
#> @id = '4'
#> @type = 'Dataset'
#>
#> [[6]]
#> RO-Crate entity:
#> @id = '5'
#> @type = 'Dataset'

Using @id and @type:
basic_ro_crate_dataset_root <- basic_ro_crate |>
rocrateR::get_entity(id = "./", type = "Dataset")
print(basic_ro_crate_dataset_root)
#> [[1]]
#> RO-Crate entity:
#> @id = './'
#> @type = 'Dataset'

Similarly, we can remove entities from an RO-Crate:

Using @id:
basic_ro_crate_alt <- basic_ro_crate |>
rocrateR::remove_entity("#proj101")
#> Removing the entity with @id = '#proj101'.

Using an entity object:
basic_ro_crate_alt <- basic_ro_crate |>
rocrateR::remove_entity(project_entity)
#> Removing the entity with @id = '#proj101'.

basic_ro_crate_alt <- basic_ro_crate |>
rocrateR::remove_entities(dataset_entities)
#> Removing the entity with @id = '1'.
#> Removing the entity with @id = '2'.
#> Removing the entity with @id = '3'.
#> Removing the entity with @id = '4'.
#> Removing the entity with @id = '5'.

Here we will explore the BagIt file packaging format, which is the recommended format for bagging RO-Crates. BagIt is described in RFC 8493:
[BagIt is] … a set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A “bag” has just enough structure to enclose descriptive metadata “tags” and a file “payload” but does not require knowledge of the payload’s internal semantics. This BagIt format is suitable for reliable storage and transfer.
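To make the layout concrete, here is a minimal sketch that assembles a bag by hand with base R only. It is not how rocrateR::bag_rocrate() is implemented. The directory name `example-bag` and the `{}` placeholder payload are assumptions made for illustration, and MD5 is used only because `tools::md5sum()` ships with R.

```r
# Sketch of the BagIt layout from RFC 8493, using base R only.
bag_dir <- file.path(tempdir(), "example-bag")
unlink(bag_dir, recursive = TRUE) # start from a clean directory
dir.create(file.path(bag_dir, "data"), recursive = TRUE)

# payload: a placeholder file inside data/
payload <- file.path(bag_dir, "data", "ro-crate-metadata.json")
writeLines("{}", payload)

# bag declaration (tag file)
writeLines(
  c("BagIt-Version: 1.0", "Tag-File-Character-Encoding: UTF-8"),
  file.path(bag_dir, "bagit.txt")
)

# payload manifest: one "<checksum> <path relative to bag root>" line per file
checksum <- unname(tools::md5sum(payload))
writeLines(
  paste(checksum, "data/ro-crate-metadata.json"),
  file.path(bag_dir, "manifest-md5.txt")
)

# verify: recompute the checksum and compare it with the manifest entry
manifest <- strsplit(readLines(file.path(bag_dir, "manifest-md5.txt")), " ")[[1]]
recomputed <- unname(tools::md5sum(file.path(bag_dir, manifest[2])))
is_intact <- identical(manifest[1], recomputed)

# list the files that make up the bag
list.files(bag_dir, recursive = TRUE)
```

A real bag would normally use a stronger algorithm such as SHA-512 and may also include a tag manifest.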
In this package, the function rocrateR::bag_rocrate will
take either a path pointing to the root of an RO-Crate
(must have at least an RO-Crate metadata descriptor file,
ro-crate-metadata.json) or an RO-Crate object created with
rocrateR::rocrate (and alternatives), as shown in step
1.
For more details, run the following command:
?rocrateR::bag_rocrate

rocrateR::bag_rocrate()

Here we will create an RO-Crate bag inside a temporary directory:
# create basic RO-Crate
basic_ro_crate <- rocrateR::rocrate()
# create temporary directory
tmp_dir <- file.path(tempdir(), paste0("rocrate-", digest::digest(Sys.time())))
dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)
# then, we can create the RO-Crate bag
path_to_rocrate_bag <- basic_ro_crate |>
rocrateR::bag_rocrate(path = tmp_dir)
#> RO-Crate successfully 'bagged'!
#> For details, see: /tmp/RtmpJDledw/rocrate-dc0b8f71bc4ab2e85c9c47e17b4c6cfc/rocrate-198ae50c2d60ae04b7a9a2da24243c5e.zip

rocrateR::is_rocrate_bag()

We can use the function rocrateR::is_rocrate_bag() to
verify that a given path points to a ZIP file or a directory with a
valid RO-Crate bag. The expected files are:

- bagit.txt with the BagIt definition
- data directory with the payload of the RO-Crate
- manifest-[algorithm].txt with the checksum for each file inside the data directory

basic_ro_crate_contents <- path_to_rocrate_bag |>
rocrateR::is_rocrate_bag()
#> Valid RO-Crate found!

The RO-Crate can then be displayed:
print(basic_ro_crate_contents)
#> {
#> "@context": "https://w3id.org/ro/crate/1.2/context",
#> "@graph": [
#> {
#> "@id": "ro-crate-metadata.json",
#> "@type": "CreativeWork",
#> "about": {
#> "@id": "./"
#> },
#> "conformsTo": {
#> "@id": "https://w3id.org/ro/crate/1.2"
#> }
#> },
#> {
#> "@id": "./",
#> "@type": "Dataset",
#> "name": "",
#> "description": "",
#> "datePublished": "2025-11-04",
#> "license": {
#> "@id": "http://spdx.org/licenses/CC-BY-4.0"
#> }
#> }
#> ]
#> }

rocrateR::unbag_rocrate()

We can explore the contents of the RO-Crate bag with the following commands:
# extract files in temporary directory
path_to_rocrate_bag_contents <- path_to_rocrate_bag |>
rocrateR::unbag_rocrate(output = file.path(tmp_dir, "ROC"))
#> RO-Crate bag successfully extracted! For details, see:
#> /tmp/RtmpJDledw/rocrate-dc0b8f71bc4ab2e85c9c47e17b4c6cfc/ROC
# create tree with the files
fs::dir_tree(path_to_rocrate_bag_contents)
#> /tmp/RtmpJDledw/rocrate-dc0b8f71bc4ab2e85c9c47e17b4c6cfc/ROC/.
#> ├── bagit.txt
#> ├── data
#> │ └── ro-crate-metadata.json
#> ├── manifest-sha512.txt
#> └── tagmanifest-sha512.txt
# delete temporary directory
unlink(tmp_dir, recursive = TRUE, force = TRUE)

As you develop your RO-Crates, you might want to validate them. There
are a few validators available (some of which can be found at https://www.researchobject.org/ro-crate/tools); here we
will explore the Python package rocrate-validator.
For installation details, please visit https://github.com/crs4/rocrate-validator.
⚠️ The validation workflow depends on Python’s rocrate-validator.
Ensure you have a working Python installation and {reticulate}
configured correctly (reticulate::py_config()). On Windows,
you may need to restart R after installation.
Install {reticulate}:
pak::pkg_install("reticulate")

Install rocrate-validator:
reticulate::py_install("roc-validator", envname = "rocrateR")

basic_ro_crate <- rocrateR::rocrate()
# store crate inside temporary directory
tmp <- file.path(tempdir(), "ro-crate-metadata.json")
basic_ro_crate |>
rocrateR::write_rocrate(tmp)
# wrap crate into zip file (expected by validator)
tmp_zip <- paste0(tmp, ".zip")
zip(tmp_zip, tmp)
# validate (note the name of the module: rocrate_validator)
reticulate::use_virtualenv("rocrateR")
rocrate_validator <- reticulate::import("rocrate_validator")
status <- rocrate_validator$utils$validate_rocrate_uri(tmp_zip)
if (status) {
message("RO-Crate is valid!")
} else {
message("RO-Crate is invalid!")
}
#> RO-Crate is valid!
# delete temporary files
unlink(tmp)
unlink(tmp_zip)