The dataset R Package

rhub lifecycle Project Status: WIP CRAN_Status_Badge CRAN_time_from_release Status at rOpenSci Software Peer Review DOI devel-version dataobservatory Codecov test coverage

Overview

The dataset package helps you create semantically rich, machine-readable, and interoperable datasets in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by:

The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.

Installation

You can install the latest released version of dataset from CRAN with:

install.packages("dataset")

To install the development version from GitHub with pak or remotes:

# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")

# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")

Minimal Example

library(dataset)
df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)
print(df)
#> Doe (2025): GDP Dataset [dataset]
#>   rowid     country   gdp       
#>   <defined> <defined> <defined>
#> 1 obs1      AD        3897     
#> 2 obs2      LI        7365

Export as RDF triples:

dataset_to_triples(df, format = "nt")
#> [1] "<http://example.com/dataset#obsobs1> <http://example.com/prop/country> <https://www.geonames.org/countries/AD/> ."
#> [2] "<http://example.com/dataset#obsobs2> <http://example.com/prop/country> <https://www.geonames.org/countries/LI/> ."
#> [3] "<http://example.com/dataset#obsobs1> <http://example.com/prop/gdp> \"3897\"^^<xsd:decimal> ."                     
#> [4] "<http://example.com/dataset#obsobs2> <http://example.com/prop/gdp> \"7365\"^^<xsd:decimal> ."

Retain automatically recorded provenance:

provenance(df)
#> [1] "<http://example.com/dataset_prov.nt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Bundle> ."                  
#> [2] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Entity> ."                         
#> [3] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataSet> ."                 
#> [4] "_:doejane <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Agent> ."                                              
#> [5] "<https://doi.org/10.32614/CRAN.package.dataset> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#SoftwareAgent> ."
#> [6] "<http://example.com/creation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> ."                       
#> [7] "<http://example.com/creation> <http://www.w3.org/ns/prov#generatedAtTime> \"2025-08-25T21:44:14Z\"^^<xsd:dateTime> ."

Contributing

We welcome contributions and discussion!

Code of Conduct

This project follows the rOpenSci Code of Conduct. By participating, you are expected to uphold these guidelines.