Type: Package
Title: Read and Write CSV on the Web (CSVW) Tables and Metadata
Version: 0.1.7
Author: Robin Gower
Maintainer: Robin Gower <csvwr@infonomics.ltd.uk>
Description: Provide functions for reading and writing CSVW - i.e. CSV tables and JSON metadata. The metadata helps interpret CSV by setting the types and variable names.
License: GPL-3
URL: https://robsteranium.github.io/csvwr/, https://github.com/Robsteranium/csvwr
BugReports: https://github.com/Robsteranium/csvwr/issues
Encoding: UTF-8
Suggests: testthat (≥ 3.0.0), knitr, markdown, rmarkdown
Imports: cli, magrittr, jsonlite, purrr, readr, stringr, rlang
Config/testthat/edition: 3
RoxygenNote: 7.2.1
VignetteBuilder: knitr, rmarkdown
Language: en-GB
NeedsCompilation: no
Packaged: 2022-11-20 13:47:03 UTC; robin
Repository: CRAN
Date/Publication: 2022-11-21 11:20:02 UTC

csvwr: Read and write CSV on the Web (CSVW)

Description

Read and write csv tables annotated with csvw metadata. This helps to ensure consistent processing and reduce the amount of manual work needed to parse and prepare data before it can be used in analysis.

Getting started

The best place to start is the Reading and Writing CSVW vignette.

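Reading annotated tables

  * read_csvw: read a csv table or json metadata document, returning the metadata with a data frame added to each table

  * read_csvw_dataframe: read a data frame from the first table in a csvw

Writing table annotations

  * derive_table_schema: derive a csvw table schema from a data frame

  * create_metadata: create tabular metadata from a list of tables

  * derive_metadata: derive csvw metadata from a csv file

See Also

Useful links:

  * https://robsteranium.github.io/csvwr/

  * https://github.com/Robsteranium/csvwr

  * Report bugs at https://github.com/Robsteranium/csvwr/issues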


Add data frame to csvw table annotation

Description

Add data frame to csvw table annotation

Usage

add_dataframe(table, filename, group)

Arguments

table

a csvw:Table annotation

filename

a filename or URL for the csv table

group

a list of metadata for the table group to use as a fallback

Value

a table annotation with a dataframe attribute added, holding the contents of the table as a data frame


Retrieve the base URI from configuration

Description

Retrieve the base URI from configuration

Usage

base_uri()

Value

the value of the csvwr_base_uri option, defaulting to example.net

Examples

## Not run: 
base_uri() # returns default

options(csvwr_base_uri="http://www.w3.org/2013/csvw/tests/")
base_uri()

## End(Not run)

Determine the base URL for CSVW metadata

Description

Determine the base URL for CSVW metadata

Usage

base_url(metadata, location)

Arguments

metadata

the csvw metadata

location

where the metadata was originally located

Value

A string containing the base URL


Coalesce value to truthiness

Description

Determine whether the input is true, with missing values being interpreted as false.

Usage

coalesce_truth(x)

Arguments

x

logical, NA or NULL

Value

FALSE if x is anything but TRUE
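
The behaviour described above can be sketched in base R. This is a minimal illustration, not the package's own code: the name coalesce_truth_sketch is hypothetical, and it assumes x is a single value rather than a longer vector.

```r
# Hypothetical stand-in for coalesce_truth(); not taken from the package source.
coalesce_truth_sketch <- function(x) {
  # isTRUE() is TRUE only for a single, non-missing TRUE value,
  # so NA, NULL and FALSE all come back as FALSE
  isTRUE(x)
}

results <- list(
  true    = coalesce_truth_sketch(TRUE),
  false   = coalesce_truth_sketch(FALSE),
  missing = coalesce_truth_sketch(NA),
  absent  = coalesce_truth_sketch(NULL)
)
```

Only the first element of results is TRUE.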


Compact objects to values

Description

Follows the rules for JSON-LD to JSON conversion set out in the csv2json standard.

Usage

compact_json_ld(value)

Arguments

value

an element from a list (could be a vector or another list)

Value

A compacted value.


Create tabular metadata from a list of tables

Description

The table annotations should each be a list with keys for url and tableSchema. You can use derive_table_schema to derive a schema from a data frame.

Usage

create_metadata(tables)

Arguments

tables

a list of csvw:table annotations

Value

a list describing a tabular metadata annotation

Examples

d <- data.frame(foo="bar")
table <- list(url="filename.csv", tableSchema=derive_table_schema(d))
create_metadata(tables=list(table))

Convert a csvw metadata to a list (csv2json)

Description

Convert a csvw metadata to a list (csv2json)

Usage

csvw_to_list(csvw)

Arguments

csvw

a csvw metadata list

Value

a list following the csv2json translation rules

Examples

## Not run: 
csvw_to_list(read_csvw("example.csv"))

## End(Not run)

Get path to csvwr example

Description

The csvwr package includes some example csvw files in its inst/extdata directory. You can use this function to find them.

Usage

csvwr_example(path = NULL)

Arguments

path

The filename. If NULL, the example files will be listed.

Details

Inspired by readr::readr_example()

Value

either a file path or a vector of filenames

Examples

csvwr_example()
csvwr_example("computer-scientists.csv")

Map csvw datatypes to R types

Description

Translate csvw datatypes to R types. This implementation currently targets readr::cols column specifications.

Usage

datatype_to_type(datatypes)

Arguments

datatypes

a list of csvw datatypes

Value

a readr::cols specification - a list of collectors

Examples

## Not run: 
cspec <- datatype_to_type(list("double", list(base="date", format="yyyy-MM-dd")))
readr::read_csv(readr::readr_example("challenge.csv"), col_types=cspec)

## End(Not run)

CSVW default dialect

Description

The CSVW Default Dialect specification described in CSV Dialect Description Format.

Usage

default_dialect

Format

An object of class list of length 13.

Value

a list specifying a default csv dialect


Create a default table schema given a csv file and dialect

Description

If neither the table nor the group has a tableSchema annotation, then this default schema will be used.

Usage

default_schema(filename, dialect = default_dialect)

Arguments

filename

a csv file

dialect

specification of the csv's dialect (default: default_dialect)

Value

a table schema


Derive csvw metadata from a csv file

Description

Derive csvw metadata from a csv file

Usage

derive_metadata(filename)

Arguments

filename

a csv file

Value

a list of csvw metadata

Examples

derive_metadata(csvwr_example("computer-scientists.csv"))

Derive csvw table schema from a data frame

Description

Derive csvw table schema from a data frame

Usage

derive_table_schema(d)

Arguments

d

a data frame

Value

a list describing a csvw:tableSchema

Examples

derive_table_schema(data.frame(a=1,b=2))

Extract a referenced table from CSVW metadata

Description

Extract a referenced table from CSVW metadata

Usage

extract_table(csvw, reference)

Arguments

csvw

the metadata

reference

a foreign key reference expressed as a list containing either a reference attribute or a schemaReference attribute

Value

a csvw table


Find the first existing file from a set of candidates

Description

Find the first existing file from a set of candidates

Usage

find_existing_file(filenames)

Arguments

filenames

a vector of candidates

Value

If any of the filenames passed exist, the first existing one is returned. If none of the filenames exist, NULL is returned.
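
A minimal base R sketch of this lookup follows. The function name find_existing_file_sketch and the temporary files are illustrative assumptions, and the package may handle URLs or other cases differently.

```r
# Hypothetical stand-in for find_existing_file(); not the package's code.
find_existing_file_sketch <- function(filenames) {
  # keep only the candidates that exist on disk, preserving their order
  existing <- filenames[file.exists(filenames)]
  if (length(existing) > 0) existing[[1]] else NULL
}

# Two candidate paths: only the second is actually created
absent  <- tempfile(fileext = ".json")
present <- tempfile(fileext = ".json")
invisible(file.create(present))

found <- find_existing_file_sketch(c(absent, present))  # the path in `present`
none  <- find_existing_file_sketch(absent)              # NULL
```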


Find metadata for a tabular file

Description

Searches through the default locations attempting to locate metadata.

Usage

find_metadata(filename)

Arguments

filename

a csv file

Value

a URI for the metadata, or NULL if none was found


Does the string provide an absolute URL

Description

Does the string provide an absolute URL

Usage

is_absolute_url(string)

Arguments

string

the url, path or template

Value

TRUE if the string is an absolute URL, FALSE otherwise
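
One way to make such a check is to look for a leading scheme followed by "://". The sketch below assumes that definition of "absolute"; the name is_absolute_url_sketch and the regular expression are illustrative and may not match the test the package applies.

```r
# Hypothetical stand-in for is_absolute_url(); the pattern is an assumption.
is_absolute_url_sketch <- function(string) {
  # a scheme (letter, then letters, digits, "+", "." or "-") followed by "://"
  grepl("^[A-Za-z][A-Za-z0-9+.-]*://", string)
}

checks <- is_absolute_url_sketch(c(
  "http://example.net/table.csv",  # absolute URL
  "table.csv",                     # relative path
  "{+url}-metadata.json"           # URI template without a scheme
))
```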


Determine if an annotation is non-core

Description

Checks if the annotation is non-core, and should thus be treated as a json-ld note.

Usage

is_non_core_annotation(property)

Arguments

property

a list element

Value

TRUE if the annotation is non-core, FALSE otherwise


Convert json-ld annotation to json

Description

Follows the rules for JSON-LD to JSON conversion set out in the csv2json standard.

Usage

json_ld_to_json(property)

Arguments

property

a json-ld annotation (single list element)

Value

A compacted list element


Parse list of lists specification into a data frame

Description

Parse list of lists specification into a data frame

Usage

list_of_lists_to_df(ll)

Arguments

ll

a list of lists

Value

a data frame with a row per list
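
As a rough illustration of the conversion, the base R sketch below binds one row per inner list. It assumes every inner list has the same keys and scalar values; the column specifications are invented, and the package may treat missing keys differently.

```r
# Invented column specifications, each a list with the same keys
columns <- list(
  list(name = "id", datatype = "integer"),
  list(name = "label", datatype = "string")
)

# Turn each inner list into a one-row data frame, then stack the rows
rows <- lapply(columns, function(x) as.data.frame(x, stringsAsFactors = FALSE))
columns_df <- do.call(rbind, rows)
```

Here columns_df has two rows and the columns name and datatype.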


Locate metadata for a table

Description

Follows the procedure defined in the csvw model for locating metadata (the order of precedence is listed under Details).

Usage

locate_metadata(filename, metadata)

Arguments

filename

a path for a csv table or a json metadata document

metadata

optional user metadata

Details

  1. Metadata supplied by the user

  2. Metadata referenced by a link header

  3. Metadata located through default paths

  4. Metadata embedded in the file

We extend this to use the derive_metadata function to inspect the table itself.

Value

csvw metadata list
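
The order of precedence listed under Details can be sketched as "take the first source that yields something". The sources below are hypothetical placeholders (NULL standing for "nothing found"), not calls into the package.

```r
# Hypothetical results of each lookup, in order of precedence;
# NULL means that source produced no metadata.
sources <- list(
  user          = NULL,
  link_header   = NULL,
  default_paths = "metadata found via default path",
  embedded      = "metadata embedded in the file"
)

# Find() returns the first element for which the predicate is TRUE
chosen <- Find(Negate(is.null), sources)
```

With these placeholders, chosen is the value from default_paths, because the two higher-priority sources are empty.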


Locate csv data table

Description

Locate csv data table

Usage

locate_table(filename, url)

Arguments

filename

the file passed to read_csvw in the first place (could be the csv or json annotations)

url

the location of the table as defined in the metadata

Value

The location of the table


Identify metadata location configurations for a tabular file

Description

Currently returns the default locations. Retrieval of remote configuration is planned but not yet implemented.

Usage

location_configuration(filename)

Arguments

filename

a csv file

Value

a character vector of URI templates
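
For orientation, the CSVW standard names two default location templates: "{+url}-metadata.json" and "csv-metadata.json". The sketch below expands them by hand for an invented table path. Whether location_configuration returns exactly these templates is an assumption, and the string substitution here is a simplification of URI template expansion.

```r
# Default location templates from the CSVW standard
default_templates <- c("{+url}-metadata.json", "csv-metadata.json")

# Invented path of a csv table
table_url <- "data/table.csv"

candidates <- c(
  # first template: substitute the table's own location for {+url}
  sub("{+url}", table_url, default_templates[1], fixed = TRUE),
  # second template: a file alongside the table
  file.path(dirname(table_url), default_templates[2])
)
```

This gives "data/table.csv-metadata.json" and "data/csv-metadata.json" as the places to look.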


Normalise metadata

Description

The spec defines a normalisation process.

Usage

normalise_metadata(metadata, location)

Arguments

metadata

a csvw metadata list

location

the location of the metadata

Value

metadata with normalised properties


Normalise an annotation property

Description

This follows the normalisation process set out in the csvw specification.

Usage

normalise_property(property, base_url)

Arguments

property

an annotation property (a list)

base_url

the base URL for normalisation

Value

a normalised property (list)


Normalise a URL

Description

Ensures that a url is specified absolutely with reference to a base

Usage

normalise_url(url, base)

Arguments

url

a string

base

the base to use for normalisation

Value

A string containing a normalised URL


Override defaults

Description

Merges lists of configuration values, applying override values on top of the default values.

Usage

override_defaults(...)

Arguments

...

any number of lists with configuration values

Value

a list in which the values from the first list replace those in the second, and so on
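
A similar merge can be sketched with utils::modifyList from base R. The name override_defaults_sketch is hypothetical, and modifyList has behaviours of its own (it merges nested lists recursively and drops entries set to NULL) that the package function may not share.

```r
# Hypothetical stand-in for override_defaults(); not the package's code.
override_defaults_sketch <- function(...) {
  # Start from the last (lowest-priority) list and apply each earlier
  # list on top of it, so the first argument wins.
  Reduce(utils::modifyList, rev(list(...)))
}

overrides <- list(delimiter = ";")
defaults  <- list(delimiter = ",", header = TRUE)

merged <- override_defaults_sketch(overrides, defaults)
```

In merged, delimiter is ";" (from the overrides) and header is TRUE (from the defaults).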


Parse columns schema

Description

Parse columns schema

Usage

parse_columns(columns)

Arguments

columns

a list of lists specification of columns

Value

a data frame with a row per column specification


Parse metadata

Description

Coerces the metadata to ensure it describes a table group. Retrieves any linked tableSchema.

Usage

parse_metadata(metadata, location)

Arguments

metadata

a csvw metadata list

location

the location of the metadata

Value

metadata coerced into a table group description


Read CSV on the Web

Description

If the argument to filename is a json metadata document, this will be used to find csv files for each table using the value of csvw:url.

Usage

read_csvw(filename, metadata = NULL)

Arguments

filename

a path for a csv table or a json metadata document

metadata

optional user metadata

Details

If the argument to filename is a csv file, and no metadata is provided, an attempt is made to derive metadata.

If the argument to filename is a csv file, and the metadata is provided, then the given csv will override the value of csvw:url.

The csvw metadata is returned as a list. In each table in the table group, an element named dataframe is added which provides the contents of the csv table parsed into a data frame using the table schema.

Value

csvw metadata list, with a dataframe property added to each table

Examples

## Not run: 
read_csvw("metadata.json")
read_csvw("table.csv", "metadata.json")

## End(Not run)
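
To show how the returned structure can be navigated, the sketch below builds a small list by hand in the shape described under Details. The tables key follows the CSVW table group vocabulary and is an assumption about the exact return value; the data are invented.

```r
# Hand-built stand-in for the list returned by read_csvw(); values are invented.
csvw <- list(
  tables = list(
    list(
      url = "table.csv",
      dataframe = data.frame(id = 1:2, name = c("Ada", "Alan"))
    )
  )
)

# Data frame for the first table in the group
first_table <- csvw$tables[[1]]$dataframe

# Data frames for every table in the group
all_tables <- lapply(csvw$tables, function(tbl) tbl$dataframe)
```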

Read a data frame from the first table in a csvw

Description

A wrapper around read_csvw, convenient when you're only interested in the data and there's only one table.

Usage

read_csvw_dataframe(filename, metadata = NULL)

Arguments

filename

a path for a csv table or a json metadata document

metadata

optional user metadata

Value

A data frame parsed using the table schema


Read and parse CSVW Metadata

Description

Reads in a json document as a list, transforming columns specifications into a dataframe.

Usage

read_metadata(filename)

Arguments

filename

a path for a json metadata document

Value

csvw metadata list


Serialise cell values for JSON representation

Description

Serialise cell values for JSON representation

Usage

render_cell(cell)

Arguments

cell

a typed value

Value

a representation comparable with the JSON representation (typically a string)


Render URI templates

Description

Interpolate variable bindings into a URI template.

Usage

render_uri_templates(templates, bindings = NULL, ...)

Arguments

templates

a character vector with URI templates

bindings

a list of variable bindings to be interpolated into templates

...

further bindings specified as named function arguments

Details

This doesn't yet implement the whole of RFC 6570, just enough to make the tests pass

You can bind variables by passing a list to the explicit bindings argument, or variadically with ... by naming arguments according to the variable name you wish to bind.

Value

a character vector with the expanded URI

Examples

render_uri_templates("{+url}/resource?query=value", list(url="http://example.net"))
render_uri_templates("{+url}", url="http://example.net")

Resolve one URL against another

Description

Resolve one URL against another

Usage

resolve_url(url1, url2)

Arguments

url1

the base url

url2

a relative url

Value

A single absolute url


Recursive lmap

Description

Applies function .f to each list-element in .x as per purrr::lmap. If the value of the list-element is itself a list, then the function is applied to that in turn. The process is followed recursively until an atomic value at the leaf nodes of the list is found. If .f modifies the name, it is thrown away and replaced by the original name.

Usage

rlmap(.x, .f, ...)

Arguments

.x

a list

.f

a function (called with elements of .x as the first argument)

...

further arguments passed to the function .f

Value

A list


Recursive map

Description

Applies function .f to each element in .x as per purrr::map. If the value of the element is itself a list, then the function is applied to that in turn. The process is followed recursively until an atomic value at the leaf nodes of the list is found.

Usage

rmap(.x, .f)

Arguments

.x

a list

.f

a function (called with elements of .x as the first argument)

Value

A list
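
The recursion described above can be sketched in base R with lapply standing in for purrr::map. The name rmap_sketch and the nested list are illustrative only.

```r
# Hypothetical stand-in for rmap(); not the package's code.
rmap_sketch <- function(.x, .f) {
  lapply(.x, function(element) {
    # descend into nested lists, apply .f once a non-list leaf is reached
    if (is.list(element)) rmap_sketch(element, .f) else .f(element)
  })
}

nested  <- list(a = 1, b = list(c = 2, d = list(e = 3)))
doubled <- rmap_sketch(nested, function(v) v * 2)
```

The result keeps the shape and names of nested, with each leaf value doubled.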


Set the base of a URI template

Description

Set the base of a URI template

Usage

set_uri_base(t, url)

Arguments

t

a character vector of URI templates

url

a filename url being used as a context (string)

Value

a character vector of templates with base paths or domains set appropriately


Convert a table to a list

Description

Follows the pattern for csv2json

Usage

table_to_list(table, group)

Arguments

table

the csvw table

group

list of metadata for the group used for a fallback schema and dialect

Value

a list representation of the table's contents


Transform date/time format string from Unicode TR35 to POSIX 1003.1

Description

As per the csvw specification for date and time formats, we accept format strings using the date field symbols defined in Unicode TR35. These are converted to POSIX 1003.1 date format strings for use in base::strptime() or readr::parse_date()/readr::parse_datetime().

Usage

transform_datetime_format(format_string)

Arguments

format_string

a UAX35 date format string

Value

a POSIX date format string

Examples

## Not run: 
fmt <- transform_datetime_format("dd.MM.yyyy")
strptime("01.01.2001", format=fmt)

## End(Not run)
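
To make the translation concrete, the sketch below maps a handful of TR35 field symbols to their strptime equivalents. It is a simplified assumption covering only yyyy, MM, dd, HH, mm and ss; the name tr35_to_posix_sketch is hypothetical, and the package's own function may support more symbols.

```r
# Hypothetical, partial stand-in for transform_datetime_format().
tr35_to_posix_sketch <- function(format_string) {
  # TR35 symbol -> POSIX conversion specification (subset only)
  mapping <- c(yyyy = "%Y", MM = "%m", dd = "%d", HH = "%H", mm = "%M", ss = "%S")
  for (symbol in names(mapping)) {
    # fixed = TRUE keeps the match literal and case-sensitive (MM vs mm)
    format_string <- gsub(symbol, mapping[[symbol]], format_string, fixed = TRUE)
  }
  format_string
}

posix_format <- tr35_to_posix_sketch("dd.MM.yyyy")
parsed <- as.Date("01.01.2001", format = posix_format)
```

Here posix_format is "%d.%m.%Y", which as.Date accepts directly.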

Try to add a dataframe to the table

Description

If this fails, a list describing the error is added instead

Usage

try_add_dataframe(table, ...)

Arguments

table

a csvw:Table annotation

...

arguments to add_dataframe

Value

A table annotation with a dataframe attribute added, holding either the contents of the table as a data frame or a list describing the error.


Map R types to csvw datatype

Description

Translate R types to csvw datatypes. Acts as an inverse of datatype_to_type but doesn't provide a 1:1 correspondence.

Usage

type_to_datatype(types)

Arguments

types

a list of R types

Value

a list of csvw datatypes


Unlist unless the list-elements are themselves lists

Description

Convert a list of elements to a vector. Unlike base::unlist, this doesn't convert the elements of inner lists to vector elements. Thus only a list that is a single layer deep is flattened to a vector.

Usage

unlist1(l)

Arguments

l

a list

Value

A list of lists or a vector
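
A minimal base R sketch of this behaviour is given below. The name unlist1_sketch is hypothetical, and the rule used (leave the input alone if any element is itself a list) is an assumption about how the package decides.

```r
# Hypothetical stand-in for unlist1(); not the package's code.
unlist1_sketch <- function(l) {
  has_inner_list <- any(vapply(l, is.list, logical(1)))
  # only flatten when no element is itself a list
  if (has_inner_list) l else unlist(l)
}

flat   <- unlist1_sketch(list("a", "b"))        # character vector
nested <- unlist1_sketch(list(list("a"), "b"))  # returned unchanged
```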


Validate CSVW specification

Description

Follows the csvw table validation procedure.

Usage

validate_csvw(csvw)

Arguments

csvw

a csvw metadata specification (a list)

Value

a validation report (list)


Validate the referential integrity of a csvw table group

Description

Fails if foreign keys aren't found in the referenced tables

Usage

validate_referential_integrity(csvw)

Arguments

csvw

the metadata annotation

Value

a list specifying any foreign key violations
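
The underlying check can be illustrated on two plain data frames. The tables, column names and values below are invented, and the sketch covers a single-column foreign key only.

```r
# Invented referenced table and referencing table
countries <- data.frame(code = c("UK", "FR"), stringsAsFactors = FALSE)
people <- data.frame(
  id = 1:3,
  country = c("UK", "DE", "FR"),
  stringsAsFactors = FALSE
)

# Rows whose foreign key value has no match in the referenced table
violations <- people[!(people$country %in% countries$code), , drop = FALSE]
```

Only the row with country "DE" is reported, since "DE" does not appear in countries$code.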


Calculate depth of vector safely

Description

Like purrr::vec_depth but doesn't attempt to descend into errors

Usage

vec_depth(x)

Arguments

x

a vector

Value

An integer