Maintainer: Mark van der Loo <mark.vanderloo@gmail.com>
License: GPL-3
Title: Modify Data Using Externally Defined Modification Rules
Type: Package
LazyLoad: yes
Description: Data cleaning scripts typically contain many statements of the 'if this, change that' type. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules as separate entities.
Version: 0.9.0
Depends: methods
URL: https://github.com/data-cleaning/dcmodify
BugReports: https://github.com/data-cleaning/dcmodify/issues
Encoding: UTF-8
Imports: yaml, validate (>= 1.1.3), lumberjack (>= 1.3.1), settings, utils
Suggests: simplermarkdown, tinytest
VignetteBuilder: simplermarkdown
Collate: 'dplyr_verbs.R' 'guard.R' 'modifier.R' 'modify.R' 'validate.R'
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2024-03-27 15:07:31 UTC; mark
Author: Mark van der Loo [cre, aut], Edwin de Jonge [aut], Sjabbo Schaveling [ctb], Floris Ruijter [ctb]
Repository: CRAN
Date/Publication: 2024-03-28 08:20:06 UTC

Data Modification By Modifying Rules

Description

Data often contain errors and missing values. Experts can often correct commonly occurring errors using simple conditional rules. This package facilitates the expression, management, and application of such rules to data sets.

The general workflow in dcmodify follows this pattern:

1. Define a set of modification rules with modifier().
2. Apply the rules to a data set with modify().

There are several convenience functions that allow one to define modification rules from the command line or through a (free-form or YAML) file, and to investigate and maintain the rules themselves. Please have a look at the introductory vignette:

vignette("introduction",package="dcmodify")
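As a minimal sketch of the file-based route mentioned above, the following writes one rule to a temporary free-form file and reads it back through the .file argument. The file name and the rule itself are illustrative assumptions, not taken from the package; the built-in women data set is used because the package's own examples use it.

```r
library(dcmodify)

# Hypothetical rule file: the path and the rule are for illustration only.
rule_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# raise implausibly low heights to a floor of 60",
  "if (height < 60) height <- 60"
), rule_file)

# Step 1: read the rules from file. Step 2: apply them to the data.
m <- modifier(.file = rule_file)
out <- modify(women, m)
```

Keeping the rules in a separate file means they can be reviewed and versioned without touching the script that applies them.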

Author(s)

Maintainer: Mark van der Loo mark.vanderloo@gmail.com (ORCID)

Authors:

Edwin de Jonge

Other contributors:

Sjabbo Schaveling [contributor]
Floris Ruijter [contributor]

See Also

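Useful links:

https://github.com/data-cleaning/dcmodify
Report bugs at https://github.com/data-cleaning/dcmodify/issues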


Create or read a set of data modification rules

Description

Create or read a set of data modification rules

Usage

modifier(..., .file, .data)

Arguments

...

A comma-separated list of modification rules.

.file

(optional) A character vector of file locations.

.data

(optional) A data.frame with at least a column "rule" of type character. Optionally, the following columns of metadata can be provided (all character, except "created" which should be POSIXct): "name", "label", "description", "origin", "created".
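The .data route can be sketched as follows, assuming a small hand-built rule table. The rule texts, names, and descriptions are made up for illustration; only the column names come from the argument description above.

```r
library(dcmodify)

# Illustrative rule table: "rule" is required, the other columns are optional metadata.
rules <- data.frame(
  rule = c("if (height < 60) height <- 60",
           "if (weight > 160) weight <- 160"),
  name = c("min_height", "max_weight"),
  description = c("Floor for implausibly low heights",
                  "Cap for implausibly high weights"),
  stringsAsFactors = FALSE
)

m <- modifier(.data = rules)
out <- modify(women, m)
```

Storing rules in a table like this can be convenient when they are maintained in a spreadsheet or database rather than in code.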

Value

An object of class modifier.

Examples

# Define two rules, then apply them to the built-in 'women' data set
m <- modifier(
  if (height < mean(height)) height <- 2 * height,
  if (weight > mean(weight)) weight <- weight / 2
)
modify(women, m)



Store modification rules

Description

Store modification rules


Modify a data set

Description

Modify a data set

Usage

modify(dat, x, ref, ...)

## S4 method for signature 'data.frame,modifier,environment'
modify(dat, x, ref, logger = NULL, ...)

## S4 method for signature 'data.frame,modifier,ANY'
modify(dat, x, logger = NULL, ...)

## S4 method for signature 'data.frame,modifier,data.frame'
modify(dat, x, ref, logger = NULL, ...)

## S4 method for signature 'data.frame,modifier,list'
modify(dat, x, ref, logger = NULL, ...)

Arguments

dat

A data.frame

x

A modifier object containing modifying rules.

ref

An environment, data.frame, or list (see the method signatures under Usage).

...

Extra arguments.

logger

Optional. A lumberjack-compatible logger object.
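A hedged sketch of the logger argument, assuming the cellwise logger from the lumberjack package is installed and that a key column is added to identify records. The id column, the rule, and the output path are illustrative assumptions.

```r
library(dcmodify)
library(lumberjack)

dat <- women
dat$id <- seq_len(nrow(dat))  # illustrative key column identifying each record

m <- modifier(if (height < 60) height <- 60)

# The cellwise logger records changes per cell; 'key' names the identifying column.
logger <- cellwise$new(key = "id")
out <- modify(dat, m, logger = logger)

# Write the log to a file of our choosing and read it back for inspection.
log_file <- tempfile(fileext = ".csv")
logger$dump(file = log_file)
changes <- read.csv(log_file)
```

Other lumberjack-compatible loggers can be passed in the same way; which one is appropriate depends on how much detail the audit trail needs.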

Examples

# Define two rules, then apply them to the built-in 'women' data set
m <- modifier(
  if (height < mean(height)) height <- 2 * height,
  if (weight > mean(weight)) weight <- weight / 2
)
modify(women, m)

Shortcut to modify data

Description

Shortcut to modify data

Usage

modify_so(dat, ...)

Arguments

dat

A data.frame

...

A comma-separated list of modifying rules.
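For quick, one-off corrections the shortcut can be sketched like this; the rule is an illustrative assumption, and the call is expected to behave like defining the rule with modifier() and then calling modify().

```r
library(dcmodify)

# Pass the rule directly, without building a modifier object first.
out <- modify_so(women, if (height < 60) height <- 60)
head(out, 3)
```

For rules that need to be documented or reused, defining them once with modifier() is likely the better fit.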


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

validate

as_yaml, description, description<-, do_by, export_yaml, label, label<-, max_by, mean_by, min_by, origin, origin<-, sum_by