Help for package testdat

Type:

Package

Title:

Data Unit Testing for R

Version:

0.4.2

Description:

Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.

License:

MIT + file LICENSE

URL:

https://socialresearchcentre.github.io/testdat/, https://github.com/socialresearchcentre/testdat

BugReports:

https://github.com/socialresearchcentre/testdat/issues

Depends:

R (≥ 3.2.2), testthat (≥ 2.0.0)

Imports:

dplyr (≥ 0.8.0), glue, lifecycle, rlang, stringr, tidyselect

Suggests:

covr, crayon, knitr, labelled, lubridate, openxlsx, rmarkdown

VignetteBuilder:

knitr

Encoding:

UTF-8

RoxygenNote:

7.2.3

Collate:

'chk-filter.R' 'chk.R' 'comparison.R' 'deprec-chk.R' 'deprec-expect.R' 'deprec-reporter.R' 'expectation.R' 'expect-generic.R' 'expect-make.R' 'expect-chk.R' 'expect-conditional.R' 'expect-data.R' 'expect-datacomp.R' 'expect-exclusive.R' 'expect-labels.R' 'expect-proportion.R' 'expect-unique.R' 'expect_depends.R' 'reporter-excel.R' 'reporter-zzz.R' 'testdat-package.R' 'utils.R' 'zzz.R'

NeedsCompilation:

Packaged:

2023-09-03 23:49:23 UTC; danny.smith

Author:

Danny Smith [aut, cre], Kinto Behr [aut], The Social Research Centre [cph]

Maintainer:

Danny Smith <danny@gorcha.org>

Repository:

CRAN

Date/Publication:

2023-09-04 00:10:02 UTC

testdat: Data Unit Testing for R

Description

Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.

Options

testdat.miss: A vector of values to consider missing (default: c(NA, "")).
testdat.miss_text: A vector of values to consider missing in text variables (default: c("error", "null", "0", ".", "-", ",", "na", "#n/a", "", NA)).
testdat.stop_on_fail: Should an expectation raise an error on failure? Useful for interactive use of expectation functions (default: TRUE).
testdat.scipen: When it is necessary to convert a numeric vector to character for checking, this value will be used for scipen (default: 999).
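These are ordinary R options, so they can be read with getOption() and changed with options(). The sketch below temporarily widens the missing-value codes for a session. "N/A" is a hypothetical extra code, not a testdat default:

```r
# Override two testdat options; options() returns the previous values
old <- options(
  testdat.miss = c(NA, "", "N/A"),  # "N/A" is a hypothetical extra missing code
  testdat.stop_on_fail = FALSE      # do not raise an error when an expectation fails
)

# Functions whose default is getOption("testdat.miss") now see the new value
current_miss <- getOption("testdat.miss")
print(current_miss)

# Restore the previous settings
options(old)
```

Restoring from the list returned by options() keeps the change scoped. This is useful when stop_on_fail is switched off for exploratory checks.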

Author(s)

Maintainer: Danny Smith danny@gorcha.org

Authors:

Kinto Behr kinto.behr@srcentre.com.au

Other contributors:

The Social Research Centre [copyright holder]

Checks: dates

Description

Check that a vector conforms to a given date format such as YYYYMMDD.

Usage

chk_date_yyyymmdd(x)

chk_date_yyyymm(x)

chk_date_yyyy(x)

Arguments

x

A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


date <- c(20210101, 20211301, 20210132, 202101, 2021)
chk_date_yyyymmdd(date)

date <- c(202101, 202112, 202113, 2021)
chk_date_yyyymm(date)

date <- c("0001", "1688", "1775", "1789", "1791", "1848")
chk_date_yyyy(date)
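To make the YYYYMMDD rule concrete, here is a rough base R approximation. chk_yyyymmdd_sketch() is a hypothetical helper, not testdat's implementation. testdat's own rule may differ in details, such as how it treats missing values:

```r
# Hypothetical base R approximation of a YYYYMMDD check (not testdat's code)
chk_yyyymmdd_sketch <- function(x) {
  s <- as.character(x)
  # Require exactly eight digits, and those digits must form a real calendar date
  grepl("^[0-9]{8}$", s) & !is.na(as.Date(s, format = "%Y%m%d"))
}

date <- c(20210101, 20211301, 20210132, 202101, 2021)
chk_yyyymmdd_sketch(date)  # only the first value passes
```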

Defunct checking functions

Description

These functions are defunct.

chk_filter_where() works exactly like chk_filter_all(). When testdat used dplyr::vars() as standard, chk_filter_where() provided an alternative interface using tidy-select.

Usage

chk_filter_vars(data, vars, func, flt = TRUE, args = list())

chk_filter_where(data, where, func, flt = TRUE, args = list())

Arguments

where

<tidy-select> Columns to check.

Value

A logical vector flagging records that have passed or failed the check.

Deprecated checking functions

Description

These functions are deprecated.

Usage

chk_length(x, len)

chk_miss(x, miss = getOption("testdat.miss_text"))

chk_nmiss(x, miss = getOption("testdat.miss_text"))

Arguments

x

A vector to check.

len

Maximum string length for checking string variables.

Value

A logical vector flagging records that have passed or failed the check.

Checks: dummy

Description

These functions provide common, simple data checks.

Usage

chk_dummy(x)

Arguments

x

A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


chk_dummy(LETTERS)

Checks: data frame helpers

Description

These helper functions allow easy checking using an arbitrary function (func) over multiple columns (vars) of a data frame (data), with an optional filter (flt).

Usage

chk_filter(data, vars, func, flt = TRUE, args = list())

chk_filter_all(data, vars, func, flt = TRUE, args = list())

chk_filter_any(data, vars, func, flt = TRUE, args = list())

Arguments

data

A data frame to check.

vars

<tidy-select> A set of columns to check.

func

A function to use for checking that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.

flt

<data-masking> A filter specifying a subset of the data frame to test.

args

A list of additional arguments to be added to the function calls.

Details

chk_filter() applies func with args to vars in data filtered with flt and returns a data frame containing the resulting logical vectors.
chk_filter_all() and chk_filter_any() both run chk_filter() and return a single logical vector flagging whether all or any values in each row are TRUE (i.e. the conjunction and disjunction, respectively, of the columns in the output of chk_filter()).

Value

A logical vector or data frame of logical vectors flagging records that have passed or failed the check, with NA where records do not meet the filter condition.

Examples


# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches AND < 100 horsepower - return a data frame
chk_filter(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches AND < 100 horsepower
chk_filter_all(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches OR < 100 horsepower
chk_filter_any(
  mtcars,
  c("disp", "hp"),
  chk_range,
  cyl == 4,
  list(min = 0, max = 100)
)

# Check that columns made up of whole numbers are binary
chk_filter_all(
  mtcars,
  where(~ all(. %% 1 == 0)),
  chk_values,
  TRUE,
  list(0:1)
)
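The behaviour described under Details can be sketched in base R. chk_filter_sketch() and in_range() are hypothetical helpers, not testdat code. Unlike the real functions, vars here is a character vector and flt is a plain logical vector rather than a data-masking expression:

```r
# Hypothetical base R sketch of the chk_filter() family (not testdat's code)
chk_filter_sketch <- function(data, vars, func, flt = TRUE, args = list()) {
  flt <- rep_len(flt, nrow(data))
  # Apply func (plus any extra args) to each selected column
  out <- as.data.frame(lapply(data[vars], function(col) do.call(func, c(list(col), args))))
  # Records outside the filter get NA instead of TRUE/FALSE
  out[!flt, ] <- NA
  out
}

in_range <- function(x, min, max) x >= min & x <= max  # hypothetical range check

res <- chk_filter_sketch(mtcars, c("disp", "hp"), in_range,
                         flt = mtcars$cyl == 4, args = list(min = 0, max = 100))
all_ok <- Reduce(`&`, res)  # like chk_filter_all(): conjunction across columns
any_ok <- Reduce(`|`, res)  # like chk_filter_any(): disjunction across columns
```

Filtered-out records are NA in every column, so both reductions also return NA for them. This matches the Value description above.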

Checks: labels

Description

Check that a vector is labelled in a given way.

Usage

chk_labels(x, val_labels = NULL, var_label = NULL)

Arguments

x

A vector to check.

val_labels

What value label check should be performed? One of:

A character vector of expected value labels.
A named vector of expected label-value pairs.
TRUE to test for the presence of value labels in general.
FALSE to test for the absence of value labels.
NULL to ignore value labels when checking.

var_label

What variable label check should be performed? One of:

A character vector of expected variable labels.
TRUE to test for the presence of a variable label.
FALSE to test for the absence of a variable label.
NULL to ignore the variable label when checking.

Value

A logical vector flagging records that have passed or failed the check.

Examples


df <- data.frame(
  x = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F"), "Sex"),
  y = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F", Other = "X")),
  z = c("M", "M", "F")
)

# Check for a value-label pairing
chk_labels(df$x, c(Male = "M"))

# Check that two variables have the same values
chk_labels(df$x, labelled::val_labels(df$y))

# Check for the presence of a particular label
chk_labels(df$x, "Male")
chk_labels(df$x, var_label = "Sex")

# Check that a variable is labelled at all
chk_labels(df$z, val_labels = TRUE)
chk_labels(df$z, var_label = TRUE)

# Check that a variable isn't labelled
chk_labels(df$z, val_labels = FALSE)
chk_labels(df$z, var_label = FALSE)
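The labelled package stores value labels in a "labels" attribute and the variable label in a "label" attribute. The sketch below builds such vectors with plain structure() so that it runs without labelled. Real labelled::labelled() vectors also carry a haven_labelled class. The helper checks are hypothetical, and unlike chk_labels(), which returns a result per record, they return a single TRUE or FALSE:

```r
# Plain vectors carrying the same attributes a labelled vector uses
x <- structure(c("M", "M", "F"), labels = c(Male = "M", Female = "F"), label = "Sex")
y <- structure(c("M", "M", "F"), labels = c(Male = "M", Female = "F", Other = "X"))

# exact = TRUE matters: without it, attr(y, "label") would partially match "labels"
has_value_labels <- function(v) !is.null(attr(v, "labels", exact = TRUE))
has_var_label    <- function(v) !is.null(attr(v, "label", exact = TRUE))

# Does v carry every expected label-value pair?
has_label_pairs <- function(v, expected) {
  labs <- attr(v, "labels", exact = TRUE)
  !is.null(labs) && all(names(expected) %in% names(labs)) &&
    identical(unname(labs[names(expected)]), unname(expected))
}

has_var_label(y)                   # FALSE
has_label_pairs(x, c(Male = "M"))  # TRUE
```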

Checks: patterns

Description

Check that a vector conforms to a certain pattern.

Usage

chk_regex(x, pattern)

chk_max_length(x, len)

Arguments

x

A vector to check.

pattern

A str_detect() pattern to match.

len

Maximum string length.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c("a_1", "b_2", "c_2", NA, "NULL")
chk_regex(x, "[a-z]_[0-9]")
chk_max_length(x, 3)
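The following is a rough base R equivalent of the two checks, for intuition only. chk_regex() is documented as taking a str_detect() pattern, which matches anywhere in the string. grepl() behaves similarly but returns FALSE rather than NA for missing input. How testdat itself scores NA is not shown here:

```r
x <- c("a_1", "b_2", "c_2", NA, "NULL")

# Unanchored regex match; grepl() gives FALSE for NA, stringr::str_detect() gives NA
pattern_ok <- grepl("[a-z]_[0-9]", x)

# nchar() returns NA for a missing string, so the comparison is NA there too
length_ok <- nchar(x) <= 3

pattern_ok
length_ok
```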

Checks: text

Description

Check character vectors for non-ASCII characters or common NULL value placeholders.

Usage

chk_ascii(x)

chk_text_miss(x, miss = getOption("testdat.miss_text"))

chk_text_nmiss(x, miss = getOption("testdat.miss_text"))

Arguments

x

A vector to check.

miss

A vector of values to be treated as missing. The testdat.miss or testdat.miss_text option is used by default.

Value

A logical vector flagging records that have passed or failed the check.

Examples


chk_ascii(c("a", "\U1f642")) # detect non-ASCII characters

imported_data <- c(1, "#n/a", 2, "", 3, NA)
chk_text_miss(imported_data)
chk_text_nmiss(imported_data) # Equivalent to !chk_text_miss(imported_data)
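Hypothetical base R sketches of both checks follow; they are not testdat's code. text_miss_sketch() assumes a case-insensitive comparison against the documented default of testdat.miss_text. ascii_sketch() treats a string as ASCII when iconv() can convert it without substitution:

```r
# Documented default of the testdat.miss_text option
miss_text <- c("error", "null", "0", ".", "-", ",", "na", "#n/a", "", NA)

# Assumes placeholders are matched case-insensitively
text_miss_sketch <- function(x, miss = miss_text) tolower(as.character(x)) %in% miss

# iconv() returns NA for strings it cannot represent in ASCII
ascii_sketch <- function(x) !is.na(iconv(x, from = "UTF-8", to = "ASCII"))

imported_data <- c(1, "#n/a", 2, "", 3, NA)
text_miss_sketch(imported_data)
ascii_sketch(c("a", "\U1f642"))
```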

Checks: uniqueness

Description

Check that each value in a vector is unique.

Usage

chk_unique(x)

Arguments

x

A vector to check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c(NA, 1:10, NA)
chk_unique(x)

x <- c(10, 1:10, 10)
chk_unique(x)
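Here is a base R sketch of the idea, where unique_sketch() is a hypothetical helper. duplicated() on its own only flags the second and later copies, so it is combined with a reverse pass to flag every copy. duplicated() treats NA as an ordinary value. This sketch does not assert whether testdat does the same for the first example above:

```r
# Hypothetical sketch (not testdat's code): flag every copy of a repeated value
unique_sketch <- function(x) !(duplicated(x) | duplicated(x, fromLast = TRUE))

# 10 appears three times, so all three positions are flagged
unique_sketch(c(10, 1:10, 10))
```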

Checks: values

Description

Check that a vector contains only certain values.

Usage

chk_equals(x, val)

chk_values(x, ..., miss = getOption("testdat.miss"))

chk_range(x, min, max, ...)

chk_blank(x)

Arguments

x

A vector to check.

val

A scalar value for the equality check.

...

Vectors of valid values.

miss

A vector of values to be treated as missing. The testdat.miss or testdat.miss_text option is used by default.

min

Minimum value for range check.

max

Maximum value for range check.

Value

A logical vector flagging records that have passed or failed the check.

Examples


x <- c(NA, 0, 1, 0.5, 0, NA, 99)
chk_blank(x) # Blank
chk_equals(x, 0) # Either blank or 0
chk_values(x, 0, 1) # Either blank, 0, or 1
chk_range(x, 0, 1) # Either blank or in [0,1]
chk_range(x, 0, 1, 99) # Either blank, in [0,1], or equal to 99
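The following is a hypothetical base R sketch of chk_values() and chk_range(), using the documented default of testdat.miss; it is not testdat's code. The lookup table mixes numbers with the character code "", so %in% compares values as character strings:

```r
# miss mirrors the documented default of the testdat.miss option
values_sketch <- function(x, ..., miss = c(NA, "")) x %in% c(..., miss)

range_sketch <- function(x, min, max, ..., miss = c(NA, "")) {
  in_range <- !is.na(x) & x >= min & x <= max
  # Extra values supplied in ... and missing values are also accepted
  in_range | values_sketch(x, ..., miss = miss)
}

x <- c(NA, 0, 1, 0.5, 0, NA, 99)
values_sketch(x, 0, 1)     # FALSE only for 0.5 and 99
range_sketch(x, 0, 1)      # FALSE only for 99
range_sketch(x, 0, 1, 99)  # all TRUE
```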

Expectations: consistency

Description

These functions test whether multiple conditions coexist.

Usage

expect_cond(cond1, cond2, data = get_testdata())

expect_base(
  var,
  base,
  miss = getOption("testdat.miss"),
  missing_valid = FALSE,
  data = get_testdata()
)

Arguments

cond1

<data-masking> First condition (antecedent) for consistency check.

cond2

<data-masking> Second condition (consequent) for consistency check.

data

A data frame to test. The global test data is used by default.

var

An unquoted column name to test.

base

<data-masking> The condition that determines which records should be non-missing.

miss

A vector of values to be treated as missing. The testdat.miss option is used by default.

missing_valid

Should missing values be treated as valid for records meeting the base condition? This allows 'one way' base checks. This is FALSE by default.

Value

expect_*() functions are mainly called for their side effects. The expectation signals its result (e.g. "success", "failure"), which is logged by the current test reporter. In a non-testing context the expectation will raise an error with class expectation_failure if it fails.

Functions

expect_cond(): Checks the coexistence of two conditions. It can be read as "if cond1 then cond2".
expect_base(): A special case that checks missing data against a specified condition. It can be read as "if base then var not missing, if not base then var missing".

Examples

my_survey <- data.frame(
  resp_id = 1:5,
  q1a = c(0, 1, 0, 1, 0),
  q1b = c(NA, NA, NA, 1, 0), # Asked if q1a %in% 1
  q2a = c(90, 80, 60, 40, 90),
  q2b = c("", "", NA, "Some reason for low rating", "") # Asked if q2a < 50
)

# Check that q1b has a value if and only if q1a %in% 1
try(expect_base(q1b, q1a %in% 1, data = my_survey)) # Fails for resp_id 2 and 5

# Check that q2b has a value if and only if q2a < 50
expect_base(q2b, q2a < 50, data = my_survey)

# Check that if q1a %in% 0 then q2a > 50 (but not vice-versa)
expect_cond(q1a %in% 0, q2a > 50, data = my_survey)
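The row-level logic of both expectations can be sketched in base R. cond_sketch() and base_sketch() are hypothetical helpers. They return per-record results instead of signalling an expectation:

```r
# "if cond1 then cond2": only records where cond1 holds need cond2
cond_sketch <- function(cond1, cond2) !cond1 | cond2

# "if base then var not missing, if not base then var missing"
base_sketch <- function(var, base, miss = c(NA, ""), missing_valid = FALSE) {
  is_miss <- var %in% miss
  ifelse(base, !is_miss | missing_valid, is_miss)
}

my_survey <- data.frame(
  resp_id = 1:5,
  q1a = c(0, 1, 0, 1, 0),
  q1b = c(NA, NA, NA, 1, 0),
  q2a = c(90, 80, 60, 40, 90)
)

with(my_survey, which(!base_sketch(q1b, q1a %in% 1)))    # records 2 and 5 fail
with(my_survey, all(cond_sketch(q1a %in% 0, q2a > 50)))  # TRUE
```

Setting missing_valid = TRUE turns base_sketch() into the 'one way' check described above. Record 2 is then accepted. Record 5, where q1b should be missing but is not, still fails.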

Expectation params

Description

Expectation params

Arguments

var

An unquoted column name to test.

vars

<tidy-select> A set of columns to test.

flt

<data-masking> A filter specifying a subset of the data frame to test.

miss

A vector of values to be treated as missing. The testdat.miss option is used by default.

data

A data frame to test. The global test data is used by default.

Value

Expectations: comparisons

Description

These functions allow for comparison between two data frames.

Usage

expect_valmatch(
  data2,
  vars,
  by,
  not = FALSE,
  flt = TRUE,
  data = get_testdata()
)

expect_subset(data2, by = NULL, not = FALSE, flt = TRUE, data = get_testdata())

Arguments

data2

The data frame to compare against.

vars

<tidy-select> A set of columns to test.

by

A character vector of columns to join by. See dplyr::join() for details.

not

Reverse the results of the check?

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Details

expect_valmatch() compares the observations appearing in one data frame (data) to the same observations, as picked out by a key (by), in another data frame (data2). It fails if the selected columns (vars) aren't the same for those observations in both data frames.
expect_subset() compares one data frame (data) to another (data2) and fails if any observation in the first, as picked out by a key (by), does not appear in the second.

Value

Examples


df1 <- data.frame(
  id = 0:99,
  binomial = sample(0:1, 100, TRUE),
  even = abs(0:99 %% 2 - 1) * 0:99
)

df2 <- data.frame(
  id = 0:99,
  binomial = sample(0:1, 100, TRUE),
  odd = 0:99 %% 2 * 0:99
)

# Check that same records 'succeeded' across data frames
try(expect_valmatch(df2, binomial, by = "id", data = df1))

# Check that all records in `df1`, as picked out by `id`, exist in `df2`
expect_subset(df2, by = "id", data = df1)
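Both comparisons reduce to joins on the key. The base R sketch below uses merge() and small fixed data frames in place of the random ones above. subset_sketch() and valmatch_sketch() are hypothetical helpers, not testdat's implementation:

```r
df1 <- data.frame(id = 1:4, flag = c(0, 1, 1, 0))
df2 <- data.frame(id = c(1L, 2L, 3L, 5L), flag = c(0, 1, 0, 0))

# expect_subset()-style: does every key in data also appear in data2?
subset_sketch <- function(data, data2, by) {
  nrow(merge(data[by], unique(data2[by]), by = by)) == nrow(data)
}

# expect_valmatch()-style: for keys present in both, do the vars agree?
valmatch_sketch <- function(data, data2, vars, by) {
  m <- merge(data[c(by, vars)], data2[c(by, vars)], by = by)  # inner join on the key
  ok <- rep(TRUE, nrow(m))
  for (v in vars) ok <- ok & (m[[paste0(v, ".x")]] == m[[paste0(v, ".y")]])
  ok
}

subset_sketch(df1, df2, "id")            # FALSE: id 4 is not in df2
valmatch_sketch(df1, df2, "flag", "id")  # TRUE TRUE FALSE for ids 1, 2, 3
```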

Expectations: dates

Description

Test whether variables in a data frame conform to a given date format such as YYYYMMDD.

Usage

expect_date_yyyy(vars, flt = TRUE, data = get_testdata())

expect_date_yyyymm(vars, flt = TRUE, data = get_testdata())

expect_date_yyyymmdd(vars, flt = TRUE, data = get_testdata())

Arguments

vars

<tidy-select> A set of columns to test.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "20220101"),
  quarter = c(202006, 202009, 202012, 20203, 20200101),
  published = c(1999, 19991, 21, 0001, 20200101)
)

try(expect_date_yyyymmdd(date, data = sales)) # Full date of sale valid
try(expect_date_yyyymm(quarter, data = sales)) # Quarters given as YYYYMM
try(expect_date_yyyy(published, data = sales)) # Publication years valid

Expectations: exclusivity

Description

expect_exclusive() tests that vars are exclusive: if any one of vars is set to exc_val, no other column in vars or var_set may also be set to exc_val.

Usage

expect_exclusive(vars, var_set, exc_val = 1, flt = TRUE, data = get_testdata())

Arguments

vars

<tidy-select> A set of columns to test.

var_set

<tidy-select> The full set of columns to check against. This should include all columns specified in the vars argument.

exc_val

The value that flags a variable as "selected" (default: 1)

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Details

This expectation is designed to check exclusivity in survey multiple response sets, where one response is only valid on its own.

See the example data set my_q_block in the Examples section below.

No record should have q10_98, "None of the above", selected while also having any other response selected, so we refer to this as an "exclusive" response.
expect_exclusive() checks whether q10_98 "None of the above" or q10_99 "Don't know", the exclusive responses, have been selected alongside any other q10_* response.
The expectation fails, since the first record has both q10_1 and q10_98 selected.

Value

Examples


my_q_block <- data.frame(
  resp_id = 1:5, # Unique to respondent
  q10_1  = c(1, 1, 0, 0, 0),
  q10_2  = c(0, 1, 0, 0, 0),
  q10_3  = c(0, 0, 1, 0, 0),
  q10_98 = c(1, 0, 0, 1, 0), # None of the above
  q10_99 = c(0, 0, 0, 0, 1)  # Item not answered
)

# Make sure that if "None of the above" or "Item not answered" is selected,
# none of the other question options are selected:
try(
  expect_exclusive(
    c(q10_98, q10_99),
    starts_with("q10_"),
    data = my_q_block
  )
)
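The exclusivity rule can be sketched per record in base R. exclusive_sketch() is a hypothetical helper. It takes column names rather than tidy-select expressions:

```r
# Hypothetical base R sketch of the per-record rule (not testdat's code)
exclusive_sketch <- function(data, vars, var_set, exc_val = 1) {
  exc_selected <- rowSums(data[vars] == exc_val) > 0  # any exclusive response chosen?
  n_selected   <- rowSums(data[var_set] == exc_val)   # responses chosen in the whole set
  !exc_selected | n_selected == 1
}

my_q_block <- data.frame(
  resp_id = 1:5,
  q10_1  = c(1, 1, 0, 0, 0),
  q10_2  = c(0, 1, 0, 0, 0),
  q10_3  = c(0, 0, 1, 0, 0),
  q10_98 = c(1, 0, 0, 1, 0),
  q10_99 = c(0, 0, 0, 0, 1)
)

q10_cols <- grep("^q10_", names(my_q_block), value = TRUE)
exclusive_sketch(my_q_block, c("q10_98", "q10_99"), q10_cols)  # only record 1 fails
```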

Defunct expectation functions

Description

These functions are defunct.

expect_where() works exactly like expect_all(). When testdat used dplyr::vars() as standard, expect_where() provided an alternative interface using tidy-select.

Usage

filter_expect(data, expect_function, ..., not = TRUE)

expect_where(
  where,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

Arguments

data

A data frame to test. The global test data is used by default.

expect_function

An expectation function.

...

Arguments to pass to expect_function.

not

Reverse the results of the check.

where

<tidy-select> Columns to check

flt

<data-masking> A filter specifying a subset of the data frame to test.

Value

The input data frame filtered to records failing the expectation.

Deprecated expectation functions

Description

These functions are deprecated.

Usage

expect_func(var, ...)

expect_join(data2, by = NULL, not = FALSE, flt = TRUE, data = get_testdata())

expect_similar(
  var,
  data2,
  var2,
  flt = TRUE,
  flt2 = flt,
  threshold = 0.05,
  min = 100,
  data = get_testdata()
)

expect_allany(
  vars,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  allany = c(chk_filter_all, chk_filter_any),
  func_desc = NULL
)

Arguments

var

An unquoted column name to test.

...

Arguments to pass to expect_allany().

data2

The data frame to compare against.

by

A character vector of columns to join by. See dplyr::join() for details.

not

Reverse the results of the check?

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

var2

An unquoted column name from data2.

flt2

A filter specifying a subset of data2 to test.

threshold

The maximum proportional difference allowed between the two categories.

min

The minimum number of responses for a category to allow comparison. This avoids small categories raising spurious errors.

vars

<tidy-select> A set of columns to test.

func

A function to use for testing that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.

args

A named list of arguments to pass to func.

allany

The function to combine the func results for each row.

func_desc

A human friendly description of func to use in the expectation failure message.

Value

Extension of 'expect' to allow inclusion of custom fields

Description

Use expect_custom to allow inclusion of arbitrary data in expectation results. Additional data is stored in a list in an attribute called custom in the resulting expectation. This allows data expectations to store information about the number of failed and successful cases for reporting of test results.

Usage

expect_custom(
  ok,
  failure_message,
  info = NULL,
  srcref = NULL,
  trace = NULL,
  ...
)

Arguments

ok

TRUE or FALSE indicating if the expectation was successful.

failure_message

Message to show if the expectation failed.

info

Character vector containing additional information. Included for backward compatibility only; new expectations should not use it.

srcref

Location of the failure. Should only need to be explicitly supplied when you need to forward a srcref captured elsewhere.

...

Additional data to be added to a list in the custom attribute of the resulting expectation.

Value

An expectation object. Signals the expectation condition with a continue_test restart.

Examples

# calling expect_custom directly with some custom data
x <- expect_custom(TRUE, "Test", extra_data = 1:5, more_data = "Hello")
str(x)

# an example expectation (note additional libraries used)
library(rlang)

expect_example <- function(var, data = get_testdata()) {
  act <- quasi_label(enquo(data))
  act$var_desc <- expr_label(get_expr(enquo(var)))
  act$var <- expr_text(get_expr(enquo(var)))

  act$result <- act$val[[act$var]] > 0
  act$result[is.na(act$result)] <- FALSE

  expect_custom(
    all(act$result, na.rm = TRUE),
    glue::glue("{act$lab} has {sum(!act$result, na.rm = TRUE)} cases with \\
                {act$var_desc} not greater than 0."),
    failed_count = sum(!act$result, na.rm = TRUE),
    total_count = sum(!is.na(act$result))
  )

  invisible(act$result)
}

try(expect_example(x, data = data.frame(x = c(NA, -2:2))))

Expectations: functional dependency

Description

Test whether one set of variables functionally depends on another set of variables.

Usage

expect_depends(vars, on, flt = TRUE, data = get_testdata())

Arguments

vars

<tidy-select> A set of columns to test.

on

<tidy-select> A set of columns which vars are expected to depend on.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Details

One set of variables, X, functionally depends on another, Y, if and only if each value in Y corresponds to exactly one value in X. For instance, course_duration and course_topic functionally depend on course_code if each course_code corresponds to just one combination of course_duration and course_topic. That is, if two records have the same course_code then they must have the same course_duration and course_topic.

See the Wikipedia page on functional dependency for more information.

Value

Examples


student_course <- data.frame(
  student_id = 1:5,
  course_code = c(1, 2, 1, 3, 4),
  course_duration = c(12, 12, 12, 12, 12),
  course_topic = c("Song", "Dance", "Song", "Painting", "Pottery")
)

# Check that each `course_code` corresponds to exactly one combination of
# `course_duration` and `course_topic`
expect_depends(
  c(course_duration, course_topic),
  on = course_code,
  data = student_course
)
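Here is a base R sketch of the definition above. It groups records by the on columns and counts distinct combinations of vars within each group. depends_sketch() is hypothetical, not testdat's implementation. It pastes columns together with "\r", on the assumption that the values never contain that character:

```r
# Hypothetical sketch: vars depend on `on` when each key maps to one combination
depends_sketch <- function(data, vars, on) {
  key <- do.call(paste, c(data[on], sep = "\r"))
  val <- do.call(paste, c(data[vars], sep = "\r"))
  n_combos <- tapply(val, key, function(v) length(unique(v)))
  all(n_combos == 1)
}

student_course <- data.frame(
  student_id = 1:5,
  course_code = c(1, 2, 1, 3, 4),
  course_duration = c(12, 12, 12, 12, 12),
  course_topic = c("Song", "Dance", "Song", "Painting", "Pottery")
)

depends_sketch(student_course, c("course_duration", "course_topic"), "course_code")  # TRUE

# Giving course 1 a second topic breaks the dependency
broken <- student_course
broken$course_topic[3] <- "Dance"
depends_sketch(broken, c("course_duration", "course_topic"), "course_code")          # FALSE
```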

Create an expectation from a check function

Description

expect_make() creates an expectation from a vectorised checking function to allow simple generation of domain specific data checks.

Usage

expect_make(
  func,
  func_desc = NULL,
  vars = FALSE,
  all = TRUE,
  env = caller_env()
)

Arguments

func

A function whose first argument takes a vector to check, and returns a logical vector of the same length with the results.

func_desc

A character function description to use in the expectation failure message.

vars

Included for backwards compatibility only.

all

Function to use to combine results for each vector.

env

The parent environment of the function, defaults to the calling environment of expect_make().

Value

An expect_*() style function.

Examples

# Create a custom check
chk_binary <- function(x) {
  suppressWarnings(as.integer(x) %in% 0:1)
}

# Create custom expectation function
expect_binary <- expect_make(chk_binary)

# Validate a data frame
try(expect_binary(vs, data = mtcars))
try(expect_binary(cyl, data = mtcars))
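The underlying idea is a function factory. It wraps a vectorised check in a new function that looks up an unquoted column and fails when any record fails. In the minimal base R sketch below, expect_make_sketch() is hypothetical. It calls stop() instead of signalling a testthat expectation, and it does not use rlang quasi-quotation:

```r
# Hypothetical function factory (not testdat's implementation)
expect_make_sketch <- function(func, func_desc = deparse(substitute(func))) {
  force(func_desc)  # evaluate the description once, when the expectation is created
  function(var, data) {
    col <- data[[deparse(substitute(var))]]  # unquoted column name -> values
    ok <- func(col)
    n_fail <- sum(!ok, na.rm = TRUE)
    if (n_fail > 0) {
      stop(sprintf("%d record(s) failed %s.", n_fail, func_desc), call. = FALSE)
    }
    invisible(ok)
  }
}

chk_binary <- function(x) suppressWarnings(as.integer(x) %in% 0:1)
expect_binary_sketch <- expect_make_sketch(chk_binary)

expect_binary_sketch(vs, mtcars)        # passes silently
try(expect_binary_sketch(cyl, mtcars))  # signals an error: no cyl value is 0 or 1
```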

Expectations: generic helpers

Description

These functions allow for testing of multiple columns (vars) of a data frame (data), with an optional filter (flt), using an arbitrary function (func).

Usage

expect_all(
  vars,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_any(
  vars,
  func,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

Arguments

vars

<tidy-select> A set of columns to test.

func

A function to use for testing that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

args

A named list of arguments to pass to func.

func_desc

A human friendly description of func to use in the expectation failure message.

Details

expect_allany() tests the columns in vars to see whether func returns TRUE for each of them, and combines the results for each row using the function in allany. Both expect_all() and expect_any() are wrappers around expect_allany().
expect_all() tests the vars to see whether func returns TRUE for all of them (i.e. whether the conjunction of results of applying func to each of the vars is TRUE).
expect_any() tests the vars to see whether func returns TRUE for any of them (i.e. whether the disjunction of the results of applying func to each of the vars is TRUE).

Value

Examples

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches *AND* < 100 horsepower
try(
expect_all(
  vars = c(disp, hp),
  func = chk_range,
  flt = (cyl == 4),
  args = list(min = 0, max = 100),
  data = mtcars
)
)

# Check that every 4-cylinder car has an engine displacement of < 100 cubic
# inches *OR* < 100 horsepower
try(
expect_any(
  vars = c(disp, hp),
  func = chk_range,
  flt = (cyl == 4),
  args = list(min = 0, max = 100),
  data = mtcars
)
)

# Check that all variables are numeric:
try(expect_all(
  vars = everything(),
  func = is.numeric,
  data = iris
))

Get/set test data

Description

A global test data set is used to avoid having to re-specify the testing data frame in every test. These functions get and set the global data or set the data for the current context.

Usage

set_testdata(data, quosure = TRUE)

get_testdata()

with_testdata(data, code, quosure = TRUE)

data %E>% code

Arguments

data

Data frame to be used.

quosure

If TRUE, the default, the data frame is stored as a quosure and lazily evaluated when get_testdata() is called, so get_testdata() will return the current state of the data frame.

If FALSE, the data frame will be copied and get_testdata() will return the state of the data frame at the time set_testdata() was called.

code

Code to execute with the test data set to data.

Value

set_testdata() invisibly returns the previous test data. The test data is returned as it was stored - if it was stored with quosure = TRUE it will be returned as a quosure.
get_testdata() returns the current test data frame.
with_testdata() and the test data pipe %E>% invisibly return the input data for easy piping.

Examples

set_testdata(mtcars)
head(get_testdata())

with_testdata(iris, {
  x <- get_testdata()
  print(head(x))
})

mtcars %E>%
  expect_base(mpg, TRUE) %E>%
  expect_range(carb, 1, 8)
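The difference between quosure = TRUE and quosure = FALSE can be illustrated with a small base R sketch. set_testdata_sketch() and get_testdata_sketch() are hypothetical stand-ins. They store either an expression plus its environment (roughly what a quosure holds) or a snapshot of the value:

```r
# Private store for the (hypothetical) global test data
.td <- new.env(parent = emptyenv())

set_testdata_sketch <- function(data, quosure = TRUE) {
  prev <- .td$stored
  .td$stored <- if (quosure) {
    # Keep the unevaluated expression and its environment; evaluate on demand
    list(expr = substitute(data), env = parent.frame())
  } else {
    # Evaluate now and keep a snapshot of the value
    list(value = data)
  }
  invisible(prev)
}

get_testdata_sketch <- function() {
  s <- .td$stored
  if (!is.null(s$expr)) eval(s$expr, s$env) else s$value
}

df <- data.frame(x = 1:3)
set_testdata_sketch(df)                   # lazy: later edits to df are seen
df$x[1] <- 100L
lazy_val <- get_testdata_sketch()$x[1]    # 100

set_testdata_sketch(df, quosure = FALSE)  # snapshot of df as it is now
df$x[1] <- -1L
copy_val <- get_testdata_sketch()$x[1]    # still 100
```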

Expectations: labels

Description

Test whether variables in a data frame are labelled in a given way.

Usage

expect_labels(
  vars,
  val_labels = NULL,
  var_label = NULL,
  flt = TRUE,
  data = get_testdata()
)

Arguments

vars

<tidy-select> A set of columns to test.

val_labels

What value label check should be performed? One of:

A character vector of expected value labels.
A named vector of expected label-value pairs.
TRUE to test for the presence of value labels in general.
FALSE to test for the absence of value labels.
NULL to ignore value labels when checking.

var_label

What variable label check should be performed? One of:

A character vector of expected variable labels.
TRUE to test for the presence of a variable label.
FALSE to test for the absence of a variable label.
NULL to ignore the variable label when checking.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Value

Examples


df <- data.frame(
  x = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F"), "Sex"),
  y = labelled::labelled(c("M", "M", "F"), c(Male = "M", Female = "F", Other = "X")),
  z = c("M", "M", "F")
)

# Check for a value-label pairing
try(expect_labels(x, c(Male = "M"), data = df))

# Check that two variables have the same values
expect_labels(x, labelled::val_labels(df$y), data = df) # N.B. This passes!

# Check for the presence of a particular label
try(expect_labels(x, "Male", data = df))
expect_labels(x, var_label = "Sex", data = df)

# Check that a variable is labelled at all
try(expect_labels(z, val_labels = TRUE, data = df))
try(expect_labels(z, var_label = TRUE, data = df))

# Check that a variable isn't labelled
expect_labels(z, val_labels = FALSE, data = df)
expect_labels(z, var_label = FALSE, data = df)

Output `ListReporter` results in Excel format

Description

Output formatted ListReporter results to an Excel workbook using openxlsx. The workbook consists of a summary sheet showing aggregated results for each context, and one sheet per context showing details of each unsuccessful test.

Usage

output_results_excel(results, file)

Arguments

results

An object of class testthat_results, e.g. output from test_dir() or test_file().

file

Output file name

Value

The return value of openxlsx::saveWorkbook().

Examples

## Not run: 
# Output the results from running all tests in a directory
x <- test_dir(".")
output_results_excel(x, "Test results.xlsx")

## End(Not run)

Expectations: patterns

Description

Test whether variables in a data frame conform to a given pattern.

Usage

expect_regex(vars, pattern, flt = TRUE, data = get_testdata())

expect_max_length(vars, len, flt = TRUE, data = get_testdata())

Arguments

vars

<tidy-select> A set of columns to test.

pattern

A str_detect() pattern to match.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

len

Maximum string length.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  item_code = c("a_1", "b_2", "c_2", NA, "NULL")
)

try(expect_regex(item_code, "[a-z]_[0-9]", data = sales)) # Codes match regex
try(expect_max_length(item_code, 3, data = sales)) # Code width <= 3

Expectations: proportions

Description

These functions test the proportion of data in a data frame satisfying some condition. The generic functions, expect_prop_lte() and expect_prop_gte(), can be used with any arbitrary function; the chk_*() functions, such as chk_values(), are useful in this regard.

Usage

expect_prop_lte(
  var,
  func,
  prop,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_prop_gte(
  var,
  func,
  prop,
  flt = TRUE,
  data = get_testdata(),
  args = list(),
  func_desc = NULL
)

expect_prop_nmiss(
  var,
  prop,
  miss = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_prop_values(var, prop, ..., flt = TRUE, data = get_testdata())

Arguments

var

An unquoted column name to test.

func

A function to use for testing that takes a vector as the first argument and returns a logical vector of the same length showing whether an element passed or failed.

prop

The proportion of the data frame expected to satisfy the condition.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

args

A named list of arguments to pass to func.

func_desc

A human friendly description of func to use in the expectation failure message.

miss

A vector of values to be treated as missing. The testdat.miss option is used by default.

...

Vectors of valid values.

Details

Given the use of quasi-quotation within these functions, to make a new function using one of the generics, such as expect_prop_gte(), one must defuse the var argument using the embracing operator {{ }}. See the Examples section for an example.

Value

Examples

sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "2020003"),
  sale_price = c(10, 20, 30, 40, -1),
  book_title = c(
    "Phenomenology of Spirit",
    NA,
    "Critique of Practical Reason",
    "Spirit of Trust",
    "Empiricism and the Philosophy of Mind"
  ),
  stringsAsFactors = FALSE
)

# Create a custom expectation
expect_prop_length <- function(var, len, prop, data) {
  expect_prop_gte(
    var = {{var}}, # Notice the use of the embracing operator
    func = chk_max_length,
    prop = prop,
    data = data,
    args = list(len = len),
    func_desc = "length_check"
  )
}

# Use it to check that dates are mostly <= 8 characters wide
expect_prop_length(date, 8, 0.9, sales)

# Check price values mostly between 0 and 100
try(expect_prop_values(sale_price, 0.9, 1:100, data = sales))

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

tidyselect: all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, one_of, starts_with

Deprecated reporter functions

Description

These functions are deprecated.

See Get/set test data for working with global test data.

Usage

context_data(data)

Arguments

data

Data frame to be used.

Expectations: text

Description

Test whether variables in a data frame contain common NULL placeholders.

Usage

expect_text_miss(
  vars,
  miss = getOption("testdat.miss_text"),
  flt = TRUE,
  data = get_testdata()
)

expect_text_nmiss(
  vars,
  miss = getOption("testdat.miss_text"),
  flt = TRUE,
  data = get_testdata()
)

Arguments

vars

<tidy-select> A set of columns to test.

miss

A vector of values to be treated as missing. The testdat.miss_text option is used by default.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "null", "20200102", "20200103", "null"),
  sale_price = c(10, -1, 30, 40, -1)
)

# Dates not missing
try(expect_text_nmiss(date, data = sales))

# Date missing if price negative
try(expect_text_miss(date, flt = sale_price %in% -1, data = sales))
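The placeholder test in these examples can be approximated in base R. This is a sketch, not testdat's code: is_text_miss() is a hypothetical helper, and exact, case-sensitive matching against the default testdat.miss_text values is an assumption.

```r
# Default placeholder list, copied from the testdat.miss_text option
miss_text <- c("error", "null", "0", ".", "-", ",", "na", "#n/a", "", NA)

# Hypothetical helper: TRUE where a value is a placeholder.
# Exact matching is assumed; the package may normalise values first.
is_text_miss <- function(x, miss = miss_text) {
  x %in% miss
}

date <- c("20200101", "null", "20200102", "20200103", "null")
sale_price <- c(10, -1, 30, 40, -1)

# expect_text_nmiss(date, ...) fails because placeholders are present
any(is_text_miss(date))                       # TRUE
# expect_text_miss(date, flt = sale_price %in% -1, ...) only checks filtered rows
all(is_text_miss(date[sale_price %in% -1]))   # TRUE
```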

Expectations: uniqueness

Description

These functions test variables for uniqueness.

Usage

expect_unique(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_unique_across(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_unique_combine(
  vars,
  exclude = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

Arguments

vars

<tidy-select> A set of columns to test.

exclude

A vector of values to exclude from the uniqueness check. The testdat.miss option is used by default. To include all values, set exclude = NULL.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

Details

expect_unique() tests a set of columns (vars) and fails if the combined columns do not uniquely identify each row.
expect_unique_across() tests a set of columns (vars) and fails if any row has the same value in more than one of the columns.
expect_unique_combine() tests a set of columns (vars) and fails if any value appears more than once across all of them.

By default the uniqueness check excludes missing values (as specified by the testdat.miss option). Setting exclude = NULL will include all values.
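The three rules differ only in which values are compared with each other. The base-R sketch below is illustrative, not testdat's code; the helper names are hypothetical, and skipping a row in the expect_unique() key check when any key column holds an excluded value is an assumption.

```r
# Hypothetical helpers (not part of testdat) mirroring the three rules.
# `exclude` values are ignored, like the default testdat.miss exclusion.

# expect_unique(): the combined columns identify each row
is_unique_rows <- function(df, exclude = NA) {
  skip <- apply(df, 1, function(r) any(r %in% exclude))
  anyDuplicated(df[!skip, , drop = FALSE]) == 0
}

# expect_unique_across(): within each row, no value repeats across columns
is_unique_across <- function(df, exclude = NA) {
  all(apply(df, 1, function(r) anyDuplicated(r[!(r %in% exclude)]) == 0))
}

# expect_unique_combine(): no value repeats anywhere in the pooled columns
is_unique_combine <- function(df, exclude = NA) {
  v <- unlist(df, use.names = FALSE)
  anyDuplicated(v[!(v %in% exclude)]) == 0
}

prefs <- data.frame(
  student_id = c(1:5, NA, NA),
  apple = c(1, 1, 1, 1, 99, NA, NA),
  orange = c(2, 3, 2, 3, 99, NA, NA),
  banana = c(3, 2, 3, 2, 99, NA, NA),
  phone1 = c(123, 456, 789, 987, 654, NA, NA),
  phone2 = c(345, 678, 987, 567, 000, NA, NA)
)
fruit <- prefs[c("apple", "orange", "banana")]

is_unique_rows(prefs["student_id"])                   # TRUE: NAs excluded
is_unique_rows(prefs["student_id"], exclude = NULL)   # FALSE: two NA keys
is_unique_across(fruit)                               # FALSE: row 5 repeats 99
is_unique_across(fruit, exclude = c(99, NA))          # TRUE
is_unique_combine(prefs[c("phone1", "phone2")])       # FALSE: 987 appears twice
```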

Value

Examples


student_fruit_preferences <- data.frame(
  student_id = c(1:5, NA, NA),
  apple = c(1, 1, 1, 1, 99, NA, NA),
  orange = c(2, 3, 2, 3, 99, NA, NA),
  banana = c(3, 2, 3, 2, 99, NA, NA),
  phone1 = c(123, 456, 789, 987, 654, NA, NA),
  phone2 = c(345, 678, 987, 567, 000, NA, NA)
)

# Check that key is unique, excluding NAs by default
expect_unique(student_id, data = student_fruit_preferences)

# Check that key is unique, including NAs
try(expect_unique(student_id, exclude = NULL, data = student_fruit_preferences))

# Check each fruit has unique preference number
try(expect_unique_across(
  c(apple, orange, banana),
  data = student_fruit_preferences
))

# Check each fruit has unique preference number, allowing multiple 99 (item
# skipped) codes
expect_unique_across(
  c(apple, orange, banana),
  exclude = c(99, NA), data = student_fruit_preferences
)

# Check that each phone number appears at most once
try(expect_unique_combine(c(phone1, phone2), data = student_fruit_preferences))

Expectations: values

Description

Test whether variables in a data frame contain only certain values.

Usage

expect_values(
  vars,
  ...,
  miss = getOption("testdat.miss"),
  flt = TRUE,
  data = get_testdata()
)

expect_range(vars, min, max, ..., flt = TRUE, data = get_testdata())

Arguments

vars

<tidy-select> A set of columns to test.

...

Vectors of valid values.

miss

A vector of values to be treated as missing. The testdat.miss option is used by default.

flt

<data-masking> A filter specifying a subset of the data frame to test.

data

A data frame to test. The global test data is used by default.

min

Minimum value for range check.

max

Maximum value for range check.

Value

Examples


sales <- data.frame(
  sale_id = 1:5,
  date = c("20200101", "20200101", "20200102", "20200103", "20220101"),
  sale_price = c(10, 20, 30, 40, -1)
)

try(expect_values(date, 20000000:20210000, data = sales)) # Dates between 2000 and 2021
try(expect_range(sale_price, min = 0, max = Inf, data = sales)) # Prices non-negative
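For intuition, the two rules can be written as base-R predicates. This is a sketch under stated assumptions, not testdat's implementation: in_values() and in_range() are hypothetical helpers, values in miss are assumed always valid for expect_values(), extra values in ... are assumed always valid for expect_range(), and NA is assumed to fail the range check.

```r
# Hypothetical helpers (not part of testdat), for illustration only.

# expect_values(): a value passes if it is listed in ... or in miss.
# Note: c() coerces numbers to character here because miss contains "".
in_values <- function(x, ..., miss = c(NA, "")) {
  x %in% c(..., miss)
}

# expect_range(): a value passes if min <= x <= max, or if it is listed in ...
in_range <- function(x, min, max, ...) {
  (!is.na(x) & x >= min & x <= max) | x %in% c(...)
}

date <- c("20200101", "20200101", "20200102", "20200103", "20220101")
sale_price <- c(10, 20, 30, 40, -1)

in_values(date, 20000000:20210000)   # only "20220101" fails
in_range(sale_price, 0, Inf)         # only -1 fails
in_range(c(5, -9), 0, 10, -9)        # both pass: -9 is an extra valid value
```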
