Title: | Unified Framework for Data Quality Control |
Version: | 0.1.0 |
Maintainer: | Luis Garcez <luisgarcez1@gmail.com> |
Description: | An easy framework for setting up a quality control workflow on a dataset. Includes a varied range of functions for establishing an adaptable data quality control. |
Imports: | dplyr, stringr, janitor, openxlsx, readxl |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
URL: | https://github.com/luisgarcez11/qualitycontrol |
BugReports: | https://github.com/luisgarcez11/qualitycontrol/issues |
Suggests: | knitr, rmarkdown, testthat |
Depends: | R (≥ 2.10) |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2022-11-25 13:16:49 UTC; jjferreira-admin |
Author: | Luis Garcez |
Repository: | CRAN |
Date/Publication: | 2022-11-28 09:30:02 UTC |
Amyotrophic lateral sclerosis Example dataset
Description
An example dataset related to amyotrophic lateral sclerosis (ALS).
Usage
als_data
Format
A list
subjid | Subject ID |
p1 | ALSFRS-R 1 |
p2 | ALSFRS-R 2 |
p3 | ALSFRS-R 3 |
p4 | ALSFRS-R 4 |
p5 | ALSFRS-R 5 |
p6 | ALSFRS-R 6 |
p7 | ALSFRS-R 7 |
p8 | ALSFRS-R 8 |
p9 | ALSFRS-R 9 |
x1r | ALSFRS-R R1 |
x2r | ALSFRS-R R2 |
x3r | ALSFRS-R R3 |
age_at_baseline | Age at baseline |
age_at_onset | Age at onset |
onset | Region of onset |
baseline_date | Baseline date |
death_date | Death date |
An example dataset containing a Quality Control mapping
Description
An example dataset containing a Quality Control mapping
Usage
als_data_qc_mapping
Format
A list of 3 tibbles.
missing | Table with all the 'missing' tests. |
inconsistencies | Table with all the 'inconsistencies' tests. |
range | Table with all the 'out of range' tests. |
QC dataset using a specific variable mapping
Description
QC dataset using a specific variable mapping
Usage
qc_data(data, qc_mapping, output_file = NULL)
Arguments
data |
A data frame, data frame extension (e.g. a |
qc_mapping |
A list of data frame or data frame extension (e.g. a |
output_file |
(optional) File path ended in |
Value
A data frame containing all the findings.
Examples
qc_data(als_data, als_data_qc_mapping)
Read Quality Control mapping file
Description
read_qc_mapping reads an .xlsx file that contains the QC mapping.
Usage
read_qc_mapping(path)
Arguments
path |
Path of the Excel file to be read. The file should contain 3 tabs named missing, inconsistencies and range. Each tab corresponds to one QC mapping table. QC mapping
The columns specified above should contain specific values:
|
Value
A list containing all the QC mapping tables
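Since the mapping is expected to carry three tables named missing, inconsistencies and range, it can be useful to verify a mapping list before running the QC. The following is a minimal base R sketch, not part of the package: `has_expected_tables` and `toy_mapping` are hypothetical names, and the empty data frames are placeholders that do not reproduce the real mapping columns.

```r
# Table names taken from the documentation above.
expected_tables <- c("missing", "inconsistencies", "range")

# Hypothetical helper (not part of qualitycontrol): TRUE when the mapping
# is a list that contains all three expected tables.
has_expected_tables <- function(mapping) {
  is.list(mapping) && all(expected_tables %in% names(mapping))
}

# Toy stand-in for the list returned by read_qc_mapping(); the empty data
# frames are placeholders, not the real mapping columns.
toy_mapping <- list(
  missing = data.frame(),
  inconsistencies = data.frame(),
  range = data.frame()
)

complete_ok <- has_expected_tables(toy_mapping)
incomplete_ok <- has_expected_tables(list(missing = data.frame()))
```

Under these assumptions the same check could be applied to the result of read_qc_mapping(path) before it is passed to qc_data.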
Test if variable values are duplicated
Description
Test if variable values are duplicated
Usage
test_duplicated(data, variable)
Arguments
data |
data to be tested. |
variable |
The variable to be tested. |
Value
A data frame containing all the findings regarding the applied test.
Examples
test_duplicated(als_data, 'subjid')
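To illustrate the idea behind a duplicate check, here is a minimal base R sketch on toy data. It is not the package's implementation; the `toy` data frame is invented for illustration, and the choice to flag every occurrence of a repeated value (rather than only the later ones) is an assumption.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  subjid = c("A01", "A02", "A02", "A03"),
  stringsAsFactors = FALSE
)

# Flag every row whose value occurs more than once, including the first
# occurrence (assumption: all copies are of interest, not just the repeats).
is_dup <- duplicated(toy$subjid) | duplicated(toy$subjid, fromLast = TRUE)

# Keep the flagged rows as a data frame of findings.
dup_findings <- toy[is_dup, , drop = FALSE]
```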
Test the inconsistencies between variables on a dataset
Description
Test the inconsistencies between variables on a dataset
Usage
test_inconsistencies(data, variable1, variable2, relation)
Arguments
data |
data to be tested. |
variable1 |
The variable to be tested. |
variable2 |
The variable to be tested. |
relation |
String such as 'greater_than', 'greater_than_or_equal', 'lower_than_or_equal' or 'lower_than'. |
Value
A data frame containing all the findings regarding the applied test.
Examples
test_inconsistencies(als_data, 'baseline_date', 'death_date', relation = 'lower_than')
test_inconsistencies(als_data, 'age_at_baseline', 'age_at_onset', relation = 'greater_than')
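The examples above suggest that relation names the relation expected to hold between variable1 and variable2, so that rows violating it are reported. That reading is an assumption. Under it, the check can be sketched in base R as follows; the `toy` data, the `relations` lookup and the handling of missing values are all illustrative choices, not the package's implementation.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  subjid = c("A01", "A02", "A03"),
  baseline_date = as.Date(c("2015-03-01", "2016-07-15", "2018-01-10")),
  death_date = as.Date(c("2017-05-20", "2016-02-01", NA)),
  stringsAsFactors = FALSE
)

# Map each relation name to the matching comparison operator.
relations <- list(
  greater_than = `>`,
  greater_than_or_equal = `>=`,
  lower_than_or_equal = `<=`,
  lower_than = `<`
)

# Evaluate the expected relation row by row (NA when either value is missing).
holds <- relations[["lower_than"]](toy$baseline_date, toy$death_date)

# Assumption: only rows where the relation is known to fail are flagged;
# comparisons involving a missing value are left unflagged.
inconsistent <- toy[!is.na(holds) & !holds, , drop = FALSE]
```

Here only subject A02 is flagged, because its baseline date falls after its death date.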
Test the variable missingness on a dataset
Description
Test the variable missingness on a dataset
Usage
test_missing(data, variable)
Arguments
data |
data to be tested. |
variable |
The variable to be tested. |
Value
A data frame containing all the findings regarding the applied test.
Examples
test_missing(als_data, 'p8')
test_missing(als_data, 'p1')
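As a rough illustration of a missingness check, the base R sketch below flags rows where a variable is NA and counts missing values per column. It is not the package's implementation; the `toy` data is invented, and treating only NA as missing (not, for example, empty strings) is an assumption.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  subjid = c("A01", "A02", "A03"),
  p1 = c(3, 4, 2),
  p8 = c(4, NA, 2),
  stringsAsFactors = FALSE
)

# Rows where p8 is missing (assumption: "missing" means NA).
missing_findings <- toy[is.na(toy$p8), "subjid", drop = FALSE]

# Number of missing values per variable.
missing_counts <- colSums(is.na(toy))
```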
Test the range of a variable on a dataset
Description
Test the range of a variable on a dataset
Usage
test_range(
data,
variable,
type,
categories = NULL,
lower_value = NULL,
upper_value = NULL
)
Arguments
data |
data to be tested. |
variable |
The variable to be tested. |
type |
String such as 'categorical', 'date' or 'numeric' |
categories |
Only to be filled if |
lower_value |
Only to be filled if |
upper_value |
Only to be filled if |
Value
A data frame containing all the findings regarding the applied test.
Examples
test_range(als_data, 'onset', categories = c('bulbar', 'respiratory', 'spinal'),
           type = 'categorical')
test_range(als_data, 'age_at_baseline', lower_value = 20, upper_value = 100,
           type = 'numeric')
test_range(als_data, 'age_at_onset', lower_value = 20, upper_value = 100,
           type = 'numeric')
test_range(als_data, 'baseline_date', lower_value = '2000-01-01', upper_value = '2022-01-01',
           type = 'date')
test_range(als_data, 'death_date', lower_value = '2000-01-01', upper_value = '2022-01-01',
           type = 'date')
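The three kinds of range check can be sketched in base R as below. This is an illustration on invented data, not the package's implementation: `out_of_range` is a hypothetical helper, and treating the bounds as inclusive and leaving missing values unflagged are assumptions.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  subjid = c("A01", "A02", "A03"),
  age_at_baseline = c(54, 17, 101),
  onset = c("bulbar", "spinal", "limb"),
  baseline_date = as.Date(c("1999-12-31", "2010-06-01", "2015-09-30")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where x lies outside [lower, upper].
# Assumptions: bounds are inclusive and NA values are not flagged.
out_of_range <- function(x, lower, upper) {
  !is.na(x) & (x < lower | x > upper)
}

# Numeric check: ages outside 20 to 100.
numeric_findings <- toy[out_of_range(toy$age_at_baseline, 20, 100), ]

# Date check: the same helper works once the bounds are converted to Date.
date_findings <- toy[out_of_range(toy$baseline_date,
                                  as.Date("2000-01-01"),
                                  as.Date("2022-01-01")), ]

# Categorical check: values not in the allowed set.
allowed <- c("bulbar", "respiratory", "spinal")
categorical_findings <- toy[!is.na(toy$onset) & !(toy$onset %in% allowed), ]
```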