100% AI-free: we did not use any AI technologies in developing this package.
The goal of mrap is to provide wrapper functions that reduce the user’s
effort in writing machine-readable data with the dtreg package. The
set of all-in-one wrappers will cover functions from stats
and other well-known packages; Example III shows an all-in-one wrapper for ANOVA. The
package also contains wrappers for analytical schemata used by TIB Knowledge Loom. This
vignette explains in detail how to apply such a wrapper to write the
results of your data analysis as JSON-LD in five steps:
1. Select a wrapper for the schema you will use.
2. Check the types of arguments the wrapper requires.
3. Create an instance of the schema-related class.
4. Modify the instance by setting or correcting its fields manually.
5. Write the finalised instance as a machine-readable JSON-LD file.
To select a wrapper for an analytical schema, please check the help page. For
instance, for a t-test you will need a group_comparison
wrapper.
The wrappers are easy to use once the required arguments are specified correctly, and specifying them correctly is crucial for transparent reporting of results. This section explains how to do it.
The code_string argument should be a string (in R, a
character vector). The argument cannot be omitted; please indicate “N/A”
if this information is not available. In Example
I, we use the following code string:
'stats::t.test(setosa, virginica, var.equal = FALSE)'
Specifying the name of the package in the code is always good
practice. In mrap, we made it a requirement, and you will get an error
message if the code_string does not contain
package::function. In most cases, it is at the beginning of
the string, but we also allow for the generic method summary, in which case the string is
summary(package::function(formula)). For base R, please
indicate base::.
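To see what the package::function requirement means in practice, the following is a minimal sketch that tests a code string with a regular expression. The helper name has_package_prefix and the pattern are assumptions made for illustration; they are not the check implemented inside mrap.

```r
# Hypothetical helper: TRUE if the string contains a package::function( call.
# Illustration only; this is not the validation code used by mrap.
has_package_prefix <- function(code_string) {
  pattern <- "[A-Za-z][A-Za-z0-9.]*::[A-Za-z.][A-Za-z0-9._]*\\("
  grepl(pattern, code_string)
}

# package::function at the beginning of the string
has_package_prefix("stats::t.test(setosa, virginica, var.equal = FALSE)")
# package::function nested inside the generic summary()
has_package_prefix("summary(stats::aov(Petal.Length ~ Species, data = iris))")
# no package name given
has_package_prefix("t.test(setosa, virginica)")
```

The sketch does not treat “N/A” specially, so a real check would have to allow that value separately.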
Your data can be a string (URL), a named list, or a data frame (see
Input data below). If it is a string, you can add
the data name manually (see Modify the instance);
if it is a named list, as in Example I,
mrap extracts the names of its elements. In these cases, the
code_string does not play a role, and the data name is not
specified in it. However, if your data is a single data frame and you
want mrap to extract its name from the code_string, please
indicate it as 'data = dataset_name' (e.g.,
'data = iris'), although most R packages also accept just
dataset_name.
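As an illustration of why the explicit 'data = dataset_name' form is easier to process, here is a minimal sketch that pulls the data name out of a code string with base R regular expressions. The helper extract_data_name is hypothetical and is not how mrap itself parses the string.

```r
# Hypothetical helper: return the name that follows "data =" in a code string,
# or NA if the string has no such argument. Not the parser used by mrap.
extract_data_name <- function(code_string) {
  pattern <- "data[[:space:]]*=[[:space:]]*([A-Za-z.][A-Za-z0-9._]*)"
  match <- regmatches(code_string, regexec(pattern, code_string))[[1]]
  # element 1 is the full match, element 2 is the captured name
  match[2]
}

extract_data_name("stats::aov(Petal.Length ~ Species, data = iris)")
# the positional form carries no "data =" marker to search for
extract_data_name("stats::aov(Petal.Length ~ Species, iris)")
```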
Our wrappers extract the name of a target variable from the
code_string if the variable is before the ~
sign in the formula:
"package::function(Petal.Length ~ Species, data = iris)"
"package::function(iris$Petal.Length ~ iris$Species, data = iris)"
We also allow for several target variables in special cases such as MANOVA:
Alternatively, a target variable can be explicitly specified in two or more vectors:
In the following case we cannot extract the name, and you can add the target label manually to the instance:
You will get a warning reminding you to do so.
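The idea of reading the target from the left-hand side of the formula can be sketched with base R by parsing the code string into a call. The helper extract_target is an assumption made for illustration; mrap may extract the name in a different way.

```r
# Hypothetical helper: variable names on the left of "~" in a code string.
# Illustration only; not the extraction logic used by mrap.
extract_target <- function(code_string) {
  call_obj <- str2lang(code_string)   # parse the string into a call object
  args <- as.list(call_obj)[-1]       # keep the arguments, drop the function
  is_formula <- vapply(
    args,
    function(a) is.call(a) && identical(a[[1]], as.name("~")),
    logical(1)
  )
  if (!any(is_formula)) return(character(0))
  formula_call <- args[is_formula][[1]]
  # a one-sided formula has no left-hand side
  if (length(formula_call) < 3) return(character(0))
  all.vars(formula_call[[2]])
}

# one target before the ~ sign
extract_target("stats::aov(Petal.Length ~ Species, data = iris)")
# several targets, as in MANOVA
extract_target("stats::manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)")
# no formula: nothing can be extracted, so the label must be set manually
extract_target("stats::t.test(setosa, virginica, var.equal = FALSE)")
```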
In code_string, a level variable is recognized by our
wrappers in the “x | level” or “x || level” syntax:
"lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)"
"lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)"
A level can be written more than once in a formula; in this case mrap also recognizes it:
More than one level is possible; mrap will capture all level names:
If we cannot extract the name, you will get a warning reminding you to add the level label manually to the instance.
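The “x | level” and “x || level” patterns can be illustrated with a short base R sketch that collects every name following a single or double bar. The helper extract_levels and the second formula (with the made-up variables score, school and class) are assumptions for illustration only; this is not the implementation used by mrap, and lme4 is not needed to run it.

```r
# Hypothetical helper: names that follow "|" or "||" in a code string.
# Illustration only; not the extraction logic used by mrap.
extract_levels <- function(code_string) {
  pattern <- "[|]{1,2}[[:space:]]*[A-Za-z.][A-Za-z0-9._]*"
  hits <- regmatches(code_string, gregexpr(pattern, code_string))[[1]]
  # strip the bar(s) and spaces, then drop repeated level names
  unique(sub("^[|]+[[:space:]]*", "", hits))
}

extract_levels("lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)")
extract_levels("lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)")
# made-up formula with two different levels
extract_levels("lme4::lmer(score ~ 1 + (1 | school) + (1 | class), data = d)")
```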
The input_data argument can be a data frame, a named list, or a string (URL). For a named list:
species_list <- list("setosa" = setosa, "virginica" = virginica)
# check it is a list
is.list(species_list)
# check that the list is named
names(species_list)
Please be sure that the argument is one of these three types. You will get an error message if the type is wrong (for instance, a list instead of a named list).
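The three accepted input types can be told apart with a few base R checks. The following sketch uses a hypothetical helper, classify_input, and its error messages are invented for illustration; they are not the messages produced by mrap.

```r
# Hypothetical helper: report which of the three accepted types x has.
# Illustration only; not the validation code used by mrap.
classify_input <- function(x) {
  # a data frame is also a list, so it has to be tested first
  if (is.data.frame(x)) return("data frame")
  if (is.list(x)) {
    nms <- names(x)
    if (!is.null(nms) && !anyNA(nms) && all(nzchar(nms))) return("named list")
    stop("a list must have a name for every element")
  }
  if (is.character(x) && length(x) == 1) return("string")
  stop("input must be a data frame, a named list, or a string")
}

classify_input(iris)
classify_input(list("setosa" = 1:3, "virginica" = 4:6))
classify_input("https://example.org/data.csv")  # placeholder URL
```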
The test_results argument can be either a data frame or a
list of data frames. You can check that you are passing the
argument in the right form. For a data frame:
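A minimal check for a single data frame, using the built-in iris data as a stand-in for your own results, could look like this:

```r
# iris stands in for a data frame of test results
is.data.frame(iris)
# a matrix is not a data frame, so it would need converting first
is.data.frame(as.matrix(iris[, 1:4]))
```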
For a list of data frames:
# assume you have a few data frames in a list
iris_new <- iris[, -1]
my_results <- list(iris, iris_new)
# check each of them in a loop
for (element in my_results) {
  print(is.data.frame(element))
}
The named_list_results argument is only used for the
algorithm_evaluation schema.
Now that we know which arguments to use, let us create a
group_comparison instance as in Example
I:
inst_gc <- mrap::group_comparison(
  "stats::t.test(setosa, virginica, var.equal = FALSE)",
  list("setosa" = setosa, "virginica" = virginica),
  df_results
)
Here, the code_string is a string and contains the
package name; there is no need for the data name, as the
input_data argument is specified as a named list; and the
test_results argument is a data frame.
For the instance specified above, you will receive a warning message: “Target label is not available, you can set it manually”. Let us add the target name:
This is how you can add or correct any information after creating an instance.
data_analysis instance
The data_analysis instance should include all analytic
instances. For one instance:
For more than one instance, use a list:
Let us assume you conducted a t-test on the Iris data comparing petal length in setosa and virginica species:
data(iris)
library(dplyr)
setosa <- iris |>
  dplyr::filter(Species == "setosa") |>
  dplyr::select(Petal.Length)
virginica <- iris |>
  dplyr::filter(Species == "virginica") |>
  dplyr::select(Petal.Length)
tt <- stats::t.test(setosa, virginica, var.equal = FALSE)
The results of the test should be presented as a data frame:
df_results <- data.frame(
  t.statistic = tt$statistic,
  df = tt$parameter,
  p.value = tt$p.value
)
rownames(df_results) <- "value"
Now, let us follow the steps described above to create a
group_comparison instance, modify it, include it in a
data_analysis instance, and write it as a JSON-LD file:
To report an algorithm performance, you write the evaluation results as a named list:
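As a sketch of what such a named list could look like, the following computes two metrics for a toy classification rule on the built-in iris data. The rule, the variable names and the metric names (accuracy, recall) are assumptions chosen for illustration; use the metrics of your own evaluation instead.

```r
data(iris)
# Toy rule for illustration: flowers with short petals are predicted as setosa
predicted_setosa <- iris$Petal.Length < 2.5
actual_setosa <- iris$Species == "setosa"

# Named list of evaluation results; metric names are illustrative
evaluation_results <- list(
  accuracy = mean(predicted_setosa == actual_setosa),
  recall = sum(predicted_setosa & actual_setosa) / sum(actual_setosa)
)

# check that it is a list and that every element is named
is.list(evaluation_results)
names(evaluation_results)
```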
Typically, there is no specific line of code to report as
code_string, so “N/A” is allowed, as explained in
the Code string section above. The data is
reported as a URL string:
You need to add the name of the algorithm and the task manually:
This can be further included in the data_analysis instance and written as a JSON-LD file as explained above.
Currently, mrap contains an all-in-one wrapper for the
stats::aov function, and more such wrappers will be added
in the future. Let us assume you are currently using
stats::aov for conducting your ANOVA tests:
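For reference, a plain stats::aov call might look like the sketch below. It uses only base R and the built-in iris data, both chosen here for illustration, and it does not show the mrap wrapper itself.

```r
data(iris)
# one-way ANOVA of petal length across the three species
aov_fit <- stats::aov(Petal.Length ~ Species, data = iris)
summary(aov_fit)
```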
The all-in-one wrapper is as easy to use as the original function:
The wrapper returns a list, the first element of which is the resulting object from the original function:
The second element is a group_comparison instance:
The instance includes all required information. You can still modify it, e.g., to add a label:
This can be further included in the data_analysis instance and written as a JSON-LD file as explained above.