Introduction to mrap

100% AI-free: we did not use any AI technologies in developing this package.

library(mrap)

The goal of mrap is to provide wrapper functions that reduce the user’s effort in writing machine-readable data with the dtreg package. A set of all-in-one wrappers will cover functions from stats and other well-known packages; these are very easy to use (see Example III: an all-in-one wrapper for anova). The package also contains wrappers for the analytical schemata used by TIB Knowledge Loom. This vignette explains in detail how to apply such a wrapper to write the results of your data analysis as JSON-LD in six steps:

1. Select a wrapper

To select a wrapper for an analytical schema, please check the package help pages (for example, via help(package = "mrap")). For instance, for a t-test you will need the group_comparison wrapper.

2. Check arguments

The wrappers are easy to use once the required arguments are specified correctly, and correct arguments are crucial for transparent reporting of results. This section explains how to specify them.

2.1. Code string

Argument code_string should be a string (in R, a character vector). The argument cannot be omitted; please indicate “N/A” if no code is available. In Example I, we use the following code string: 'stats::t.test(setosa, virginica, var.equal = FALSE)'
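Typing the code string by hand risks a mismatch with the call you actually ran. One option, which is our suggestion rather than an mrap feature, is to write the call once as an unevaluated R expression and derive code_string from it with base R's deparse(). A minimal sketch (t_test_call is a hypothetical name):

```r
# Write the call once as an unevaluated expression ...
t_test_call <- quote(stats::t.test(setosa, virginica, var.equal = FALSE))

# ... and turn it into a single string for code_string.
# width.cutoff = 500L keeps deparse() from splitting typical calls into
# several strings; paste() joins any pieces that are still produced.
code_string <- paste(trimws(deparse(t_test_call, width.cutoff = 500L)),
                     collapse = " ")
code_string
```

Evaluating eval(t_test_call) later runs exactly the call that is reported, provided setosa and virginica exist (see Example I).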

Package name

Specifying the package name in the code is always good practice. In mrap, it is a requirement: you will get an error message if code_string does not contain package::function. In most cases, package::function is at the beginning of the string, but we also allow for the generic method summary, in which case the string has the form summary(package::function(formula)). For base R functions, please write base::.
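To check a code string before calling a wrapper, you can test for the package::function pattern yourself. The sketch below approximates the rule described above with a regular expression; has_package_prefix is a hypothetical helper, not mrap's own check, and it does not cover the “N/A” case, which you would handle separately.

```r
# Hypothetical helper: does the string start with package::function(,
# optionally wrapped in summary( ... )? Approximates the rule above only.
has_package_prefix <- function(code_string) {
  pattern <- "^(summary\\()?[A-Za-z][A-Za-z0-9.]*::[A-Za-z._][A-Za-z0-9._]*\\("
  grepl(pattern, trimws(code_string))
}

has_package_prefix("stats::t.test(setosa, virginica)")                           # TRUE
has_package_prefix("summary(stats::aov(Petal.Length ~ Species, data = iris))")   # TRUE
has_package_prefix("t.test(setosa, virginica)")                                  # FALSE
```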

Data name

Your data can be a string (URL), a named list, or a data frame (see Input data below). If your data is a string, you can add the data name manually (see Modify the instance); if your data is a named list, as in Example I, mrap extracts the names of the list elements. In both cases, code_string is not used for the data name, so the data name does not need to appear in it. However, if your data is a single data frame and you want mrap to extract its name from code_string, please write it as 'data = dataset_name' (e.g., 'data = iris'), even though most R functions also accept the bare dataset_name.
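The following sketch shows how a name written as 'data = dataset_name' can be pulled out of a code string with base R regular expressions. extract_data_name is a hypothetical helper that illustrates the idea; mrap's actual extraction may differ in details.

```r
# Hypothetical helper: return the name after "data =", or NA if absent.
extract_data_name <- function(code_string) {
  # \\b avoids matching e.g. "metadata = ..."; the group captures the name
  match <- regmatches(
    code_string,
    regexec("\\bdata\\s*=\\s*([A-Za-z.][A-Za-z0-9._]*)", code_string, perl = TRUE)
  )[[1]]
  # no match gives character(0); otherwise element 2 is the captured name
  if (length(match) < 2) NA_character_ else match[2]
}

extract_data_name("stats::aov(Petal.Length ~ Species, data = iris)")
extract_data_name("stats::aov(Petal.Length ~ Species, iris)")
```

The second call returns NA because the data frame is passed without 'data =', which is exactly the case where the name cannot be recovered.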

Target variable(s)

Our wrappers extract the name of a target variable from the code_string if the variable is before the ~ sign in the formula:

"package::function(Petal.Length ~ Species, data = iris)"
"package::function(iris$Petal.Length ~ iris$Species, data = iris)"

We also allow for several target variables in special cases such as MANOVA:

"package::function(cbind(Petal.Length, Petal.Width) ~ Species, data = iris)"

Alternatively, a target variable can be explicitly specified in two or more vectors:

"package::function(setosa$Petal.Length, virginica$Petal.Length)"

In the following case, the target name cannot be extracted:

"package::function(one_vector, another_vector)"

You will get a warning reminding you to add the target label manually to the instance (see Modify the instance).
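A simplified sketch of reading the target variable(s) from the left-hand side of a formula in code_string. extract_targets is a hypothetical helper, not mrap's implementation: it handles the formula examples above (including iris$ prefixes, cbind() and a summary() wrapper) and returns NA when there is no ~, but it does not read targets from explicit vectors such as setosa$Petal.Length.

```r
# Hypothetical helper: target name(s) from the left-hand side of "~".
extract_targets <- function(code_string) {
  # Without a "~" there is no formula to read the target from.
  if (!grepl("~", code_string, fixed = TRUE)) return(NA_character_)
  # Left-hand side: everything before the first "~".
  lhs <- trimws(strsplit(code_string, "~", fixed = TRUE)[[1]][1])
  # Peel off call prefixes such as "summary(" or "package::function(",
  # but stop at cbind(), which holds several targets.
  while (grepl("^[[:alnum:]._:]+\\(", lhs) && !startsWith(lhs, "cbind(")) {
    lhs <- sub("^[[:alnum:]._:]+\\(", "", lhs)
  }
  # Unwrap "cbind(a, b)" into "a, b" and split on commas.
  lhs <- sub("^cbind\\((.*)\\)$", "\\1", lhs)
  vars <- trimws(strsplit(lhs, ",", fixed = TRUE)[[1]])
  # Drop data-frame prefixes such as "iris$".
  sub("^.*\\$", "", vars)
}

extract_targets("package::function(cbind(Petal.Length, Petal.Width) ~ Species, data = iris)")
```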

Level variable(s)

In code_string, a level variable is recognized by our wrappers in the “x | level” or “x || level” syntax:

"lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)"
"lme4::lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)"

A level can appear more than once in a formula; mrap recognizes this case too:

"lme4::lmer(math ~ homework + (homework | schid) + (class_size | schid))"

Formulas with more than one level are also supported; mrap captures all level names:

"lme4::lmer(math ~ homework + (1 | schid) + (1 | classid))"

If we cannot extract the name, you will get a warning reminding you to add the level label manually to the instance.
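Similarly, level names can be found by looking for a name between | or || and a closing parenthesis. extract_levels is a hypothetical helper written for illustration, not mrap's own code; it does not handle nested grouping such as (1 | schid/classid).

```r
# Hypothetical helper: grouping-variable names from "(x | level)" or
# "(x || level)" terms; each name is returned once.
extract_levels <- function(code_string) {
  hits <- regmatches(
    code_string,
    gregexpr("\\|\\|?[[:space:]]*[[:alnum:]._]+[[:space:]]*\\)", code_string)
  )[[1]]
  if (length(hits) == 0) return(NA_character_)
  # strip the bars, spaces and parenthesis, keeping only the name
  unique(gsub("[^[:alnum:]._]", "", hits))
}

extract_levels("lme4::lmer(math ~ homework + (1 | schid) + (1 | classid))")
```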

2.2. Input data

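Argument input_data can be one of the following:

a string (for instance, a URL):

is.character("ABC")

a data frame:

is.data.frame(iris)

a named list (setosa and virginica are the data frames created in Example I):

species_list <- list("setosa" = setosa, "virginica" = virginica)
# check it is a list
is.list(species_list)
# check that the list is named
names(species_list)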

Please make sure that the argument is one of these three types. You will get an error message if the type is wrong (for instance, an unnamed list instead of a named list).
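The three accepted types can also be checked in one go. check_input_data is a hypothetical helper that mirrors the rules above, not a function exported by mrap. Note that a data frame is itself a list in R, so it must be tested before the named-list case.

```r
# Hypothetical helper: report which accepted type input_data has,
# or stop with an error if it has none of them.
check_input_data <- function(input_data) {
  # data frames are lists too, so test them first
  if (is.data.frame(input_data)) return("data frame")
  if (is.character(input_data) && length(input_data) == 1) return("string")
  if (is.list(input_data)) {
    nms <- names(input_data)
    # every element needs a non-empty name
    if (!is.null(nms) && all(nzchar(nms))) return("named list")
    stop("input_data is a list, but not all of its elements are named")
  }
  stop("input_data must be a string, a data frame, or a named list")
}

check_input_data(iris)
check_input_data(list(a = 1:3, b = 4:6))
```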

2.3. Test results or named list results

Argument test_results can be either a data frame or a list of data frames. You can check that the argument has the right type. For a data frame:

is.data.frame(iris)

For a list of data frames:

# assume you have a few data frames in a list
iris_new <- iris[, -1]
my_results <- list(iris, iris_new)
# check each of them in a loop
for (element in my_results) {
  print(is.data.frame(element))
}
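Instead of a loop, vapply() can check all elements at once. The hypothetical helper is_valid_test_results below (our own name, not part of mrap) accepts either a single data frame or a non-empty list of data frames, mirroring the rule for test_results:

```r
# Hypothetical helper: TRUE for a data frame or a non-empty list of them.
is_valid_test_results <- function(x) {
  # a data frame is also a list, so the first test short-circuits it
  is.data.frame(x) ||
    (is.list(x) && length(x) > 0 &&
       all(vapply(x, is.data.frame, logical(1))))
}

iris_new <- iris[, -1]
is_valid_test_results(list(iris, iris_new))
```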

Argument named_list_results is used only for the algorithm_evaluation schema (see Example II).

3. Create an instance

Now that we know which arguments to use, let us create a group_comparison instance as in Example I:

inst_gc <-
  mrap::group_comparison(
    "stats::t.test(setosa, virginica, var.equal = FALSE)",
    list("setosa" = setosa, "virginica" = virginica),
    df_results
  )

Here, code_string is a string and contains the package name; the data name is not needed in it because input_data is specified as a named list; and the test_results argument is a data frame.

4. Modify the instance

For the instance specified above, you will receive a warning message: “Target label is not available, you can set it manually”. Let us add the target name:

inst_gc$targets <- "Petal.Length"

This is how you can add or correct any information after creating an instance.

5. Include the instance in the overarching data_analysis instance

The data_analysis instance should include all analytic instances. For one instance:

inst_da <- mrap::data_analysis(inst_gc)

For more than one instance, use a list:

# inst_preprocessing and inst_regression stand for instances you created earlier
inst_da_all <- mrap::data_analysis(list(inst_preprocessing, inst_regression))

6. Write JSON-LD

Finally, convert the data_analysis instance to a JSON-LD string and write it to a file:

json <- mrap::to_jsonld(inst_da)
write(json, "data-analysis-1.json")

Example I: group comparison

Let us assume you conducted a t-test on the Iris data comparing petal length in setosa and virginica species:

data(iris)
library(dplyr)
setosa <- iris |>
  dplyr::filter(Species == "setosa") |>
  dplyr::select(Petal.Length)
virginica <- iris |>
  dplyr::filter(Species == "virginica") |>
  dplyr::select(Petal.Length)
tt <- stats::t.test(setosa, virginica, var.equal = FALSE)

The results of the test should be presented as a data frame:

df_results <- data.frame(
  t.statistic = tt$statistic,
  df = tt$parameter,
  p.value = tt$p.value
)
rownames(df_results) <- "value"
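If you run several tests, a small helper can build such a one-row data frame from any object of class htest, as returned by stats::t.test and similar functions. The helper name htest_to_df and its column names are our own choice, not something mrap prescribes; this sketch uses plain numeric vectors instead of the dplyr data frames above.

```r
# Hypothetical helper: one-row results data frame from an htest object.
htest_to_df <- function(test) {
  stopifnot(inherits(test, "htest"))
  df_out <- data.frame(
    statistic = unname(test$statistic),
    # some tests (e.g., wilcox.test) have no parameter such as df
    df = if (is.null(test$parameter)) NA_real_ else unname(test$parameter[1]),
    p.value = test$p.value
  )
  rownames(df_out) <- "value"
  df_out
}

setosa_pl <- iris$Petal.Length[iris$Species == "setosa"]
virginica_pl <- iris$Petal.Length[iris$Species == "virginica"]
htest_to_df(stats::t.test(setosa_pl, virginica_pl, var.equal = FALSE))
```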

Now, let us follow the steps described above to create a group_comparison instance, modify it, include it in a data_analysis instance, and write it as a JSON-LD file:

inst_gc <-
  mrap::group_comparison(
    "stats::t.test(setosa, virginica, var.equal = FALSE)",
    list("setosa" = setosa, "virginica" = virginica),
    df_results
  )

inst_gc$targets <- "Petal.Length"
inst_da <- mrap::data_analysis(inst_gc)
json <- mrap::to_jsonld(inst_da)
write(json, "data-analysis-1.json")

Example II: algorithm evaluation

To report an algorithm's performance, write the evaluation results as a named list:

eval_results <- list(F1 = 0.46, recall = 0.51)
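In practice, the values in eval_results come from comparing your algorithm's predictions with the true labels. A minimal sketch for a binary classifier, using made-up labels (so the numbers differ from the ones above) and the standard definitions of precision, recall and F1:

```r
# Made-up labels for a binary classifier (1 = positive class)
actual    <- c(1, 1, 1, 0, 0, 1, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 1, 0, 0)

tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)  # harmonic mean

# named list in the form expected for the evaluation results
eval_results <- list(F1 = round(f1, 2), recall = round(recall, 2))
eval_results
```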

Typically, there is no specific line of code to report as code_string, so “N/A” is allowed, as explained in the Code string section above. Here, the data are reported as a URL string ("data_url" stands for your actual URL):

inst_ae <- mrap::algorithm_evaluation("N/A", "data_url", eval_results)

You need to add the name of the algorithm and the task manually:

inst_ae$evaluates <- "my_algorithm_name"
inst_ae$evaluates_for <- "Classification"

This instance can then be included in the data_analysis instance and written as a JSON-LD file, as explained above.

Example III: an all-in-one wrapper for anova

Currently, mrap contains an all-in-one wrapper for the stats::aov function, and more such wrappers will be added in the future. Let us assume you currently use stats::aov to conduct your ANOVA tests:

data(iris)
anova_stats_results <- stats::aov(Petal.Length ~ Species, data = iris)

The all-in-one wrapper is as easy to use as the original function:

aov <- mrap::stats_aov(Petal.Length ~ Species, data = iris)

The wrapper returns a list whose first element is the object returned by the original function:

anova_mrap_results <- aov$anova

The second element is a group_comparison instance:

inst_gc_anova <- aov$dtreg_object

The instance includes all required information. Of course, you can still modify it, e.g., to add a label:

inst_gc_anova$label <- "my_fancy_results"

This instance can then be included in the data_analysis instance and written as a JSON-LD file, as explained above.
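For comparison, a sketch of extracting the ANOVA table yourself with base R, for example if you prefer to pass it to group_comparison as test_results. We do not claim that this reproduces exactly what stats_aov stores in its instance; the mrap call is left as a comment because it needs the package.

```r
fit <- stats::aov(Petal.Length ~ Species, data = iris)

# summary() of an aov object is a list whose first element is the
# ANOVA table as a data frame (Df, Sum Sq, Mean Sq, F value, Pr(>F))
anova_table <- summary(fit)[[1]]
# remove the padding that summary() adds to the row names
rownames(anova_table) <- trimws(rownames(anova_table))
anova_table

# With mrap loaded, the table could then be passed on, e.g. (not run):
# mrap::group_comparison(
#   "stats::aov(Petal.Length ~ Species, data = iris)", iris, anova_table
# )
```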