--- title: "The labelled_spss_survey class" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{The labelled_spss_survey class} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(retroharmonize) ``` Use the `labelled_spss_survey()` helper function to create vectors of class *retroharmonize_labelled_spss_survey*. ```{r} sl1 <- labelled_spss_survey ( x = c(1,1,0,8,8,8), labels = c("yes" =1, "no" = 0, "declined" = 8), label = "Do you agree?", na_values = 8, id = "survey1") print(sl1) ``` You can check the type: ```{r} is.labelled_spss_survey (sl1) ``` The `labelled_spss_survey()` class inherits some properties from `haven::labelled()`, which can be manipulated by the `labelled` package (See particularly the vignette *Introduction to labelled* by Joseph Larmarange.) ```{r} haven::is.labelled(sl1) ``` ```{r} labelled::val_labels(sl1) ``` ```{r} labelled::na_values(sl1) ``` It can also be subsetted: ```{r} sl1[3:4] ``` When used within the modernized version of *data.frame*, `tibble::tibble()`, the summary of the variable content prints in an informative way. ```{r} df <- tibble::tibble (v1 = sl1) ## Use tibble instead of data.frame(v1=sl1) ... print(df) ## ... which inherits the methods of a data.frame subset(df, v1 == 1) ``` ## Coercion rules and type casting To avoid any confusion with mis-labelled surveys, coercion with double or integer vectors will result in a double or integer vector. The use of `vctrs::vec_c` is generally safer than base R `c()`. ```{r} #double c(sl1, 1/7) vctrs::vec_c(sl1, 1/7) ``` ```{r integer} c(sl1, 1:3) ``` Conversion to character works as expected: ```{r character} as.character(sl1) ``` The base `as.factor` converts to integer and uses the integers as levels, because base R factors are integers with a `levels` attribute. ```{r as.factor} as.factor(sl1) ``` Conversion to factor with `as_factor` converts the value labels to factor levels: ```{r as_factor} as_factor(sl1) ``` Similarly, when converting to numeric types, we have to convert the user-defined missing values to `NA` values used in the R language. For numerical analysis, convert with `as_numeric`. ```{r numerics} as.numeric(sl1) as_numeric(sl1) ``` ## Arithmetics The median value is correctly displayed, because user-defined missing values are removed from the calculation. Only a few arithmetic methods are implemented, such as * median() ```{r} median (as.numeric(sl1)) median (sl1) ``` * quantile() ```{r} quantile (as.numeric(sl1), 0.9) quantile (sl1, 0.9) ``` * mean() ```{r} mean (as.numeric(sl1)) mean (sl1) mean (sl1, na.rm=TRUE) ``` * weighted.mean() - always removes NA values. ```{r} weights1 <- runif (n = 6, min = 0, max = 1) weighted.mean(as.numeric(sl1), weights1) weighted.mean(sl1, weights1) ``` * sum() ```{r} sum (as.numeric(sl1)) sum (sl1, na.rm=TRUE) ``` The result of the conversion to numeric can be used for other mathematical / statistical function. ```{r} as_numeric(sl1) min ( as_numeric(sl1)) min ( as_numeric(sl1), na.rm=TRUE) ```