---
title: "How to use statisR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use statisR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5
)
```

```{r setup, echo=FALSE}
library(statisR)
```

## statisR Package version 1.0

## Oldemar Rodríguez R.

## Installing the package

### CRAN

```{r, eval=FALSE}
install.packages("statisR", dependencies=TRUE)
```

## STATIS Method

### How to read a Table from a CSV file?

```{r}
DT1 <- read.table("STATIS_TABLE1.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
DT1
```

```{r, echo=FALSE}
DT2 <- read.table("STATIS_TABLE2.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
DT3 <- read.table("STATIS_TABLE3.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
DT4 <- read.table("STATIS_TABLE4.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
```

### Principal Functions

#### statis

Applies the STATIS method to a set of matrices (data tables) **with the same rows**. STATIS is a multivariate analysis technique that allows studying the common structure and the evolution of individuals and variables across multiple tables.

#### plot.statis.circle

This function generates a correlation circle plot from two-dimensional coordinates, commonly used in principal component analysis (PCA) or other multivariate methods.

#### plot.statis.plane

This function generates a two-dimensional scatter plot with centered axes, useful for representing the results of multivariate analyses.
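To give a rough idea of what "common structure" means for tables that share the same rows, the following base R sketch compares two tables through the RV coefficient between their row cross-product matrices. It is only an illustration under stated assumptions (columns are centered and scaled, the data are random), and the helper name `rv_coefficient` is hypothetical; it is not the internal code of `statis`.

```r
# Hypothetical helper (not part of statisR): RV coefficient between two
# tables that describe the same individuals (same rows).
rv_coefficient <- function(X, Y) {
  X <- scale(as.matrix(X))          # center and scale each column
  Y <- scale(as.matrix(Y))
  Wx <- X %*% t(X)                  # n x n cross-products between individuals
  Wy <- Y %*% t(Y)
  # trace(Wx %*% Wy) normalized by the norms of Wx and Wy
  sum(Wx * Wy) / sqrt(sum(Wx * Wx) * sum(Wy * Wy))
}

set.seed(1)
A <- matrix(rnorm(30), nrow = 6)    # 6 individuals, 5 variables (random data)
B <- matrix(rnorm(18), nrow = 6)    # same 6 individuals, 3 other variables

rv_self  <- rv_coefficient(A, A)    # a table compared with itself
rv_cross <- rv_coefficient(A, B)    # two different tables
```

Values close to 1 indicate that two tables arrange the individuals in a similar way, even when their variables differ.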
### Example 1: Article on Wine Evaluation by Experts

```{r}
rows <- paste0("Wine", 1:6)
expert1 <- data.frame(
  fruity = c(1, 5, 6, 7, 2, 3),
  woody = c(6, 3, 1, 1, 5, 4),
  coffee = c(7, 2, 1, 2, 4, 4),
  row.names = rows
)
expert2 <- data.frame(
  red_fruit = c(2, 4, 5, 7, 3, 3),
  roasted = c(5, 4, 2, 2, 5, 5),
  vanillin = c(7, 4, 1, 1, 6, 4),
  woody = c(6, 2, 1, 2, 5, 5),
  row.names = rows
)
expert3 <- data.frame(
  fruity = c(3, 4, 7, 2, 2, 1),
  butter = c(6, 4, 1, 2, 6, 7),
  woody = c(7, 3, 1, 2, 6, 5),
  row.names = rows
)
labels <- c("Expert 1", "Expert 2", "Expert 3")
```

#### Apply statis without any selection and save the result

```{r}
res <- statis(list(expert1, expert2, expert3), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

**Graphic Interpretation**

- Each expert (1, 2, and 3) appears as a point.
- All three are located relatively close to each other, with an explained inertia of 99%, which indicates that their evaluations are highly consistent with one another, although Expert 3 differs slightly from the other two.
- This means that, despite using different sensory variables (fruity, woody, coffee, etc.), the experts agree on the general structure of their evaluations.

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

**Graphic Interpretation**

All the variables from the three experts are projected here: for example, Expert 1: fruity, Expert 2: red_fruit, Expert 3: butter, etc.

- The inertia (95.02%) shows that the first two dimensions summarize almost all of the variability.
- Some variables form small angles, which implies a strong positive correlation (e.g., fruity from Expert 1 and red_fruit from Expert 2), reflecting that they capture similar sensations.
- Others point in opposite directions, forming large angles (close to 180°), which implies a strong negative correlation (e.g., butter from Expert 3).
- If the angle is close to 90°, this suggests that certain attributes are more specific to one evaluator and are not correlated with those of the others.

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

**Graphic Interpretation**

- The wines (Wine1…Wine6) are represented in the average space of the experts.
- Groupings can be observed: for example, Wine1 and Wine5 are located close to each other and in the same quadrant, which implies similar sensory profiles.
- In contrast, Wine3 and Wine4 are projected in the opposite area, indicating that they have distinct sensory characteristics.
- The inertia (95%) indicates that the plane retains most of the information, so the representation is reliable.

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

**Graphic Interpretation**

Here we can see how each wine is evaluated separately by each expert (e.g., “Wine1·Expert1,” “Wine1·Expert2,” etc.).

- The trajectories of each wine (three connected points) allow us to analyze the consistency among experts.
- If the three points of a wine are very close to each other, it means that the experts perceive it similarly.
- If they are dispersed, it shows disagreement among them.

For example, Wine1 and Wine5 show greater coherence in evaluations among the experts.
Wine3 and Wine4 show more dispersion, suggesting that they generate divergent perceptions depending on the evaluator.

#### Apply statis with specific selections and save the result

##### Selecting tables 1 and 3

```{r}
res <- statis(list(expert1, expert2, expert3), selected.tables = c(1, 3), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

##### Selecting rows 1 and 5

```{r}
res <- statis(list(expert1, expert2, expert3), selected.rows = c(1, 5), table.labels = labels)
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

##### Selecting rows 3 and 4

```{r}
res <- statis(list(expert1, expert2, expert3), selected.rows = c(3, 4), table.labels = labels)
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

##### Selecting tables (1,3) and rows (1,4)

```{r}
# table.labels is passed by name, as in the other calls, so that the labels
# are not matched positionally to a different argument
res <- statis(list(expert1, expert2, expert3), selected.tables = c(1, 3), selected.rows = c(1, 4), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

### Example 2: Tarcoles River Basin of Costa Rica

Data from the project "Development and Application of Low-Cost Effective Methods for Biological Monitoring of Costa Rican Rivers" of the National University, with 4 measurements taken over time.
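Because STATIS assumes that every table describes the same individuals, it can be useful to verify the row names of tables read from separate files before analyzing them. The sketch below is one possible check in base R; the helper `same_rows` and the site names `S1` to `S3` are hypothetical and are not part of statisR or of the river data.

```r
# Hypothetical helper (not part of statisR): TRUE when every table has the
# same row names in the same order.
same_rows <- function(tables) {
  ref <- rownames(tables[[1]])
  all(vapply(tables, function(tb) identical(rownames(tb), ref), logical(1)))
}

# Small made-up tables, only to exercise the check
t1 <- data.frame(a = 1:3, row.names = c("S1", "S2", "S3"))
t2 <- data.frame(b = 4:6, c = 7:9, row.names = c("S1", "S2", "S3"))
t3 <- data.frame(d = 1:3, row.names = c("S1", "S3", "S2"))  # different order

ok_12  <- same_rows(list(t1, t2))       # same rows, different columns
ok_123 <- same_rows(list(t1, t2, t3))   # t3 breaks the row order
```

A check of this kind could be run on the list of tables before it is passed to the analysis function.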
#### Read csv files and load data

```{r}
M1 <- read.table("STATIS_TABLE1.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
M2 <- read.table("STATIS_TABLE2.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
M3 <- read.table("STATIS_TABLE3.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
M4 <- read.table("STATIS_TABLE4.csv", header = TRUE, sep = ",", dec = ".", row.names = 1)
M1
M2
M3
M4
```

#### Statis without selections

```{r}
labels <- c("Measurement 1", "Measurement 2", "Measurement 3", "Measurement 4")
res <- statis(list(M1, M2, M3, M4), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

#### Selecting only table 2

```{r}
res <- statis(list(M1, M2, M3, M4), selected.tables = c(2), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

#### Selecting only row 3

```{r}
res <- statis(list(M1, M2, M3, M4), selected.rows = c(3), table.labels = labels)
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

#### Selecting table 2 and row 3

```{r}
res <- statis(list(M1, M2, M3, M4), selected.tables = c(2), selected.rows = c(3), table.labels = labels)
```

##### Plot Correlation Circle of all the tables

```{r, warning=FALSE}
inter <- res$circle.inter
inter.graph <- plot.statis.circle(inter$points, inertia = inter$inertia, labels = inter$labels, title = inter$title)
inter.graph
```

##### Plot Correlation Circle of all variables evolution

```{r, warning=FALSE}
intra <- res$circle.intra
intra.graph <- plot.statis.circle(intra$points, inertia = intra$inertia, labels = intra$labels, title = intra$title)
intra.graph
```

##### Plot Principal Plane of Average Individuals

```{r, warning=FALSE}
individuals <- res$plane.individuals
ind.graph <- plot.statis.plane(individuals$points, inertia = individuals$inertia, labels = individuals$labels, title = individuals$title)
ind.graph
```

##### Plot Principal Plane of the Evolution of Individuals

```{r, warning=FALSE}
evolution <- res$plane.evolution
evol.graph <- plot.statis.plane(evolution$points, inertia = evolution$inertia, labels = evolution$labels, title = evolution$title)
evol.graph
```

## STATIS-DUAL Method

### How to read a Table from a CSV file?

```{r}
Tuis5_95 <- read.table("Tuis5_95.csv", header=TRUE, sep=';', dec=',')
Tuis5_95
```

### Principal Functions

#### statis.dual

Implementation of the STATIS DUAL method for the joint analysis of multiple tables that share the same variables **(same columns)**. This approach allows evaluating the common structure between tables (interstructure), building a compromise (weighted average of structures), and analyzing the trajectories of variables across the tables.

#### plot.statis.dual.circle

This function generates a 2D scatter plot with support for multiple groups, labels, arrows from the origin, reference circles, cross axes, and full style customization using ggplot2.

#### plot.statis.dual.trajectories

Visualizes the evolution of one or more variables across the different tables in a STATIS DUAL analysis. Each trajectory represents the sequence of positions of a variable in the compromise space.

#### select.super.variables

This function selects a predefined subset of variables from a supervision matrix, checks dimension consistency, verifies missing variables, and constructs a clean data frame containing the first two coordinates typically used for PCA or STATIS DUAL correlation plots.
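The idea of comparing tables that share the same columns can be illustrated in base R with one correlation matrix per table and an average of those matrices as a stand-in for the compromise. This is a simplified sketch: the description of `statis.dual` above mentions a weighted average, whereas equal weights are an assumption made here, and nothing in this block is the package's internal code.

```r
# Sketch only: per-table correlation matrices and an equal-weight compromise.
# Equal weights are an assumption; the description above refers to a
# weighted average.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
aq <- na.omit(airquality[, c(vars, "Month")])   # base R dataset
tables <- split(aq[, vars], aq$Month)           # same columns, different rows

cors <- lapply(tables, cor)                     # one 4 x 4 matrix per table
compromise <- Reduce(`+`, cors) / length(cors)  # equal-weight average
round(compromise, 2)
```

Each matrix in `cors` plays the role of the structure of one table, which is why the tables may have different numbers of rows as long as the columns match.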
### Example 1: Sugarcane in Costa Rica

#### Read csv files and apply STATIS DUAL

```{r}
Tuis5_95 <- read.table("Tuis5_95.csv", header=TRUE, sep=';', dec=',')
Tuis5_96 <- read.table("Tuis5_96.csv", header=TRUE, sep=';', dec=',')
Tuis5_97 <- read.table("Tuis5_97.csv", header=TRUE, sep=';', dec=',')
Tuis5_98 <- read.table("Tuis5_98.csv", header=TRUE, sep=';', dec=',')
labels <- c("95","96","97","98")
res <- statis.dual(list(Tuis5_95, Tuis5_96, Tuis5_97, Tuis5_98), labels.tables = labels)
```

#### Use plot.statis.dual.circle to get the Interstructure graph

```{r}
plot.statis.dual.circle(points.list = list(res$interstructure), labels = res$labels.tables) + ggplot2::ggtitle("Interstructure")
```

**Graphic Interpretation**

Each point represents a data table (that is, a group of sugarcane plants with their chemical measurements).

- The proximity between points indicates that those tables share a similar pattern of correlations among the chemical variables.
- If a table appears farther away, it means its correlation profile is different, possibly because those plants were under different soil, water, or management conditions.

This allows us to compare which sets of plants are more “similar” in their chemistry and which ones differ.

#### Use plot.statis.dual.circle to get the Correlation Circle for all variables

```{r}
plot.statis.dual.circle(list(res$supervariables), labels = row.names(res$supervariables)) + ggplot2::ggtitle("Correlation (all variables)")
```

**Graphic Interpretation**

Here we can see how the 19 chemical variables are related.

+ Variables with arrows pointing in the same direction are positively correlated: they tend to increase together in the plants.
+ Variables with arrows pointing in opposite directions are negatively correlated: when one increases, the other decreases.
+ Those forming an angle close to 90° are almost independent.
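The link between angles and correlations described above can be checked numerically: for two centered variables, the cosine of the angle between them equals their Pearson correlation. The sketch below uses the fruity and woody scores of Expert 1 from the wine example; note that in a correlation circle this reading is only approximate, since it depends on how well each variable is represented on the plane.

```r
# Cosine of the angle between two centered vectors = Pearson correlation.
x <- c(1, 5, 6, 7, 2, 3)   # fruity, Expert 1 (wine example)
y <- c(6, 3, 1, 1, 5, 4)   # woody, Expert 1 (wine example)

xc <- x - mean(x)          # center each variable
yc <- y - mean(y)
cos_angle <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
angle_deg <- acos(cos_angle) * 180 / pi   # angle in degrees

cos_angle
cor(x, y)                  # same value as cos_angle
```

An angle above 90° corresponds to a negative correlation, and an angle below 90° to a positive one.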
Typical example in sugarcane:

- Ca, Mg, and HCO₃ often align (positive correlation) because they originate from calcareous soil conditions.
- PO₄ and NO₃ (fertilization nutrients) may appear correlated.
- DO and BOD reflect water quality, with specific relationships that can oppose (negative correlation) salts such as Cl or SO₄.

#### Use plot.statis.dual.circle to get the Correlation Circle for the selected variables

Note that you must first use the **select.super.variables** function to store the selected variables in a data frame, and then pass this data frame as a parameter to the plot.statis.dual.circle function.

```{r}
selected.variables <- c("Ph", "Temp", "DBO", "ST", "PO4", "NO3", "POD", "Cal")
superv.sel.df <- select.super.variables(res$supervariables, res$vars.names, selected.variables)
plot.statis.dual.circle(list(superv.sel.df), labels = row.names(superv.sel.df)) + ggplot2::ggtitle("Correlation (selected variables)")
```

**Graphic Interpretation**

This subplot highlights only the key variables for interpreting sugarcane physiology.

- pH and Temp tend to show a positive correlation: this suggests that water/soil temperature and pH are related under the cultivation conditions.
- BOD and DO are associated with organic matter and oxygen consumption: they are positively correlated, indicating a strong link between microbial metabolism and nutrient dynamics.
- TDS, PO₄, and NO₃ reflect the influence of fertilization and mineral load: if they cluster together (positive correlation), it suggests that the plants absorb these nutrients jointly.
- Calcium (Ca) may oppose (negative correlation) mobile nutrients like NO₃, reflecting differences between more calcareous soils and more leached soils.
#### Use plot.statis.dual.trajectories to get the trajectories graph for the selected variables

```{r}
vars.A <- c("Ph","ST","NO3")
plot.statis.dual.trajectories(vars = vars.A, trajectories = res$trajectories, labels.tables = res$labels.tables) + ggplot2::ggtitle(sprintf("Trajectories (%s)", paste(vars.A, collapse = ", ")))

vars.B <- c("OD","DBO","PO4")
plot.statis.dual.trajectories(vars = vars.B, trajectories = res$trajectories, labels.tables = res$labels.tables) + ggplot2::ggtitle(sprintf("Trajectories (%s)", paste(vars.B, collapse = ", ")))

# If you want to select a specific variable
vars.1 <- "Temp"
plot.statis.dual.trajectories(vars = vars.1, trajectories = res$trajectories, labels.tables = res$labels.tables) + ggplot2::ggtitle(sprintf("Trajectory (%s)", vars.1))
```

#### Use plot.statis.dual.trajectories to get the trajectories graph for all variables

```{r}
plot.statis.dual.trajectories(vars = res$vars.names, trajectories = res$trajectories, labels.tables = res$labels.tables)
```

**Graphic Interpretation**

Each trajectory shows how the position of a variable changes in the correlation space across the different plant tables.

- A short and compact trajectory (e.g., Temp) indicates that this variable maintains stable relationships with the others across all plant groups.
- A long trajectory or one with changes in direction (here possibly NO₃ or PO₄) indicates that the role of that variable changes between tables: under some conditions it is strongly associated with productivity, while in others it is more related to water or soil quality.

In sugarcane, this can reflect differences in fertilization, soil type, or irrigation water quality.

### Example 2: airquality (base R) ⇒ K = 5 months

Daily New York air quality measurements (1973), with the variables Ozone, Solar.R, Wind, and Temp (temperature). The data are split by month into a list Z (May–September ⇒ K = 5 tables). The group sizes may differ from one month to another.
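To see how the group sizes differ, the number of complete days per month can be counted directly in base R. This is a small sketch that mirrors the data preparation used for this example; the object names `aq_complete` and `sizes` are illustrative only.

```r
# Count the days with complete observations in each month (5 = May, ..., 9 = September)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
aq_complete <- na.omit(airquality[, c(vars, "Month")])
sizes <- table(aq_complete$Month)
sizes
```

Unequal counts are not a problem for tables that share columns, since each month contributes through the relationships among its variables rather than through its individual rows.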
#### Data

```{r}
vars <- c("Ozone","Solar.R","Wind","Temp")
AQ <- na.omit(airquality[, c(vars, "Month")])
Z <- split(AQ[ , vars], AQ$Month) # list(Z5, Z6, Z7, Z8, Z9)
names(Z) <- paste0("M", names(Z)) # "M5","M6","M7","M8","M9"
Z <- lapply(Z, as.matrix)
```

#### Apply STATIS DUAL

```{r}
labels <- c("May","June","July","August","September")
res <- statis.dual(Z, labels.tables = labels)
```

#### Interstructure graph by Month

```{r, warning=FALSE}
interstructure <- list(res$interstructure)
labels <- res$labels.tables
plot.statis.dual.circle(points.list = interstructure, labels = labels) + ggplot2::ggtitle("Airquality (NY): Interstructure by Month")
```

**Graphic Interpretation**

This plot shows the relationship between the tables (in this case, each month with its observations of Ozone, Solar.R, Wind, and Temp).

- Months that appear close together in the circle share a similar correlation structure among the variables.
- Months that are far apart or opposite reflect different patterns in the relationships between variables.

For example, July and August are close, which indicates that these months have similar correlations among Ozone, Solar Radiation, Wind, and Temperature. May appears isolated, meaning that the correlation pattern in May is different (possibly due to being a cooler month with lower solar radiation).

#### Correlation Circle for all variables

```{r, warning=FALSE}
plot.statis.dual.circle(list(res$supervariables), labels = row.names(res$supervariables)) + ggplot2::ggtitle("Airquality (NY): Correlation (all variables)")
```

**Graphic Interpretation**

This plot allows us to see how the original variables (Ozone, Solar.R, Wind, Temp) correlate with the main axes of the compromise.

- Arrows pointing in the same direction indicate positively correlated variables.
- Opposite arrows (angle close to 180°) indicate negative correlation.
- Arrows forming an angle of ~90° indicate that the variables are nearly independent.
Typically in this dataset:

- Temp and Solar.R tend to align, meaning a positive correlation (more radiation on warmer days).
- Wind appears opposite (negative correlation) to Temp and Ozone: when wind is stronger, ozone levels and temperature tend to be lower.
- Ozone usually correlates with solar radiation and temperature.

#### Correlation Circle with Selected Variables

```{r, warning=FALSE}
selected.variables <- c("Ozone","Wind","Temp")
superv.sel.df <- select.super.variables(res$supervariables, res$vars.names, selected.variables)
plot.statis.dual.circle(list(superv.sel.df), labels = row.names(superv.sel.df)) + ggplot2::ggtitle("Airquality (NY): Correlation (selected variables)")
```

**Graphic Interpretation**

Here we focus only on these three variables to highlight the contrast:

- Ozone and Temp are usually strongly positively correlated → ozone tends to increase on warmer days.
- Wind points in the opposite direction (negative correlation) → confirming that wind reduces ozone accumulation in the air.

This provides a clearer way to visualize the key relationship: heat and sun increase ozone, while wind disperses it.
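The relationships read from the circle can be cross-checked against plain Pearson correlations computed in base R, both over the whole season and month by month. This sketch is independent of statisR; the object names `aq3`, `overall`, and `by_month` are illustrative, and rows with missing values in the three variables are dropped.

```r
# Pearson correlations for Ozone, Wind and Temp: overall and per month
aq3 <- na.omit(airquality[, c("Ozone", "Wind", "Temp", "Month")])
overall <- cor(aq3[, c("Ozone", "Wind", "Temp")])

# One column per month, one row per pair of variables
by_month <- sapply(split(aq3, aq3$Month), function(d) {
  c(Ozone_Temp = cor(d$Ozone, d$Temp),
    Ozone_Wind = cor(d$Ozone, d$Wind))
})

round(overall, 2)
round(by_month, 2)
```

Comparing the columns of `by_month` gives a numerical counterpart to the month-to-month changes that the trajectory plots display.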
#### Variable Trajectories by Month

```{r, warning=FALSE}
vars.A <- c("Ozone","Temp")
p.tray.A <- plot.statis.dual.trajectories(vars = vars.A, trajectories = res$trajectories, labels.tables = labels) + ggplot2::ggtitle(sprintf("Airquality (NY): Trajectories (%s)", paste(vars.A, collapse = ", ")))
p.tray.A

vars.B <- c("Solar.R","Wind")
p.tray.B <- plot.statis.dual.trajectories(vars = vars.B, trajectories = res$trajectories, labels.tables = labels) + ggplot2::ggtitle(sprintf("Airquality (NY): Trajectories (%s)", paste(vars.B, collapse = ", ")))
p.tray.B

vars.1 <- "Ozone"
p.tray.1 <- plot.statis.dual.trajectories(vars = vars.1, trajectories = res$trajectories, labels.tables = labels) + ggplot2::ggtitle(sprintf("Airquality (NY): Trajectory (%s)", vars.1))
p.tray.1
```

#### Trajectory of All Variables

```{r, warning=FALSE}
p.tray.all <- plot.statis.dual.trajectories(vars = res$vars.names, trajectories = res$trajectories, labels.tables = res$labels.tables)
p.tray.all
```

**Graphic Interpretation**

Each trajectory plot shows how each variable moves (its correlations with the main axes) across the months from May to September.

- A stable trajectory (points close together) indicates that the variable behaves consistently from month to month.
- A long trajectory or one that changes direction shows that the variable’s relationship with the others changes from one month to another.

Example:

- Ozone: usually shows strong variation (more ozone in summer, less in May/September). The trajectory can elongate, indicating that ozone’s correlations with Temp and Solar.R change throughout the summer.
- Temp: may have a more stable trajectory, because temperature follows a gradual seasonal pattern.
- Wind: may show changing trajectories, reflecting that in some months wind plays a larger role in dispersing ozone.
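One way to make "short" versus "long" trajectories concrete is to add up the distances between consecutive positions of a variable. The sketch below does this for a made-up set of coordinates; the matrix `traj` is hypothetical, and the actual layout of `res$trajectories` may differ, so the values would need to be extracted accordingly.

```r
# Hypothetical coordinates: one row per table (month), one column per axis.
# The real structure of res$trajectories is not assumed here.
traj <- rbind(May  = c(0.2, 0.1),
              June = c(0.3, 0.1),
              July = c(0.3, 0.5))

steps <- diff(traj)                          # displacement between consecutive tables
path_length <- sum(sqrt(rowSums(steps^2)))   # total length of the trajectory
path_length
```

A variable with a small `path_length` keeps roughly the same position from table to table, while a large value points to a changing role across the months.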