Datasets and Basic Statistics for Symbolic Data Analysis
dataSDA collects a diverse range of symbolic datasets and
offers a comprehensive set of functions for converting
traditional data into symbolic data formats. It supports reading,
writing, and converting symbolic data across several formats, as well as
computing descriptive statistics of symbolic variables.
Install from GitHub:

    # install.packages("devtools")
    devtools::install_github("hanmingwu1103/dataSDA")

Download the latest release from the Releases page, then:

    install.packages("dataSDA_0.1.5.tar.gz", repos = NULL, type = "source")

The int_* functions compute the mean, variance, covariance, and correlation of interval-valued data using any of 8 methods: CM, VM, QM, SE, FV, EJD, GQ, SPT.
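To give a feel for what a center-based summary of interval data involves, here is a minimal base-R sketch that does not call dataSDA. It assumes that a method such as CM follows the common "centers" idea, where each interval is replaced by its midpoint before ordinary statistics are applied. That reading, the toy values, and the names `x`, `centers`, `center_mean`, and `center_var` are illustrative assumptions, not taken from the package.

```r
# Toy interval-valued variable: one row per unit, with a lower and an upper bound.
# The values are made up for illustration.
x <- data.frame(lower = c(1, 2, 4), upper = c(3, 6, 8))

# Midpoint of each interval.
centers <- (x$lower + x$upper) / 2

# Center-based summaries: ordinary statistics applied to the midpoints.
center_mean <- mean(centers)
center_var <- var(centers)
```

The other listed methods presumably use more of the interval than its midpoint, so their results can differ from this sketch.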

    library(dataSDA)
    data(mushroom.int)
    int_mean(mushroom.int, var_name = "Pileus.Cap.Width")
    int_var(mushroom.int, var_name = c("Stipe.Length", "Stipe.Thickness"),
            method = c("CM", "FV", "EJD"))
    int_cov(mushroom.int, var_name1 = "Pileus.Cap.Width",
            var_name2 = c("Stipe.Length", "Stipe.Thickness"),
            method = c("CM", "VM", "EJD", "GQ", "SPT"))
    int_cor(mushroom.int, var_name1 = "Pileus.Cap.Width",
            var_name2 = "Stipe.Length", method = "CM")

The hist_* functions compute the mean, variance, covariance, and correlation of histogram-valued data with methods BG and L2W (covariance and correlation also support BD and B).
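As a rough illustration of what a histogram mean can look like, the base-R sketch below computes the mean of a single toy histogram as the probability-weighted average of its bin midpoints, which assumes the mass is spread uniformly within each bin. Whether any of the package's methods (for example BG) computes exactly this quantity is not confirmed here. The bin edges, the probabilities, and the names `breaks`, `probs`, `mids`, and `hist_mean_toy` are made up for the example.

```r
# One toy histogram: bin edges and the probability mass in each bin (made-up values).
breaks <- c(0, 10, 20, 30)
probs <- c(0.2, 0.5, 0.3)

# Midpoint of each bin: average of its left and right edge.
mids <- (breaks[-length(breaks)] + breaks[-1]) / 2

# Mean of the histogram, assuming mass is uniform within each bin.
hist_mean_toy <- sum(probs * mids)
```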

    library(HistDAWass)
    hist_mean(HistDAWass::BLOOD, var_name = "Cholesterol", method = "BG")
    hist_var(HistDAWass::BLOOD, var_name = "Cholesterol", method = "L2W")
    hist_cov(HistDAWass::BLOOD, var_name1 = "Cholesterol",
             var_name2 = "Hemoglobin", method = "BD")
    hist_cor(HistDAWass::BLOOD, var_name1 = "Cholesterol",
             var_name2 = "Hemoglobin", method = "BG")

| Function | Description |
|---|---|
| RSDA_to_MM | Convert RSDA / symbolic_tbl to MM (min-max) format |
| MM_to_iGAP | Convert MM format to iGAP format |
| iGAP_to_MM | Convert iGAP format to MM format |
| RSDA_to_iGAP | Convert RSDA format to iGAP format |
| SODAS_to_MM | Convert SODAS format to MM format |
| SODAS_to_iGAP | Convert SODAS format to iGAP format |
| RSDA_format | Convert conventional data to RSDA format |
| set_variable_format | One-hot encode set variables for RSDA format |
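
The general idea behind turning conventional data into interval data can be sketched in base R: group the raw observations and keep the minimum and maximum of each group, one column per bound. This is only an illustration of a min-max style layout and does not use or reproduce any of the package's conversion functions. The data frame `obs`, its columns, and the output column names `size_min` and `size_max` are hypothetical.

```r
# Classical data: one row per observation, with a grouping column (made-up values).
obs <- data.frame(
  species = c("a", "a", "b", "b", "b"),
  size    = c(2.0, 3.5, 5.0, 4.0, 6.5)
)

# Aggregate to one row per group, keeping each group's minimum and maximum.
mins <- aggregate(size ~ species, data = obs, FUN = min)
maxs <- aggregate(size ~ species, data = obs, FUN = max)

# Min-max style layout: one column per interval bound.
mm <- data.frame(
  species  = mins$species,
  size_min = mins$size,
  size_max = maxs$size
)
```

Both `aggregate` calls order the groups the same way, so the rows of `mins` and `maxs` line up when combined.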

| Function | Description |
|---|---|
| clean_colnames | Clean column names of a data frame |
| write_csv_table | Write data to CSV file |

The package includes 32 built-in datasets for symbolic data analysis:

Datasets of class symbolic_tbl:
Abalone, Cars.int,
ChinaTemp.int, age_cholesterol_weight.int,
baseball.int, bird.int,
blood_pressure.int, car.int,
finance.int, hierarchy.int,
horses.int, lackinfo.int,
LoansbyPurpose.int, mushroom.int,
nycflights.int, ohtemp.int,
profession.int, soccer.bivar.int,
veterinary.int

Other datasets:
Abalone.iGAP, Face.iGAP,
airline_flights, airline_flights2,
crime, crime2, fuel_consumption,
health_insurance, health_insurance2,
hierarchy, mushroom, occupations,
occupations2

License: GPL (>= 2)