--- title: "Introduction to vivaglint" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to vivaglint} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Overview **vivaglint** is an R package built for HR analysts who work with Microsoft Viva Glint survey exports. It handles the repetitive data wrangling that Glint's native UI doesn't support — multi-cycle trend analysis, manager roll-ups, demographic segmentation, attrition risk scoring, and comment search — so you can spend more time interpreting results and less time reshaping data. ## Key Capabilities - **Data Import & Validation**: Reads Glint CSV exports and validates the structure automatically - **Summary Statistics**: Mean, SD, Glint Score (0–100), response counts, favorability percentages - **Multi-Cycle Trending**: Compare engagement scores across survey waves - **Manager Roll-Ups**: Aggregate results by direct or full reporting tree - **Demographic Segmentation**: Slice results by any employee attribute (department, gender, tenure, etc.) - **Attrition Risk Analysis**: Link survey scores to actual turnover outcomes - **Comment Search**: Full-text search across all comment columns at once ## Getting Started ### Installation ```{r eval=FALSE} devtools::install_github("microsoft/vivaglint") library(vivaglint) ``` ### Basic Workflow #### 1. Import Your Glint Export Export your survey from Viva Glint as a CSV and load it with `read_glint_survey()`: ```{r eval=FALSE} survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint") survey <- read_glint_survey(survey_path, emp_id_col = "EMP ID") ``` If you prefer to pull data directly from the Viva Glint API, configure credentials once and use `read_glint_survey_api()`: ```{r eval=FALSE} glint_setup( tenant_id = "your-tenant-id", client_id = "your-client-id", client_secret = "your-client-secret", experience_name = "your-experience-name" ) survey <- read_glint_survey_api( survey_uuid = "your-survey-uuid", cycle_id = "your-cycle-id", emp_id_col = "EMP ID" ) ``` This returns a `glint_survey` object containing: - **data**: The full survey response table - **metadata**: Question names, column mappings, and respondent counts #### 2. Get a Question-Level Summary Calculate metrics across all survey questions in one call: ```{r eval=FALSE} summary <- summarize_survey(survey, scale_points = 5) ``` Each row in the output represents one question and includes: - `mean` — Average response on the raw scale - `sd` — Standard deviation - `glint_score` — Score transformed to 0–100, matching what appears in the Viva Glint UI - `n_responses` — Number of employees who answered the question - `n_skips` — Number of employees who skipped the question - `n_total` — Total respondents - `pct_favorable` — Percentage of favorable responses - `pct_neutral` — Percentage of neutral responses - `pct_unfavorable` — Percentage of unfavorable responses **About Glint Score**: The Glint Score is calculated as `round(((mean - 1) / (scale_points - 1)) * 100)`, placing every question on a common 0–100 scale regardless of the original scale format. **About Favorability**: The package uses Viva Glint's standard favorability thresholds for each scale type. On a 5-point scale, for example, 4–5 is favorable, 3 is neutral, and 1–2 is unfavorable. #### 3. 
#### 3. Focus on Specific Questions

If you're presenting to a leadership team or investigating a particular theme, filter to just the questions you care about:

```{r eval=FALSE}
# Engagement-specific questions
engagement_qs <- c(
  "I would recommend my team as a great place to work",
  "My work is meaningful"
)

engagement_summary <- summarize_survey(survey, scale_points = 5, questions = engagement_qs)
```

#### 4. Explore Response Distributions

See exactly how many employees chose each response value — useful when a mean alone doesn't tell the full story:

```{r eval=FALSE}
distributions <- get_response_dist(survey, scale_points = 5)
```

The output adds columns like `count_1`, `count_2`, `pct_1`, `pct_2`, etc. for each response value.

---

## Multi-Cycle Trend Analysis

One of the most common HR reporting tasks is tracking whether engagement improved since the last survey. `compare_cycles()` takes multiple survey objects and aligns them by question:

```{r eval=FALSE}
survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint")
survey_q1 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q2 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q3 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")

trends <- compare_cycles(
  survey_q1, survey_q2, survey_q3,
  scale_points = 5,
  cycle_names = c("Q1 FY25", "Q2 FY25", "Q3 FY25")
)
```

The output includes all metrics from `summarize_survey()` for each cycle, plus:

- `change_from_previous` — Point change in mean score vs. prior cycle
- `pct_change_from_previous` — Percentage change vs. prior cycle

This is the foundation for executive trend slides: which items are improving, holding steady, or declining quarter over quarter.

---

## Manager-Level Analysis

HR business partners often need to understand which managers' teams are scoring below average, or identify pockets of high engagement to learn from.

### Roll Up to Manager Level

```{r eval=FALSE}
# Direct reports only
manager_summary <- aggregate_by_manager(survey, scale_points = 5)

# Full org tree (includes indirect reports)
manager_full <- aggregate_by_manager(survey, scale_points = 5, full_tree = TRUE)
```

Each row represents one manager × question combination and includes:

- `manager_id`, `manager_name`
- `team_size`
- All standard metrics: `mean`, `sd`, `glint_score`, `n_responses`, `n_skips`, `n_total`, `pct_favorable`, `pct_neutral`, `pct_unfavorable`

You can filter the result to identify managers with low favorability on a specific question:

```{r eval=FALSE}
library(dplyr)

# Managers where fewer than 50% of their team is favorable on a key item
low_engagement_managers <- manager_summary %>%
  filter(question == "I would recommend my team as a great place to work",
         pct_favorable < 50) %>%
  arrange(pct_favorable)
```

---

## Demographic Analysis

### Segment by Employee Attributes

`analyze_by_attributes()` lets you break survey results down by any combination of employee attributes — department, gender, tenure group, location, job level, etc.

```{r eval=FALSE}
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")

demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender", "Tenure Group"),
  min_group_size = 10  # Suppress groups below this size for privacy
)
```

This is useful for identifying which employee populations have systematically lower scores and on which questions.
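As a follow-on sketch, and assuming the demographic output carries the same `question` and `pct_favorable` columns as the other summaries in this vignette alongside the requested attribute columns (verify against your actual output before relying on it), you could rank the weakest segments on a key item:

```{r eval=FALSE}
library(dplyr)

# Sketch only: assumes demo_results includes `question` and `pct_favorable`
# columns like the other summary outputs, plus the attribute columns above.
lowest_segments <- demo_results %>%
  filter(question == "I would recommend my team as a great place to work") %>%
  arrange(pct_favorable)
```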
### Pre-Joining Attributes for Multiple Analyses

If you plan to run several analyses against the same attribute file, join it once and reuse the enriched survey object:

```{r eval=FALSE}
# Join once
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
survey_enriched <- join_attributes(survey, attr_path)

# Reuse for multiple analyses — no need to re-read the file
dept_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Department"
)

gender_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Gender"
)

# Filter to a subpopulation before analyzing
na_only <- survey_enriched
na_only$data <- filter(survey_enriched$data, Region == "North America")
na_results <- analyze_by_attributes(na_only, scale_points = 5, attribute_cols = "Department")
```

Attributes are stored in `survey$metadata$attribute_cols` after joining, so they are excluded from question detection in downstream analyses.

---

## Attrition Risk Analysis

Linking engagement scores to actual turnover is a high-value analysis for HR leaders. `analyze_attrition()` computes attrition rates by favorability group — showing whether employees who scored low on engagement were more likely to leave.

```{r eval=FALSE}
# Basic attrition analysis (90, 180, and 365 days post-survey)
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")

attrition <- analyze_attrition(
  survey,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5
)
```

The output shows, for each question and time period, the attrition rates for favorable vs. unfavorable responders along with a risk ratio — making it straightforward to identify which survey items are the strongest leading indicators of turnover.

### Segment Attrition by Demographics

Combine attrition analysis with employee attributes to answer questions like "Are unfavorable responders in Engineering leaving at higher rates than those in Sales?":

```{r eval=FALSE}
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")

survey_enriched <- join_attributes(survey, attr_path)

attrition_by_dept <- analyze_attrition(
  survey_enriched,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5,
  attribute_cols = c("Department", "Job Level"),
  min_group_size = 10
)
```

---

## Correlation and Factor Analysis

### Understand Which Items Move Together

Correlation analysis is useful for identifying clusters of related questions — often as a first step before building a composite score or validating that a set of items measures a single construct:

```{r eval=FALSE}
# Pearson correlations in long format (default)
correlations <- get_correlations(survey)

# Spearman correlations (more robust for ordinal scale data)
correlations_spearman <- get_correlations(survey, method = "spearman")

# Correlation matrix
cor_matrix <- get_correlations(survey, format = "matrix")
```

Supported methods: `"pearson"` (default), `"spearman"`, `"kendall"`
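One way to use the long-format output is to look up an item's strongest correlates. The sketch below assumes columns named `item_1`, `item_2`, and `correlation`; those names are hypothetical, so check `names(correlations)` and adjust before running:

```{r eval=FALSE}
library(dplyr)

# Hypothetical column names (item_1, item_2, correlation) for illustration only;
# substitute whatever get_correlations() actually returns in long format.
top_correlates <- correlations %>%
  filter(item_1 == "I would recommend my team as a great place to work") %>%
  arrange(desc(correlation)) %>%
  head(5)
```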
### Factor Analysis

Factor analysis identifies the latent constructs underlying a set of survey items. This can validate whether your "manager effectiveness" items truly cluster together, or reveal unexpected groupings:

```{r eval=FALSE}
# Requires the psych package
factors <- extract_survey_factors(survey, n_factors = 3, rotation = "oblimin")

# Consolidated summary: item, factor assignment, loading, label, communality
print(factors$factor_summary)

# Filter to items with strong factor loadings only
strong_loaders <- dplyr::filter(factors$factor_summary, loading_label == "Strong")

# Access the raw psych object for advanced use
factors$fa_object
```

Loading labels: **Strong** (≥ 0.75), **Medium** (0.60–0.74), **Weak** (< 0.60)

---

## Working with Comments

### Full-Text Comment Search

Search across all comment columns at once — useful for surfacing themes around specific topics like "flexibility", "burnout", or a manager's name:

```{r eval=FALSE}
# Fuzzy search (default) — tolerates minor spelling differences
flexibility_comments <- search_comments(survey, "flexibility")

# Exact, case-sensitive match
exact_results <- search_comments(survey, "work from home", exact = TRUE)

# Broaden fuzzy tolerance to catch more spelling variation
results <- search_comments(survey, "colaboration", max_distance = 0.3)
```

Each result row includes:

- `question` — Which question the comment was attached to
- `response` — The numeric score the employee gave
- `comment` — The comment text
- `topics` — Topic tags assigned by Glint

### Convert to Long Format for NLP

To route comments into a text analysis or NLP pipeline, reshape the survey to long format and filter to rows with comments:

```{r eval=FALSE}
# All responses in long format
long_all <- pivot_long(survey, data_type = "all")

# Comments only
long_comments <- pivot_long(survey, data_type = "comments")

# Both as separate tibbles
both <- pivot_long(survey, data_type = "both")
comments_df <- both$comments
```

---

## Separating Quantitative and Qualitative Data

For workflows that route numeric scores and open-text comments to different pipelines (e.g., numeric data to a statistical model, comments to an LLM), use `split_survey_data()`:

```{r eval=FALSE}
parts <- split_survey_data(survey)

# Numeric scores only — standard respondent columns + one score column per question
quantitative <- parts$quantitative

# Comments only — EMP ID + all _COMMENT, _COMMENT_TOPICS, _SENSITIVE_COMMENT_FLAG columns
qualitative <- parts$qualitative

# Pass numeric data directly to vivaglint functions
summary <- summarize_survey(parts$quantitative, scale_points = 5, emp_id_col = "EMP ID")

# Rejoin at any time using EMP ID
full_data <- dplyr::left_join(parts$quantitative, parts$qualitative, by = "EMP ID")
```

---

## Privacy and Data Handling

### Minimum Group Sizes

Use `min_group_size` wherever it's available to suppress results for groups that are too small to report without compromising individual anonymity:

```{r eval=FALSE}
# Default is 5; consider 10 or higher for sensitive analyses
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")

demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender"),
  min_group_size = 10
)
```

### Local Processing

This package processes all data locally within your R environment. No employee data is transmitted to any external service, including Microsoft. Always follow your organization's data handling and privacy policies when working with employee survey data.
---

## Additional Resources

- **Function Documentation**: `?read_glint_survey`, `?summarize_survey`, `?analyze_by_attributes`, etc.
- **GitHub**: https://github.com/microsoft/vivaglint
- **Issues**: https://github.com/microsoft/vivaglint/issues