Introduction to vivaglint

Overview

vivaglint is an R package built for HR analysts who work with Microsoft Viva Glint survey exports. It handles the repetitive data wrangling that Glint’s native UI doesn’t support — multi-cycle trend analysis, manager roll-ups, demographic segmentation, attrition risk scoring, and comment search — so you can spend more time interpreting results and less time reshaping data.

Key Capabilities

- Question-level summaries with Glint Scores and favorability breakdowns
- Multi-cycle trend analysis
- Manager roll-ups, for direct reports or the full org tree
- Demographic segmentation with privacy-preserving group-size suppression
- Attrition risk analysis linked to survey favorability
- Correlation and factor analysis
- Comment extraction for NLP pipelines

Getting Started

Installation

# Install from GitHub (requires the devtools package)
devtools::install_github("microsoft/vivaglint")
library(vivaglint)

Basic Workflow

1. Import Your Glint Export

Export your survey from Viva Glint as a CSV and load it with read_glint_survey():

survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint")
survey <- read_glint_survey(survey_path, emp_id_col = "EMP ID")

If you prefer to pull data directly from the Viva Glint API, configure credentials once and use read_glint_survey_api():

glint_setup(
  tenant_id = "your-tenant-id",
  client_id = "your-client-id",
  client_secret = "your-client-secret",
  experience_name = "your-experience-name"
)
survey <- read_glint_survey_api(
  survey_uuid = "your-survey-uuid",
  cycle_id = "your-cycle-id",
  emp_id_col = "EMP ID"
)

This returns a glint_survey object containing:

- data: The full survey response table
- metadata: Question names, column mappings, and respondent counts

2. Get a Question-Level Summary

Calculate metrics across all survey questions in one call:

survey_summary <- summarize_survey(survey, scale_points = 5)

Each row in the output represents one question and includes:

- mean — Average response on the raw scale
- sd — Standard deviation
- glint_score — Score transformed to 0–100, matching what appears in the Viva Glint UI
- n_responses — Number of employees who answered the question
- n_skips — Number of employees who skipped the question
- n_total — Total respondents
- pct_favorable — Percentage of favorable responses
- pct_neutral — Percentage of neutral responses
- pct_unfavorable — Percentage of unfavorable responses

About Glint Score: The Glint Score is calculated as round(((mean - 1) / (scale_points - 1)) * 100), placing every question on a common 0–100 scale regardless of the original scale format.
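As a quick sanity check, the transformation can be reproduced in base R. glint_score() below is an illustrative helper written for this sketch, not a package function:

```r
# Illustrative re-implementation of the Glint Score transformation
glint_score <- function(mean_score, scale_points) {
  round(((mean_score - 1) / (scale_points - 1)) * 100)
}

glint_score(4.2, scale_points = 5)  # a mean of 4.2 on a 5-point scale -> 80
glint_score(3.0, scale_points = 5)  # the scale midpoint -> 50
```

Because the transformation anchors at the scale endpoints, a mean of 1 always maps to 0 and the maximum scale value always maps to 100, whatever the scale length.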

About Favorability: The package uses Viva Glint’s standard favorability thresholds for each scale type. On a 5-point scale, for example, 4–5 is favorable, 3 is neutral, and 1–2 is unfavorable.
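The 5-point thresholds can be pictured with a small standalone sketch; classify_favorability() is a hypothetical helper for illustration, not part of the package:

```r
# Illustrative 5-point bucketing: 1-2 unfavorable, 3 neutral, 4-5 favorable
classify_favorability <- function(responses) {
  cut(responses, breaks = c(0, 2, 3, 5),
      labels = c("unfavorable", "neutral", "favorable"))
}

table(classify_favorability(c(1, 2, 3, 4, 5, 5)))
```

Other scale lengths use different cut points; the package applies the appropriate Viva Glint thresholds for each scale type automatically.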

3. Focus on Specific Questions

If you’re presenting to a leadership team or investigating a particular theme, filter to just the questions you care about:

# Engagement-specific questions
engagement_qs <- c(
  "I would recommend my team as a great place to work",
  "My work is meaningful"
)
engagement_summary <- summarize_survey(survey,
                                       scale_points = 5,
                                       questions = engagement_qs)

4. Explore Response Distributions

See exactly how many employees chose each response value — useful when a mean alone doesn’t tell the full story:

distributions <- get_response_dist(survey, scale_points = 5)

The output adds a count_k and a pct_k column for each response value (e.g. count_1 … count_5 and pct_1 … pct_5 on a 5-point scale).
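For a single question, the underlying tallies are straightforward to reproduce; this standalone base-R sketch assumes a plain numeric response vector:

```r
# Counts and percentages for each response value on a 5-point scale
responses <- c(5, 4, 4, 3, 5, 2, 4, 5)
counts <- tabulate(responses, nbins = 5)              # count_1 .. count_5
pcts   <- round(100 * counts / length(responses), 1)  # pct_1 .. pct_5

counts  # how many employees chose each value 1..5
pcts    # the same, as percentages of respondents
```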


Multi-Cycle Trend Analysis

One of the most common HR reporting tasks is tracking whether engagement improved since the last survey. compare_cycles() takes multiple survey objects and aligns them by question:

# Each cycle is normally a separate export; the bundled sample file is reused here for illustration
survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint")
survey_q1 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q2 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q3 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")

trends <- compare_cycles(
  survey_q1, survey_q2, survey_q3,
  scale_points = 5,
  cycle_names = c("Q1 FY25", "Q2 FY25", "Q3 FY25")
)

The output includes all metrics from summarize_survey() for each cycle, plus:

- change_from_previous — Point change in mean score vs. prior cycle
- pct_change_from_previous — Percentage change vs. prior cycle

This is the foundation for executive trend slides: which items are improving, holding steady, or declining quarter over quarter.
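The two change columns can be illustrated in isolation; this sketch assumes both are derived from the per-cycle mean scores:

```r
# Illustrative cycle-over-cycle change computation from three mean scores
means <- c(q1 = 3.8, q2 = 4.0, q3 = 3.9)

change_from_previous     <- c(NA, diff(means))
pct_change_from_previous <- round(100 * change_from_previous / c(NA, head(means, -1)), 1)
```

The first cycle has no prior cycle, so both columns are NA for it; Q2 shows +0.2 points (+5.3%) and Q3 shows -0.1 points (-2.5%).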


Manager-Level Analysis

HR business partners often need to understand which managers’ teams are scoring below average, or identify pockets of high engagement to learn from.

Roll Up to Manager Level

# Direct reports only
manager_summary <- aggregate_by_manager(survey, scale_points = 5)

# Full org tree (includes indirect reports)
manager_full <- aggregate_by_manager(survey, scale_points = 5, full_tree = TRUE)

Each row represents one manager × question combination and includes:

- manager_id, manager_name
- team_size
- All standard metrics: mean, sd, glint_score, n_responses, n_skips, n_total, pct_favorable, pct_neutral, pct_unfavorable

You can filter the result to identify managers with low favorability on a specific question:

library(dplyr)

# Managers where fewer than 50% of their team is favorable on a key item
low_engagement_managers <- manager_summary %>%
  filter(question == "I would recommend my team as a great place to work",
         pct_favorable < 50) %>%
  arrange(pct_favorable)

Demographic Analysis

Segment by Employee Attributes

analyze_by_attributes() lets you break survey results down by any combination of employee attributes — department, gender, tenure group, location, job level, etc.

attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender", "Tenure Group"),
  min_group_size = 10  # Suppress groups below this size for privacy
)

This is useful for identifying which employee populations have systematically lower scores and on which questions.

Pre-Joining Attributes for Multiple Analyses

If you plan to run several analyses against the same attribute file, join it once and reuse the enriched survey object:

# Join once
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
survey_enriched <- join_attributes(survey, attr_path)

# Reuse for multiple analyses — no need to re-read the file
dept_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Department"
)

gender_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Gender"
)

# Filter to a subpopulation before analyzing
na_only <- survey_enriched
na_only$data <- dplyr::filter(survey_enriched$data, Region == "North America")
na_results <- analyze_by_attributes(na_only, scale_points = 5,
                                    attribute_cols = "Department")

Attributes are stored in survey$metadata$attribute_cols after joining, so they are excluded from question detection in downstream analyses.


Attrition Risk Analysis

Linking engagement scores to actual turnover is a high-value analysis for HR leaders. analyze_attrition() computes attrition rates by favorability group — showing whether employees who scored low on engagement were more likely to leave.

# Basic attrition analysis (90, 180, and 365 days post-survey)
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")
attrition <- analyze_attrition(
  survey,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5
)

The output shows, for each question and time period, the attrition rates for favorable vs. unfavorable responders along with a risk ratio — making it straightforward to identify which survey items are the strongest leading indicators of turnover.
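By the usual definition, a risk ratio here would be the unfavorable group's attrition rate divided by the favorable group's; a minimal sketch under that assumption, with hypothetical counts:

```r
# Hypothetical counts for one question at one time horizon
unfavorable_left <- 12; unfavorable_total <- 80
favorable_left   <- 10; favorable_total   <- 240

rate_unfavorable <- unfavorable_left / unfavorable_total  # 0.15
rate_favorable   <- favorable_left / favorable_total      # ~0.042

risk_ratio <- rate_unfavorable / rate_favorable
risk_ratio  # unfavorable responders left 3.6x as often as favorable responders
```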

Segment Attrition by Demographics

Combine attrition analysis with employee attributes to answer questions like “Are unfavorable responders in Engineering leaving at higher rates than those in Sales?”:

attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")
survey_enriched <- join_attributes(survey, attr_path)

attrition_by_dept <- analyze_attrition(
  survey_enriched,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5,
  attribute_cols = c("Department", "Job Level"),
  min_group_size = 10
)

Correlation and Factor Analysis

Understand Which Items Move Together

Correlation analysis is useful for identifying clusters of related questions — often as a first step before building a composite score or validating that a set of items measures a single construct:

# Pearson correlations in long format (default)
correlations <- get_correlations(survey)

# Spearman correlations (more robust for ordinal scale data)
correlations_spearman <- get_correlations(survey, method = "spearman")

# Correlation matrix
cor_matrix <- get_correlations(survey, format = "matrix")

Supported methods: "pearson" (default), "spearman", "kendall"

Factor Analysis

Factor analysis identifies the latent constructs underlying a set of survey items. This can validate whether your “manager effectiveness” items truly cluster together, or reveal unexpected groupings:

# Requires the psych package
factors <- extract_survey_factors(survey, n_factors = 3, rotation = "oblimin")

# Consolidated summary: item, factor assignment, loading, label, communality
print(factors$factor_summary)

# Filter to items with strong factor loadings only
strong_loaders <- dplyr::filter(factors$factor_summary, loading_label == "Strong")

# Access the raw psych object for advanced use
factors$fa_object

Loading labels: Strong (≥ 0.75), Medium (0.60–0.74), Weak (< 0.60)
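The labeling rule can be expressed directly; label_loading() below is an illustrative helper applying the documented thresholds, not a package function:

```r
# Illustrative labeling of factor loadings using the documented thresholds:
# Strong >= 0.75, Medium 0.60-0.74, Weak < 0.60 (absolute values)
label_loading <- function(loading) {
  cut(abs(loading), breaks = c(-Inf, 0.60, 0.75, Inf),
      labels = c("Weak", "Medium", "Strong"), right = FALSE)
}

label_loading(c(0.82, 0.65, 0.41))  # Strong, Medium, Weak
```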


Working with Comments

Convert to Long Format for NLP

To route comments into a text analysis or NLP pipeline, reshape the survey to long format and filter to rows with comments:

# All responses in long format
long_all <- pivot_long(survey, data_type = "all")

# Comments only
long_comments <- pivot_long(survey, data_type = "comments")

# Both as separate tibbles
both <- pivot_long(survey, data_type = "both")
comments_df <- both$comments

Separating Quantitative and Qualitative Data

For workflows that route numeric scores and open-text comments to different pipelines (e.g., numeric data to a statistical model, comments to an LLM), use split_survey_data():

parts <- split_survey_data(survey)

# Numeric scores only — standard respondent columns + one score column per question
quantitative <- parts$quantitative

# Comments only — EMP ID + all _COMMENT, _COMMENT_TOPICS, _SENSITIVE_COMMENT_FLAG columns
qualitative <- parts$qualitative

# Pass numeric data directly to vivaglint functions
survey_summary <- summarize_survey(parts$quantitative, scale_points = 5,
                                   emp_id_col = "EMP ID")

# Rejoin at any time using EMP ID
full_data <- dplyr::left_join(parts$quantitative, parts$qualitative, by = "EMP ID")

Privacy and Data Handling

Minimum Group Sizes

Use min_group_size wherever it’s available to suppress results for groups that are too small to protect individual anonymity:

# Default is 5; consider 10 or higher for sensitive analyses
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender"),
  min_group_size = 10
)
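The suppression rule itself is simple to picture; this standalone base-R sketch (not package code) reports NA for any group below the threshold:

```r
# Sketch of small-group suppression with hypothetical data:
# groups with fewer respondents than min_group_size report NA
scores <- data.frame(
  Department = c("Eng", "Eng", "Eng", "Sales"),
  score      = c(4, 5, 3, 2)
)
min_group_size <- 3

by_dept <- aggregate(score ~ Department, scores,
                     function(x) c(n = length(x), mean = mean(x)))
suppressed <- ifelse(by_dept$score[, "n"] >= min_group_size,
                     by_dept$score[, "mean"], NA)
suppressed  # Eng (n = 3) keeps its mean; Sales (n = 1) is suppressed
```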

Local Processing

This package processes all data locally within your R environment. No employee data is transmitted to any external service, including Microsoft. Always follow your organization’s data handling and privacy policies when working with employee survey data.


Additional Resources