---
title: "From Rolling Quarters to Monthly Estimates: SIDRA Mensalization Guide"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{From Rolling Quarters to Monthly Estimates: SIDRA Mensalization Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  purl = FALSE
)
```

## Overview

Brazil's Continuous National Household Sample Survey (PNADC) publishes labor market indicators as **rolling (moving) quarters**: 3-month moving averages in which each published "quarter" shares 2 months with its neighbors. This smoothing hides short-term dynamics: turning points are delayed, seasonal patterns are distorted, and international comparison becomes difficult.

The PNADCperiods package includes a SIDRA mensalization module that recovers **exact monthly estimates** from rolling quarter data. This vignette explains how to use it.

### Why Rolling Quarters Are Problematic

Each published "quarter" is actually a 3-month moving average:

- "2019-Q1" = average of Jan, Feb, Mar 2019
- "2019-Q2" = average of Feb, Mar, Apr 2019
- "2019-Q3" = average of Mar, Apr, May 2019

![Rolling quarters overlap: each 'quarter' shares 2 months with its neighbors](figures/sidra-mensalization/fig1_rolling_schematic.png){width=100%}

When unemployment jumps sharply in a single month, the rolling quarter spreads that spike across three overlapping periods. The mensalization algorithm inverts this averaging process to recover the underlying monthly values.

---

## Quick Start

```{r quickstart, eval=FALSE}
library(PNADCperiods)

# Step 1: Fetch rolling quarter data from SIDRA API
rolling_quarters <- fetch_sidra_rolling_quarters()

# Step 2: Convert to monthly estimates
monthly <- mensalize_sidra_series(rolling_quarters)

# Step 3: Use your monthly data!
head(monthly[, .(anomesexato, m_popocup, m_taxadesocup)])
```

That's it! You now have monthly estimates starting from January 2012. Here is what each step did:

1. **`fetch_sidra_rolling_quarters()`** downloaded 86+ economic indicators from IBGE's SIDRA API
2. **`mensalize_sidra_series()`** applied the mensalization formula using pre-computed starting points (bundled with the package)
3. The result is a `data.table` with one row per month and `m_*` columns for each mensalized series

---

## Understanding the Output

The mensalized output contains:

- `anomesexato`: Month identifier (YYYYMM format, e.g., 201903 = March 2019)
- `m_*` columns: Mensalized (monthly) estimates for each series
- Price indices: `ipca100dez1993`, `inpc100dez1993` (passed through for deflation)

**Key series include:**

| Column | Description | Unit |
|--------|-------------|------|
| `m_populacao` | Total population | Thousands |
| `m_pop14mais` | Population 14+ years | Thousands |
| `m_popocup` | Employed population | Thousands |
| `m_popdesocup` | Unemployed population | Thousands |
| `m_taxadesocup` | Unemployment rate | Percent |
| `m_taxapartic` | Labor force participation rate | Percent |
| `m_massahabnominaltodos` | Total nominal wage bill | Millions R$ |

Rate series (like `m_taxadesocup`) are **derived** from mensalized level series when `compute_derived = TRUE` (the default). They are computed as ratios of the mensalized levels, not directly mensalized from the rolling quarter rates.

### Discovering Available Series

Use `get_sidra_series_metadata()` to explore all 86+ available series:

```{r metadata, eval=FALSE}
meta <- get_sidra_series_metadata()

# View series organized by theme
meta[, .N, by = .(theme, theme_category)]

# Filter to specific theme categories
meta[theme_category == "employment_type", .(series_name, description)]
```

The metadata uses a hierarchical taxonomy: `theme` (top level, e.g., "labor_market"), `theme_category` (e.g., "employment_type"), and optionally `subcategory` (e.g., "levels", "rates").
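Returning to the derived rate series described under "Understanding the Output", a toy calculation may help show what "computed as ratios of the mensalized levels" means. The sketch below uses base R and invented numbers. It assumes the unemployment rate is the unemployed population divided by the labor force (employed plus unemployed); that is the usual definition, but it is an assumption here and may not match the package's internal code line for line.

```r
# Toy mensalized levels (thousands of people); the numbers are invented.
toy <- data.frame(
  anomesexato  = c(201901L, 201902L, 201903L),
  m_popocup    = c(92000, 92300, 92100),  # employed
  m_popdesocup = c(12500, 12800, 13100)   # unemployed
)

# Assumed definition: rate = unemployed / (employed + unemployed) * 100
toy$labor_force   <- toy$m_popocup + toy$m_popdesocup
toy$m_taxadesocup <- 100 * toy$m_popdesocup / toy$labor_force

print(round(toy$m_taxadesocup, 2))
```

Because the ratio is taken after mensalization, the derived rate stays consistent with the monthly levels it is built from.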
---

## Data Flow

The mensalization process follows a three-step pipeline:

![Data flow from SIDRA to monthly estimates](figures/sidra-mensalization/fig2_data_flow.png){width=100%}

### Step 1: Fetching Rolling Quarter Data

`fetch_sidra_rolling_quarters()` downloads data from five SIDRA tables:

| Table | Content |
|-------|---------|
| 4093 | Population and labor force |
| 6390 | Income (nominal and real) |
| 6392 | Real income by occupation |
| 6399 | Employment by sector |
| 6906 | Underutilization indicators |

```{r fetch-inspect, eval=FALSE}
rq <- fetch_sidra_rolling_quarters(verbose = TRUE)

# Inspect structure
dim(rq)
names(rq)[1:20]
```

Key columns: `anomesfinaltrimmovel` (end month of rolling quarter, YYYYMM), `mesnotrim` (month position 1/2/3), plus one column per series.

### Step 2: The Mensalization Transform

```{r mensalize-inspect, eval=FALSE}
monthly <- mensalize_sidra_series(rq, verbose = TRUE)

# Compare dimensions
cat("Rolling quarters:", nrow(rq), "rows\n")
cat("Monthly data:", nrow(monthly), "rows\n")
```

The row count is approximately the same (one per month), but the meaning changes from "rolling quarter ending in month X" to "exact estimate for month X".

### Step 3: Using Monthly Estimates
```{r plot-unemployment, eval=FALSE}
# --- VIGNETTE CODE: plot-unemployment ---
library(ggplot2)

# Build a Date column (first day of each month) from the YYYYMM identifier
monthly[, date := as.Date(paste0(substr(anomesexato, 1, 4), "-",
                                 substr(anomesexato, 5, 6), "-01"))]

ggplot(monthly, aes(x = date, y = m_taxadesocup)) +
  geom_line(color = "#1976D2", linewidth = 0.8) +
  labs(title = "Monthly Unemployment Rate",
       x = NULL,
       y = "Unemployment Rate (%)")
```
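The plotting code turns the YYYYMM identifier into a date by pasting substrings. As an alternative, the sketch below does the same with integer arithmetic and also derives the month position (1/2/3) within a calendar quarter. Both `yyyymm_to_date()` and `yyyymm_to_position()` are hypothetical helpers written for this illustration; they are not part of PNADCperiods.

```r
# Hypothetical helpers (not part of PNADCperiods) for YYYYMM integers.

# First day of the month as a Date, e.g. 201903 -> 2019-03-01
yyyymm_to_date <- function(ym) {
  as.Date(sprintf("%04d-%02d-01", ym %/% 100L, ym %% 100L))
}

# Month position within a calendar quarter:
# 1 = Jan/Apr/Jul/Oct, 2 = Feb/May/Aug/Nov, 3 = Mar/Jun/Sep/Dec
yyyymm_to_position <- function(ym) {
  ((ym %% 100L - 1L) %% 3L) + 1L
}

ym <- c(201901L, 201902L, 201903L, 201904L, 201912L)
print(yyyymm_to_date(ym))
print(yyyymm_to_position(ym))
```

With a `data.table` such as `monthly`, a helper like this could be applied as `monthly[, date := yyyymm_to_date(anomesexato)]`.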
### Population Data for Weighting

For analyses requiring monthly population estimates separately:

```{r pop-data, eval=FALSE}
pop <- fetch_monthly_population()
head(pop)
```

This returns a `data.table` with `ref_month_yyyymm` and `m_populacao` columns.

---

## Working with Series

### Fetching by Theme

Instead of fetching all 86+ series, filter by theme or theme category:

```{r by-theme, eval=FALSE}
# Only employment type series
employment <- fetch_sidra_rolling_quarters(theme_category = "employment_type")

# Only wage mass series
wages <- fetch_sidra_rolling_quarters(theme_category = "wage_mass")

# Only labor market theme (includes participation, unemployment, employment types, etc.)
labor <- fetch_sidra_rolling_quarters(theme = "labor_market")
```

### Fetching Specific Series

For maximum efficiency, request only the series you need:

```{r specific-series, eval=FALSE}
# Only unemployment-related series
unemp <- fetch_sidra_rolling_quarters(
  series = c("popdesocup", "taxadesocup", "popnaforca")
)
```

### Excluding Derived Series

Some series are rates computed from other series. To fetch only "base" series:

```{r exclude-derived, eval=FALSE}
# Exclude computed rates (only population and income levels)
base_only <- fetch_sidra_rolling_quarters(exclude_derived = TRUE)
```

### Selecting Output Columns

After mensalization, select columns as needed:

```{r select-columns, eval=FALSE}
monthly <- mensalize_sidra_series(rq)

# Select specific series
labor_market <- monthly[, .(
  anomesexato,
  employed = m_popocup,
  unemployed = m_popdesocup,
  unemp_rate = m_taxadesocup,
  participation = m_taxapartic
)]
```

---

## The Mensalization Methodology

*This section can be skipped by users who just need results.*

### The Core Concept

Rolling quarters are 3-month moving averages.
If we denote the true monthly value for month $t$ as $y_t$, then the rolling quarter value $x_t$ is:

$$x_t = \frac{y_{t-2} + y_{t-1} + y_t}{3}$$

The mensalization algorithm inverts this relationship to recover $y_t$ from the sequence of $x_t$ values.

### The Mensalization Formula

**Step 1: Compute scaled first differences**

$$d3_t = 3\,(x_t - x_{t-1})$$

Consecutive rolling quarters share two months, so those terms cancel: $x_t - x_{t-1} = (y_t - y_{t-3})/3$. Hence $d3_t = y_t - y_{t-3}$, the change in the monthly value relative to the same position one quarter earlier.

**Step 2: Identify month position (mesnotrim)**

Each month has a position within its quarter:

- Position 1: Jan, Apr, Jul, Oct
- Position 2: Feb, May, Aug, Nov
- Position 3: Mar, Jun, Sep, Dec

**Step 3: Cumulative sum by position**

For each position separately, compute the cumulative sum of the scaled differences, starting from a calibrated "starting point" $y_0$:

$$y_t = y_0 + \sum_{s \in \text{same position}, s \leq t} d3_s$$

![Mensalization process: rolling quarters (blue) vs monthly estimates (red)](figures/sidra-mensalization/fig3_mensalization_process.png){width=100%}

### The Role of Starting Points ($y_0$)

The starting point $y_0$ is crucial. It determines the **level** of all subsequent monthly estimates. The package includes pre-computed starting points for 53 series, calibrated during the stable 2013-2019 period.

Starting points are computed by:

1. Processing PNADC microdata to get "true" monthly aggregates ($z$ values)
2. Comparing these to rolling quarters
3. Finding the $y_0$ that makes $y_0 + \text{cumsum}(d3)$ match the microdata

### Assumptions and Limitations

- Monthly values within each position evolve smoothly
- The calibration period (2013-2019) reflects "normal" conditions
- The method cannot recover intra-month variation
- Starting points are calibrated to national totals (not regional breakdowns)

---

## Practical Considerations

### API Caching

The package caches SIDRA API responses in memory during your R session:

```{r cache, eval=FALSE}
# First call: fetches from API (~10 seconds)
rq1 <- fetch_sidra_rolling_quarters()

# Second call with use_cache = TRUE: uses cached data (instant)
rq2 <- fetch_sidra_rolling_quarters(use_cache = TRUE)

# Clear all cached data (force fresh fetch on next call)
clear_sidra_cache()
```

The cache persists until you call `clear_sidra_cache()` or restart R.

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| "Series not found" | Misspelled series name | Check `get_sidra_series_metadata()` |
| "API timeout" | SIDRA server slow | Retry; use `use_cache = TRUE` |
| "No starting points" | Custom series | See Custom Starting Points below |

```{r error-handling, eval=FALSE}
# Check if series exists
meta <- get_sidra_series_metadata()
"taxadesocup" %in% meta$series_name  # TRUE
```

### Data Quality Notes

**COVID-19 disruptions (2020):** IBGE suspended in-person interviews during the pandemic. Some indicators show unusual patterns in 2020-Q2.

**CNPJ series availability:** Series based on CNPJ registration (`empregadorcomcnpj`, `contapropriacomcnpj`, etc.) are only available from October 2015, when V4019 was introduced.

---

## Custom Starting Points

*For users with calibrated PNADC microdata.*

Use the bundled starting points (default) unless one of the following applies:

1. **Your series isn't bundled**: custom variable definitions
2. **Different calibration period**: non-standard reference period
3. **Regional breakdown**: state or metro-area mensalization

### Option A: All-in-One Function

```{r custom-y0-allinone, eval=FALSE}
# Load your stacked PNADC microdata (with pnadc_apply_periods weights)
stacked <- readRDS("my_calibrated_pnadc.rds")

# Compute starting points
custom_y0 <- compute_starting_points_from_microdata(
  data = stacked,
  calibration_start = 201301L,
  calibration_end = 201912L,
  verbose = TRUE
)

# Use custom starting points
monthly <- mensalize_sidra_series(rq, starting_points = custom_y0)
```

### Option B: Step-by-Step

```{r custom-y0-stepbystep, eval=FALSE}
# Step 1: Build crosswalk and calibrate
crosswalk <- pnadc_identify_periods(stacked)
calibrated <- pnadc_apply_periods(
  stacked, crosswalk,
  weight_var = "V1028",
  anchor = "quarter",
  calibration_unit = "month"
)

# Step 2: Compute z_ aggregates (monthly totals from microdata)
z_agg <- compute_z_aggregates(calibrated)

# Step 3: Fetch rolling quarters for comparison
rq <- fetch_sidra_rolling_quarters()

# Step 4: Compute starting points
y0 <- compute_series_starting_points(
  monthly_estimates = z_agg,
  rolling_quarters = rq,
  calibration_start = 201301L,
  calibration_end = 201912L
)

# Step 5: Use custom starting points
result <- mensalize_sidra_series(rq, starting_points = y0)
```

CNPJ-based series automatically use a later calibration period (2016-2019) when `use_series_specific_periods = TRUE` (the default in `compute_series_starting_points()`).

### Validating Custom Starting Points

```{r validate-y0, eval=FALSE}
bundled <- pnadc_series_starting_points

# Merge and compare
comp <- merge(custom_y0, bundled,
              by = c("series_name", "mesnotrim"),
              suffixes = c("_custom", "_bundled"))
comp[, rel_diff := abs(y0_custom - y0_bundled) / abs(y0_bundled) * 100]
comp[rel_diff > 1]  # Flag series with >1% difference
```

---

## Case Study: COVID-19 Unemployment

How quickly did unemployment rise when COVID-19 hit Brazil? Rolling quarter data obscures these dynamics. Monthly estimates reveal the exact timing.
```{r covid-analysis, eval=FALSE}
# --- VIGNETTE CODE: covid-analysis ---
# Fetch all series and mensalize
rq <- fetch_sidra_rolling_quarters()
monthly <- mensalize_sidra_series(rq)

# Filter to COVID period
covid_period <- monthly[anomesexato >= 201901 & anomesexato <= 202212]

# Create date column
covid_period[, date := as.Date(paste0(
  substr(anomesexato, 1, 4), "-",
  substr(anomesexato, 5, 6), "-01"
))]

# Find peak
peak_month <- covid_period[which.max(m_taxadesocup)]
cat("Peak unemployment:", peak_month$m_taxadesocup, "% in",
    format(peak_month$date, "%B %Y"), "\n")
```
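The smoothing effect behind this case study can be reproduced on synthetic data. The sketch below, in base R and independent of the package, builds an invented monthly series with a one-month spike, averages it into rolling quarters, and then rebuilds the monthly path by accumulating `3 * (x_t - x_{t-1})` within each month position. It takes the first three true monthly values as known starting points, which is an assumption made only for illustration; the package relies on calibrated starting points instead.

```r
# Toy monthly series: flat at 10, with a one-month spike to 16 in month 7.
y <- rep(10, 12)
y[7] <- 16

# Rolling quarter ending in month i = mean of months i-2, i-1, i.
# The first two entries are NA because the window is incomplete.
x <- as.numeric(stats::filter(y, rep(1 / 3, 3), sides = 1))

# The +6 spike in one month appears as +2 in three consecutive rolling quarters.
print(round(x, 2))

# Inversion: 3 * (x_i - x_{i-1}) equals y_i - y_{i-3}, so adding these
# differences within each month position rebuilds the monthly path.
d3 <- 3 * c(NA, diff(x))
y_hat <- rep(NA_real_, length(y))
y_hat[1:3] <- y[1:3]  # assumed-known starting points, one per position
for (i in 4:length(y)) {
  y_hat[i] <- y_hat[i - 3] + d3[i]
}

print(round(y_hat, 2))
```

In this toy setting the rolling quarters sit at 12 for three periods while the recovered series shows the full spike in a single month, which is the pattern the figures below illustrate with real data.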
![Monthly vs rolling quarter unemployment rate (2019-2023)](figures/sidra-mensalization/fig4_monthly_vs_quarterly.png){width=100%}

![COVID-19 impact on Brazilian unemployment](figures/sidra-mensalization/fig5_covid_case_study.png){width=100%}

**Key findings from monthly estimates:**

1. **Exact peak timing**: Monthly data pinpoints the peak month, while rolling quarters show only a gradual rise
2. **Speed of impact**: The monthly series reveals a sharp spike that rolling quarters smooth over 3+ months
3. **Recovery dynamics**: Monthly estimates show pauses and reversals in recovery that are hidden in quarterly averages

---

## Series Naming Conventions

| Pattern | Meaning | Example |
|---------|---------|---------|
| `m_` | Mensalized monthly estimate | `m_popocup` |
| `pop*` | Population count | `populacao`, `pop14mais` |
| `*comcart` | With formal contract | `empregprivcomcart` |
| `*semcart` | Without formal contract | `empregprivsemcart` |
| `*comcnpj` | With CNPJ registration | `empregadorcomcnpj` |
| `taxa*` | Rate (percent) | `taxadesocup` |
| `nivel*` | Level/ratio (percent) | `nivelocup` |
| `rend*` | Income (rendimento) | `rendhabnominaltodos` |
| `massa*` | Wage bill (massa salarial) | `massahabnominaltodos` |
| `*hab*` | Usually received (habitual) | `rendhabnominaltodos` |
| `*efet*` | Actually received (efetivo) | `rendefetnominaltodos` |

For the complete catalog, use `get_sidra_series_metadata()`:

```{r programmatic-access, eval=FALSE}
meta <- get_sidra_series_metadata()

# Filter by theme category
meta[theme_category == "employment_type", .(series_name, description)]

# Filter by theme and pattern
meta[theme == "labor_market" & grepl("taxa|nivel", series_name),
     .(series_name, description)]
```

---

## Function Reference

| Function | Purpose |
|----------|---------|
| `fetch_sidra_rolling_quarters()` | Download rolling quarter data from SIDRA API |
| `fetch_monthly_population()` | Get monthly population estimates |
| `mensalize_sidra_series()` | Convert rolling quarters to monthly estimates |
| `get_sidra_series_metadata()` | Explore available series and metadata |
| `clear_sidra_cache()` | Clear cached API data |
| `compute_z_aggregates()` | Compute monthly aggregates from calibrated microdata |
| `compute_series_starting_points()` | Compute $y_0$ values from aggregates |
| `compute_starting_points_from_microdata()` | All-in-one $y_0$ computation |

**Bundled data:** `pnadc_series_starting_points`: pre-computed $y_0$ for 53 series x 3 month positions (calibration period: 2013-2019).

---

## References

- HECKSHER, Marcos. "Valor Impreciso por Mes Exato: Microdados e Indicadores Mensais Baseados na Pnad Continua". IPEA - Nota Tecnica Disoc, n. 62. Brasilia, DF: IPEA, 2020.
- HECKSHER, Marcos. "Cinco meses de perdas de empregos e simulacao de um incentivo a contratacoes". IPEA - Nota Tecnica Disoc, n. 87. Brasilia, DF: IPEA, 2020.
- HECKSHER, Marcos. "Mercado de trabalho: A queda da segunda quinzena de marco, aprofundada em abril". IPEA - Carta de Conjuntura, v. 47, p. 1-6, 2020.
- BARBOSA, Rogerio J.; HECKSHER, Marcos. "PNADCperiods: Identify Reference Periods in Brazil's PNADC Survey Data". R package version 0.1.0, 2026.

---

## Further Reading

- `vignette("getting-started")`: Setting up PNADC microdata analysis
- `vignette("how-it-works")`: The period identification algorithm
- `vignette("applied-examples")`: Applied research examples
- **IBGE SIDRA API**: https://sidra.ibge.gov.br/
- **Package repository**: https://github.com/antrologos/PNADCperiods