---
title: "Monthly Poverty Analysis with Annual PNADC Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Monthly Poverty Analysis with Annual PNADC Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  eval = FALSE,
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.width = 10,
  fig.height = 6,
  purl = FALSE
)
```

## Introduction

When exactly did poverty spike during COVID-19? Official annual statistics tell us that 2020 was a difficult year, but they can't tell us whether the crisis peaked in April or June, whether the Auxilio Emergencial reduced poverty immediately or with a delay, or what the month-by-month recovery path looked like. **Monthly data can.**

This vignette combines the mensalization algorithm with **annual PNADC data** to produce monthly poverty statistics. The annual PNADC releases contain comprehensive household income measures (`VD5008`) that aren't available in the quarterly releases. By applying a mensalization crosswalk (built from quarterly data) to annual income data, we get monthly temporal precision with comprehensive income measurement.

![Data Workflow: Monthly Poverty Analysis with Annual PNADC](figures/annual-poverty-analysis/fig-workflow-diagram.png){width=100%}

IBGE's PNADC uses a rotating panel design where the same households appear in both quarterly and annual data. The workflow is:

1. Build a **crosswalk** from quarterly data using `pnadc_identify_periods()`
2. **Apply** the crosswalk to annual income data with `pnadc_apply_periods()` (which handles the merge internally and calibrates weights)
3. Analyze detailed income and poverty measures at monthly frequency

---

## Prerequisites

```{r prerequisites}
library(PNADCperiods)
library(data.table)
library(fst)
library(readxl)     # Read IBGE deflator Excel files
library(deflateBR)  # INPC deflator
library(ggplot2)
library(scales)
```

You also need:

- **Quarterly PNADC data** (2015-2024) in `.fst` format for creating the mensalization crosswalk
- **Annual PNADC data** (2015-2024) in `.fst` format with income supplement variables
- **Deflator file** from IBGE documentation (`deflator_pnadc_2024.xls`)

---

## Complete Workflow

### Step 1: Create Mensalization Crosswalk

Load stacked quarterly PNADC data and run the mensalization algorithm. See the [Download and Prepare Data](download-and-prepare.html) vignette for details on obtaining and formatting PNADC microdata.

```{r create-crosswalk}
# Define paths
pnad_quarterly_dir <- "path/to/quarterly/data"

# List quarterly files (2015-2024)
quarterly_files <- list.files(
  path = pnad_quarterly_dir,
  pattern = "pnadc_20(1[5-9]|2[0-4])-[1-4]q\\.fst$",
  full.names = TRUE
)

# Variables needed for mensalization
quarterly_vars <- c(
  "Ano", "Trimestre", "UPA", "V1008", "V1014",
  "V2008", "V20081", "V20082", "V2009",
  "V1028", "UF", "posest", "posest_sxi", "Estrato"
)

# Load and stack quarterly data
quarterly_data <- rbindlist(
  lapply(quarterly_files, function(f) {
    read_fst(f, as.data.table = TRUE, columns = quarterly_vars)
  }),
  fill = TRUE
)

# Build the crosswalk (identifies reference periods)
crosswalk <- pnadc_identify_periods(
  quarterly_data,
  verbose = TRUE
)

# Check determination rate (expect ~96% with 40 quarters of data)
crosswalk[, mean(determined_month, na.rm = TRUE)]
```

### Step 2: Load Annual PNADC Data

Annual PNADC files follow a specific naming convention.
Note that 2020-2021 use visit 5 (due to COVID-related field disruptions), while other years use visit 1:

```{r load-annual-data}
pnad_annual_dir <- "path/to/annual/data"

# Define which visit to use for each year
visit_selection <- data.table(
  ano = 2015:2024,
  visita = c(1, 1, 1, 1, 1, 5, 5, 1, 1, 1)  # 2020-2021 use visit 5
)
```

> **Why Visit 5 for 2020-2021?**
>
> During COVID-19, IBGE suspended in-person data collection. Visit 1 interviews
> for 2020-2021 have significant quality issues or are unavailable entirely.
> Visit 5 interviews were conducted later under improved conditions and are
> the standard choice for COVID-era income and poverty analysis.

```{r load-annual-continued}
# Build file paths
annual_files <- visit_selection[, .(
  file = file.path(pnad_annual_dir, sprintf("pnadc_%d_visita%d.fst", ano, visita))
), by = ano]

# Variables to load
annual_vars <- c(
  # Join keys
  "ano", "trimestre", "upa", "v1008", "v1014",
  # Demographics
  "v2005", "v2007", "v2009", "v2010", "uf", "estrato",
  # Weights and calibration
  "v1032", "posest", "posest_sxi",
  # Household per capita income (IBGE pre-calculated)
  "vd5008"
)

# Load and stack annual data
annual_data <- rbindlist(
  lapply(annual_files[file.exists(file), file], function(f) {
    dt <- read_fst(f, as.data.table = TRUE)
    setnames(dt, tolower(names(dt)))
    cols_present <- intersect(annual_vars, names(dt))
    dt[, ..cols_present]
  }),
  fill = TRUE
)
```

### Step 2b: Standardize Column Names

The annual data has lowercase column names, but `pnadc_apply_periods()` requires specific casing for the join keys.
Standardize before applying the crosswalk:

```{r standardize-columns}
# pnadc_apply_periods() expects uppercase join keys
key_mappings <- c(
  "ano" = "Ano",
  "trimestre" = "Trimestre",
  "upa" = "UPA",
  "v1008" = "V1008",
  "v1014" = "V1014",
  "v1032" = "V1032",
  "uf" = "UF",
  "v2009" = "V2009"
)

# Rename only the columns that are actually present
for (old_name in names(key_mappings)) {
  if (old_name %in% names(annual_data)) {
    setnames(annual_data, old_name, key_mappings[[old_name]])
  }
}
```

> **Note:** The calibration columns `posest` and `posest_sxi` stay lowercase because the
> package expects them in that case. For `pnadc_apply_periods()`, only the join keys
> (`Ano`, `Trimestre`, `UPA`, `V1008`, `V1014`) and the weight column (`V1032`) need
> uppercase. The mapping also uppercases `UF` and `V2009` to match the quarterly
> naming; `UF` is used again as a merge key for the deflators in Step 5.

### Step 3: Apply Crosswalk and Calibrate Weights

Apply the crosswalk to annual data and calibrate weights using `pnadc_apply_periods()`. The function handles the merge internally using the five join keys (`Ano`, `Trimestre`, `UPA`, `V1008`, `V1014`):

```{r apply-crosswalk}
d <- pnadc_apply_periods(
  annual_data,
  crosswalk,
  weight_var = "V1032",
  anchor = "year",
  calibrate = TRUE,
  calibration_unit = "month",
  smooth = TRUE,
  verbose = TRUE
)

# Check match rate (expect ~97% with year anchor)
mean(!is.na(d$ref_month_in_quarter))
```

> **Why `anchor = "year"`?** Annual PNADC data contains only one visit per
> household (e.g., visit 1 or visit 5), not all rotation groups like quarterly
> data. The `"year"` anchor calibrates the annual weight `V1032` to monthly SIDRA
> population totals while preserving yearly totals.

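
To make the merge and the match-rate check concrete, the sketch below runs a plain left join on the five keys with `data.table`. It only illustrates the idea; it is not the internal code of `pnadc_apply_periods()`, and the two toy tables (including the assumption that the crosswalk has one row per household and quarter) are invented for the example.

```{r toy-join-sketch}
library(data.table)

# The five join keys named in the text
join_keys <- c("Ano", "Trimestre", "UPA", "V1008", "V1014")

# Hypothetical crosswalk: two households, one with an undetermined month
toy_crosswalk <- data.table(
  Ano = 2020L, Trimestre = 1L,
  UPA = c(110000016L, 110000016L),
  V1008 = c(1L, 2L),
  V1014 = 8L,
  ref_month_in_quarter = c(2L, NA)
)

# Hypothetical annual data: four persons in three households;
# the last household does not appear in the crosswalk at all
toy_annual <- data.table(
  Ano = 2020L, Trimestre = 1L,
  UPA = c(110000016L, 110000016L, 110000016L, 110000024L),
  V1008 = c(1L, 1L, 2L, 1L),
  V1014 = 8L,
  vd5008 = c(900, 900, 400, 1500)
)

# Left join: keep every annual row, attach the reference month when available
merged <- merge(toy_annual, toy_crosswalk, by = join_keys, all.x = TRUE)

# Share of annual rows that received a reference month
match_rate <- mean(!is.na(merged$ref_month_in_quarter))
match_rate
```

In this toy case only the two persons of the first household receive a reference month, so `match_rate` is 0.5. Rows can miss a month either because the household is absent from the crosswalk or because its month was not determined, which is why the real check above counts non-missing `ref_month_in_quarter` values rather than matched keys.
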
### Step 4: Construct Per Capita Income

Use IBGE's pre-calculated household per capita income variable:

```{r construct-income}
# Filter to household members only
d <- d[v2005 <= 14 | v2005 == 16]

# Use IBGE's pre-calculated per capita household income
# (missing values are treated as zero income)
d[, hhinc_pc_nominal := fifelse(is.na(vd5008), 0, vd5008)]
```

### Step 5: Apply Deflation

Convert nominal income to real values using IBGE deflators:

```{r apply-deflation}
# Load deflator data (from IBGE documentation)
deflator <- readxl::read_excel("path/to/deflator_pnadc_2024.xls")
setDT(deflator)
deflator <- deflator[, .(Ano = ano, Trimestre = trim, UF = uf, CO2, CO2e, CO3)]

# Merge deflators with data (keeps every row of d)
setkeyv(deflator, c("Ano", "Trimestre", "UF"))
setkeyv(d, c("Ano", "Trimestre", "UF"))
d <- deflator[d]

# INPC adjustment factor to reference date (December 2025).
# The mid-2024 nominal date implies that CO2 expresses income in average 2024 prices.
inpc_factor <- deflateBR::inpc(1,
  nominal_dates = as.Date("2024-07-01"),
  real_date = "12/2025"
)

# Apply deflation
d[, hhinc_pc := hhinc_pc_nominal * CO2 * inpc_factor]
```

### Step 6: Define Poverty Line

Calculate the World Bank PPP-based poverty threshold:

```{r define-poverty-lines}
# World Bank poverty line: USD 8.30 PPP per day (upper-middle income threshold)
poverty_line_830_ppp_daily <- 8.30

# 2021 PPP conversion factor (World Bank)
# https://data.worldbank.org/indicator/PA.NUS.PRVT.PP?year=2021
usd_to_brl_ppp <- 2.45
days_to_month <- 365/12

# Monthly value in 2021 BRL
poverty_line_830_brl_monthly_2021 <- poverty_line_830_ppp_daily * usd_to_brl_ppp * days_to_month

# Update from mid-2021 prices to the December 2025 reference using INPC
poverty_line_830_brl_monthly_2025 <- deflateBR::inpc(
  poverty_line_830_brl_monthly_2021,
  nominal_dates = as.Date("2021-07-01"),
  real_date = "12/2025"
)

d[, poverty_line := poverty_line_830_brl_monthly_2025]
```

> **Why USD 8.30/day?** This is the World Bank's upper-middle income poverty
> threshold, appropriate for Brazil.
> We use the 2021 PPP conversion factor
> (2.45 BRL per USD) because 2021 is the World Bank's reference year for
> the current poverty lines. With these inputs the line is 8.30 x 2.45 x 365/12,
> or about BRL 618.52 per month in 2021 prices, before the INPC update to
> December 2025.

---

## Analysis Examples

### Helper Functions

Before computing poverty measures, we define the FGT family of poverty indices:

```{r helper-functions}
# FGT poverty measure family (Foster-Greer-Thorbecke)
# alpha = 0: Headcount ratio (share below line)
# alpha = 1: Poverty gap (average shortfall as a share of the line)
# alpha = 2: Squared poverty gap (sensitive to inequality among poor)
fgt <- function(x, z, w = NULL, alpha = 0) {
  if (is.null(w)) w <- rep(1, length(x))
  if (length(z) == 1) z <- rep(z, length(x))

  # Drop observations with missing income, line, or weight
  idx <- complete.cases(x, z, w)
  x <- x[idx]; z <- z[idx]; w <- w[idx]

  # Relative shortfall; the ifelse() keeps non-poor at 0 even when alpha = 0
  g <- pmax(0, (z - x) / z)
  fgt_val <- ifelse(x < z, g^alpha, 0)
  sum(w * fgt_val) / sum(w)
}
```

### Example 1: Monthly FGT Poverty Measures

Calculate monthly poverty rates using the FGT family:

```{r example-fgt-family}
# Filter to determined observations
d_monthly <- d[!is.na(ref_month_yyyymm)]

# Use calibrated monthly weight (from pnadc_apply_periods())
d_monthly[, peso := weight_monthly]

# Compute monthly poverty statistics
monthly_poverty <- d_monthly[, .(
  # FGT-0 (Headcount ratio)
  poverty_rate = fgt(hhinc_pc, poverty_line, peso, alpha = 0),
  # FGT-1 (Poverty gap)
  poverty_gap = fgt(hhinc_pc, poverty_line, peso, alpha = 1),
  # Mean income
  mean_income = weighted.mean(hhinc_pc, peso, na.rm = TRUE),
  # Sample size
  n_obs = .N
), by = ref_month_yyyymm]

# Add date for plotting (mid-month)
monthly_poverty[, period := as.Date(paste0(
  ref_month_yyyymm %/% 100, "-",
  ref_month_yyyymm %% 100, "-15"
))]
```

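
As a quick sanity check of the FGT helper defined above, it can be run on a handful of made-up incomes where the answers are easy to verify by hand. The index it computes is

$$
FGT_\alpha = \frac{\sum_i w_i \left(\frac{z - x_i}{z}\right)^{\alpha} \mathbb{1}(x_i < z)}{\sum_i w_i}
$$

The sketch below re-declares a compact version of the function (without the missing-value handling) so that it runs on its own; the incomes, poverty line, and weights are illustrative values, not PNADC data.

```{r fgt-worked-example}
# Compact copy of the FGT helper, without the NA handling of the full version
fgt_toy <- function(x, z, w = rep(1, length(x)), alpha = 0) {
  gap <- pmax(0, (z - x) / z)                      # relative shortfall
  sum(w * ifelse(x < z, gap^alpha, 0)) / sum(w)    # weighted average over everyone
}

income <- c(50, 100, 200, 400)  # illustrative per capita incomes
z <- 150                        # illustrative poverty line

fgt0 <- fgt_toy(income, z, alpha = 0)  # 2 of 4 people below the line: 0.5
fgt1 <- fgt_toy(income, z, alpha = 1)  # (100/150 + 50/150) / 4 = 0.25
fgt2 <- fgt_toy(income, z, alpha = 2)  # ((2/3)^2 + (1/3)^2) / 4 = 5/36

# Weights shift the result toward heavily weighted observations:
# the poorest person now counts three times, so the headcount is 4/6
fgt0_w <- fgt_toy(income, z, w = c(3, 1, 1, 1), alpha = 0)

c(fgt0 = fgt0, fgt1 = fgt1, fgt2 = fgt2, fgt0_w = fgt0_w)
```

The poverty gap (0.25) is half the headcount (0.5) here because the two poor people fall short of the line by one half on average. That is the sense in which FGT-1 adds depth to the simple count.
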
The code below produces the two-panel figure:

```{r fgt-plot}
# Prepare data for plotting
fgt_data <- melt(
  monthly_poverty[, .(period, `PPP 8.30/day` = poverty_rate)],
  id.vars = "period",
  variable.name = "poverty_line",
  value.name = "rate"
)

fgt_gap_data <- melt(
  monthly_poverty[, .(period, `PPP 8.30/day` = poverty_gap)],
  id.vars = "period",
  variable.name = "poverty_line",
  value.name = "gap"
)

# Panel A: Headcount ratio (FGT-0)
p1 <- ggplot(fgt_data, aes(x = period, y = rate, color = poverty_line)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1) +
  scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0, NA)) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_color_manual(values = c("PPP 8.30/day" = "#b2182b")) +
  labs(title = "A. Poverty Headcount (FGT-0)",
       subtitle = "Share of population below poverty line",
       x = NULL, y = "Poverty Rate", color = "Poverty Line") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold"))

# Panel B: Poverty gap (FGT-1)
p2 <- ggplot(fgt_gap_data, aes(x = period, y = gap, color = poverty_line)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1) +
  scale_y_continuous(labels = percent_format(accuracy = 0.1), limits = c(0, NA)) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_color_manual(values = c("PPP 8.30/day" = "#ef8a62")) +
  labs(title = "B. Relative Poverty Gap (FGT-1)",
       subtitle = "Average shortfall as share of poverty line",
       x = NULL, y = "Relative Poverty Gap", color = "Poverty Line") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold"))

# Combine panels (p1 stacked above p2)
library(patchwork)
fig_fgt <- p1 / p2 +
  plot_annotation(
    title = "Monthly Poverty Measures: Brazil, 2015-2024",
    caption = "Source: PNADC/IBGE. Annual data with monthly reference periods from PNADCperiods.",
    theme = theme(
      plot.title = element_text(face = "bold", size = 14),
      plot.subtitle = element_text(size = 11),
      plot.caption = element_text(size = 9, hjust = 0)
    )
  )

fig_fgt
```

![Monthly Poverty Measures: Brazil, 2015-2024](figures/annual-poverty-analysis/fig-fgt-monthly-series.png){width=100%}

The figure reveals several key dynamics:

1. **COVID-19 spike (March-April 2020)**: The poverty rate shows a sharp increase in early 2020.
2. **Auxilio Emergencial effect (May-December 2020)**: Emergency cash transfers dramatically reduced poverty below pre-pandemic levels.
3. **Post-Auxilio adjustment (2021)**: As emergency aid was reduced, poverty rates partially rebounded.

For proper inference with confidence intervals, use complex survey design with monthly weights; see the [Complex Survey Design](complex-survey-design.html) vignette.

---

## Summary

| Insight | Annual Data | Monthly Data |
|---------|-------------|--------------|
| **COVID poverty spike** | Averaged across year | Visible March-April 2020 |
| **Auxilio timing** | Effect blurred | Clear May 2020 onset |
| **Recovery dynamics** | Single 2021 estimate | Monthly trajectory |
| **Seasonal patterns** | Invisible | December income spikes |

**Limitations**: ~3% sample loss from undetermined reference months; annual PNADC is released with 18+ month delay; monthly estimates have wider confidence intervals than annual.

---

## Further Reading

- [Get Started](getting-started.html) - Basic mensalization workflow
- [How It Works](how-it-works.html) - Algorithm details
- [Complex Survey Design](complex-survey-design.html) - Variance estimation
- [Applied Examples](applied-examples.html) - Unemployment and minimum wage examples

## References

- HECKSHER, Marcos. "Valor Impreciso por Mes Exato: Microdados e Indicadores Mensais Baseados na Pnad Continua". IPEA - Nota Tecnica Disoc, n. 62. Brasilia, DF: IPEA, 2020.
- HECKSHER, M. "Cinco meses de perdas de empregos e simulacao de um incentivo a contratacoes". IPEA - Nota Tecnica Disoc, n. 87. Brasilia, DF: IPEA, 2020.
- HECKSHER, Marcos. "Mercado de trabalho: A queda da segunda quinzena de marco, aprofundada em abril". IPEA - Carta de Conjuntura, v. 47, p. 1-6, 2020.
- NERI, Marcelo; HECKSHER, Marcos. "A Montanha-Russa da Pobreza". FGV Social - Sumario Executivo. Rio de Janeiro: FGV, Junho/2022.
- NERI, Marcelo; HECKSHER, Marcos. "A montanha-russa da pobreza mensal e um programa social alternativo". *Revista NECAT*, v. 11, n. 21, 2022.
- IBGE. Pesquisa Nacional por Amostra de Domicilios Continua (PNADC).
- World Bank. Poverty and Shared Prosperity Reports. Various years.
- Foster, J., Greer, J., & Thorbecke, E. (1984). A class of decomposable poverty measures. *Econometrica*, 52(3), 761-766.
- Barbosa, Rogerio J; Hecksher, Marcos. (2026). PNADCperiods: Identify Reference Periods in Brazil's PNADC Survey Data. R package version v0.1.0.