---
title: "Using survey weights with ihsMW"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using survey weights with ihsMW}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Researchers who need population-representative estimates must account for the complex survey design of the Malawi Integrated Household Survey (IHS). The `ihsMW` package handles the configuration of these designs and builds on existing R survey infrastructure (`survey` and `srvyr`), so that estimates and their variances reflect the sampling design.

## 1. Why weights matter

The IHS uses a **stratified two-stage cluster sample** designed to be representative of the national population, of rural and urban areas, and of individual districts.

Unweighted means are **NOT** nationally representative. A simple average over an unweighted `data.frame` is biased, because rural and urban areas are sampled at different rates.

> "Overall, the IHS5 sample design is a stratified two-stage sample... Therefore, it is imperative that the survey weights be used when making national-level or regional-level estimates."
>
> Source: *National Statistical Office (2020), IHS5 Basic Information Document, Sampling Section.*

## 2. The IHS_survey() function

To bridge the gap between raw data downloads and properly weighted analysis, `ihsMW` provides the `IHS_survey()` function. This wrapper retrieves the requested indicators together with the survey design variables (weights, strata, and clusters).

```{r eval=FALSE}
library(ihsMW)

# Retrieve a consumption variable together with the survey design variables
svy <- IHS_survey("rexp_cat01", round = "IHS5")

# The result is a tbl_svy, so it also works with srvyr's dplyr-style verbs
class(svy)
#> [1] "tbl_svy"        "survey.design2" "survey.design"
```

## 3. Computing weighted estimates

Once created, the survey object behaves like any other complex survey design object and carries the information needed for variance estimation. To use the classic `survey` package interface:

```{r eval=FALSE}
library(survey)

# Nationally representative weighted mean
svymean(~rexp_cat01, design = svy, na.rm = TRUE)

# Weighted mean of consumption within each stratum
svyby(~rexp_cat01, ~stratum, svy, svymean, na.rm = TRUE)
```

Alternatively, the `srvyr` package brings `dplyr`-style pipelines to survey objects:

```{r eval=FALSE}
library(srvyr)

# Tidy-style weighted summary by stratum
svy |>
  group_by(stratum) |>
  summarise(mean_cons = survey_mean(rexp_cat01, na.rm = TRUE))
```

## 4. Multi-round weighted analysis

Because population sizes, geographic boundaries, and primary sampling units change between cross-sectional rounds, survey objects for different rounds **should NOT be naively pooled** into a single design without careful, independent reweighting.

Instead, when you request multiple rounds, `IHS_survey()` returns a named list containing one survey object per round, each with its own design configuration.

```{r eval=FALSE}
# Requesting several rounds returns a named list, one design per round
svy_list <- IHS_survey("rexp_cat01", round = c("IHS4", "IHS5"))

# Compute the weighted mean separately for each round
lapply(svy_list, function(s) {
  survey::svymean(~rexp_cat01, design = s, na.rm = TRUE)
})
```

## 5. Weight variables per round

`ihsMW` relies on a hard-coded internal mapping of the weight, strata, and cluster variable names used to build the survey object for each round.


| Round | `weight_var` | `strata_var` | `cluster_var` |
|-------|--------------|--------------|---------------|
| IHS1  | `wght`       | `stratum`    | `ea_id`       |
| IHS2  | `hh_wgt`     | `stratum`    | `ea_code`     |
| IHS3  | `hh_wgt`     | `stratum`    | `ea_id`       |
| IHS4  | `hh_wgt`     | `stratum`    | `ea_id`       |
| IHS5  | `hhweight`   | `stratum`    | `ea_id`       |

**Crucial Warning:** These mappings are assumed to be static, but you should always verify them independently. Cross-reference these fields against the Basic Information Documents (BIDs) in the [World Bank Microdata Library](https://microdata.worldbank.org/index.php/catalog) to avoid using the wrong variables.

## 6. Common mistakes

Researchers new to the package often write code that runs but yields statistically invalid results:

* **Using raw `IHS()` output:** Extracting variables with `IHS()` alone omits the survey design, so estimates ignore the weights and are biased. Always use `IHS_survey()` when your inference depends on the design.
* **Naive pooling across rounds:** Loading `IHS(round = "all")` into one flat dataset (the equivalent of a single `.dta` file) and wrapping it in a single `svydesign()` call breaks the cluster structure, since primary sampling units differ between rounds. Always keep a separate survey object per round.
* **Forgetting the clustering attributes:** Calling `survey::svydesign()` directly without `nest = TRUE` can misrepresent how clusters are nested within strata and make estimates appear more precise than they are.
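
To make the first mistake above concrete, here is a minimal base-R sketch of how an unweighted mean can diverge from a weighted one. The data frame `toy` and its columns `cons` and `weight` are invented for illustration and are not real IHS values or `ihsMW` output. The example assumes urban households are oversampled relative to their population share.

```r
# Toy data for illustration only; these are NOT real IHS values.
# Urban households are oversampled (4 of 6 rows) but carry small weights.
toy <- data.frame(
  stratum = c("urban", "urban", "urban", "urban", "rural", "rural"),
  cons    = c(200, 220, 180, 200, 100, 120),
  weight  = c(50, 50, 50, 50, 400, 400)
)

# Unweighted mean treats every sampled household as equally representative
unweighted_mean <- mean(toy$cons)

# Weighted mean scales each household by the number of households it represents
weighted_mean <- weighted.mean(toy$cons, w = toy$weight)

print(c(unweighted = unweighted_mean, weighted = weighted_mean))
```

In this toy case the unweighted mean is 170, while the weighted mean is 128, because the oversampled urban households pull the simple average upward. Note that `weighted.mean()` only corrects the point estimate. Standard errors still require the strata and cluster information carried by a full survey design object, such as the one returned by `IHS_survey()`.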