---
title: "Working with tidycensuskr"
output: rmarkdown::html_vignette
date: "`r format(Sys.Date(), '%B %d, %Y')`"
vignette: >
  %\VignetteIndexEntry{Working with tidycensuskr}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  out.width = "100%"
)
library(ggplot2)
library(sf)
library(dplyr)
library(tidyr)
```

## Getting Started with tidycensuskr

The `tidycensuskr` package provides easy access to South Korean census and socioeconomic statistics, along with the corresponding geospatial boundary data. With this package, R users can query and visualize population, housing, economy, tax, and mortality data linked to administrative districts.

Load the package:

```{r, include = TRUE}
library(tidycensuskr)
```

`tidycensuskr` works at its full potential with the companion data package `tidycensussfkr`, which contains the district boundaries of South Korea. The companion package can be installed from R-universe:

```r
install.packages("tidycensussfkr", repos = "https://sigmafelix.r-universe.dev")
```

After installing the companion package, three RDS files (for 2010, 2015, and 2020) are accessible through the function `system.file()`. For example, the 2010 district boundaries can be located and loaded as follows:

```r
# Locate the RDS file shipped with the companion package, then read it
fs10 <- system.file("extdata", "adm2_sf_2010.rds", package = "tidycensussfkr")
adm2_sf_2010 <- readRDS(fs10)
```

## 1. Understanding Korean Geographic Hierarchies

South Korean census data is organized by three levels of administrative divisions:

- **_Si-Do_**: The highest level of administrative division.
    - Metropolitan cities are treated as provinces.
    - Jeju-do, Gangwon-do, and Jeollabuk-do have special self-governing status under Korean law.
- **_Si-Gun-Gu_**: The second level, which includes cities, counties, and districts.
    - **_Si_**: Cities (urban administrative units)
    - **_Gun_**: Counties (rural areas, typically <50,000 population)
    - **_Gu_**: Districts (urban subdivisions of metropolitan cities or large cities)
        - _Gu_ under metropolitan cities are autonomous districts.
        - _Gu_ under 11 large cities, as of 2025 (i.e., Suwon-si, Seongnam-si, Anyang-si, Goyang-si, Ansan-si, Yongin-si, Cheongju-si, Cheonan-si, Pohang-si, Changwon-si, Jeonju-si), are administrative districts.
- **_Eup-Myeon-Dong_**: The third level, which includes towns, townships, and neighborhoods. (_planned for future releases_)
    - **_Eup_**: Towns (urban, >20,000 population, within a county)
    - **_Myeon_**: Townships (rural, <20,000 population, within a county)
    - **_Dong_**: Neighborhoods (smallest units within cities and districts)

### Comparison of Administrative Divisions

The table below provides a rough comparison of administrative divisions across South Korea, the United States, the European Union, and the United Kingdom (England). The correspondence is not exact, but it helps to place each level approximately when working with census or regional data.

| South Korea        | US                                         | EU (NUTS[^1]) | UK (England)                   |
|--------------------|--------------------------------------------|---------------|--------------------------------|
| **Si/Do**          | State                                      | NUTS1         | Regions / Combined Authorities |
| **Si/Gun/Gu**      | County                                     | NUTS2         | County                         |
| **Eup/Myeon/Dong** | Townships / Towns / Census County Division | NUTS3         | Districts / Wards / Boroughs   |

[^1]: NUTS: *Nomenclature of Territorial Units for Statistics*, a geocode standard for referencing the subdivisions of countries for statistical purposes.

Because administrative boundaries and coding systems can vary across years and data sources, `tidycensuskr` harmonizes codes to allow consistent integration of statistics. Currently, the 2020 data contain 250 _Si-Gun-Gu_ and 17 _Si-Do_.

```{r, include = TRUE}
data(adm2_sf_2020)
# Count the distinct Si-Gun-Gu codes in the 2020 boundaries
print(length(unique(adm2_sf_2020$adm2_code)))
```

## 2. Available census data

The package provides census and survey data through:

- The function `anycensus()` for querying subsets
- The built-in dataset `censuskor` in long format

### Data types