---
title: "Getting started with datasusr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with datasusr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

`datasusr` provides fast, in-memory reading of DATASUS `.dbc` files and a complete workflow for discovering, downloading, and caching Brazilian public health data.

## The fastest way: `datasus_fetch()`

If you know the source, file type, period, and state you need, `datasus_fetch()` handles listing, downloading, and reading in a single call:

```{r eval = FALSE}
library(datasusr)

df <- datasus_fetch(
  source = "SIHSUS",
  file_type = "RD",
  year = 2024,
  month = 1,
  uf = "PE"
)

df
```

The result is a tibble ready for analysis with `dplyr`, `ggplot2`, or any tidyverse tool. Files are cached by default, so running the same call again skips the download entirely.

## Reading a local DBC file

If you already have a `.dbc` file on disk, use `read_datasus_dbc()` directly:

```{r eval = FALSE}
x <- read_datasus_dbc("RDPE2401.dbc")
x
```

## Selecting columns

DATASUS files often have dozens of columns. Use `select` to keep only what you need; this is faster and uses less memory:

```{r eval = FALSE}
x <- read_datasus_dbc(
  "RDPE2401.dbc",
  select = c("uf_zi", "ano_cmpt", "munic_res", "val_tot")
)
```

## Controlling column types

By default, `datasusr` inspects each numeric field to decide between integer and double.
You can override this with `col_types` and parse date fields with `parse_dates`:

```{r eval = FALSE}
x <- read_datasus_dbc(
  "SPPE2401.dbc",
  select = c("sp_gestor", "sp_naih", "sp_dtinter", "sp_valato"),
  col_types = c(
    sp_gestor = "character",
    sp_naih = "character",
    sp_dtinter = "date",
    sp_valato = "double"
  ),
  parse_dates = TRUE,
  guess_types = FALSE
)
```

## Exploring available data

Before downloading, you can browse the internal catalog to discover which sources and file types are available:

```{r eval = FALSE}
datasus_sources()
datasus_file_types(source = "SIHSUS")
datasus_file_types(source = "CNES")
```

## Step-by-step workflow

For more control, you can use the individual functions instead of `datasus_fetch()`:

```{r eval = FALSE}
# 1. Build the FTP paths
datasus_build_path(source = "SIHSUS", file_type = "RD", year = 2024, month = 1)

# 2. List files (validated against FTP)
files <- datasus_list_files(
  source = "SIHSUS",
  file_type = "RD",
  year = 2024,
  month = 1:3,
  uf = c("PE", "PB")
)

# 3. Download with cache
downloads <- datasus_download(files, use_cache = TRUE)

# 4. Read
x <- read_datasus_dbc(downloads$local_file[[1]])
```

To skip FTP validation (useful when the server is slow), set `check_exists = FALSE` in `datasus_list_files()`.

## Territorial data (municipalities, regions)

DATASUS publishes territorial reference tables (municipalities, health regions, etc.) as CSV files. Use `datasus_get_territory()` to download and read them:

```{r eval = FALSE}
# Download municipalities table
municipios <- datasus_get_territory("tb_municip")
municipios

# Other available tables
datasus_ftp_ls("ftp://ftp.datasus.gov.br/territorio/tabelas/")
```

## Finding documentation and data dictionaries

Each information system has documentation files on the DATASUS FTP.
Use `datasus_docs_url()` to find them:

```{r eval = FALSE}
# See all documentation paths
datasus_docs_url()

# List documentation files for a specific system
datasus_docs_url("CNES")
datasus_ftp_ls(datasus_docs_url("CNES")$docs_url[[1]])
```

## Next steps

See the other vignettes for more detail:

- **Cache and downloads**: how caching works and how to manage it
- **Performance notes**: tips for reading large files quickly
- **Comparison with read.dbc and microdatasus**: how `datasusr` relates to other R packages for DATASUS data
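
As one practical next step, note that the step-by-step workflow reads only the first downloaded file. The following is a minimal sketch, not taken from the package documentation, of reading every downloaded file and stacking the results. It assumes that `downloads` is the object returned by `datasus_download()`, that `downloads$local_file` holds one local path per file, that `dplyr` is installed, and that the selected columns come back with compatible types across files:

```{r eval = FALSE}
library(datasusr)

# Assumption: `downloads` comes from datasus_download() as in the
# step-by-step workflow, with one local path per entry in `local_file`.
paths <- unlist(downloads$local_file)

# Read each file, keeping the same columns so the tables line up.
tables <- lapply(
  paths,
  read_datasus_dbc,
  select = c("uf_zi", "ano_cmpt", "munic_res", "val_tot")
)

# Name each table by its file name, then stack them;
# `.id` stores that name in a `source_file` column.
names(tables) <- basename(paths)
all_rd <- dplyr::bind_rows(tables, .id = "source_file")
```

If a column is read as character in one file and numeric in another, `dplyr::bind_rows()` will stop with an error; fixing the types with `col_types` in each read is one way to avoid that.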