---
title: "Working with Unity Catalog Volumes"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with Unity Catalog Volumes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
library(brickster)
```

`{brickster}` includes two groups of volume functions:

- `db_uc_volumes_*`: manage volume objects in Unity Catalog (create, list, update, delete)
- `db_volume_*`: work with files and directories inside an existing volume

Once a volume exists, day-to-day workflows mostly use the `db_volume_*` functions.

## Which Function to Use

This table covers **filesystem operations inside an existing volume**:

| Operation | Scope | Function | Notes |
|-----------|-------|----------|-------|
| Upload file | Single file | `db_volume_write()` | Upload one local file to a volume path |
| Download file | Single file | `db_volume_read()` | Download one volume file to local disk |
| Delete file | Single file | `db_volume_delete()` | Remove one file from a volume |
| Check file exists | Single file | `db_volume_file_exists()` | Returns `TRUE`/`FALSE` |
| List contents | Directory | `db_volume_list()` | Lists files/subdirectories for a directory |
| Create directory | Directory | `db_volume_dir_create()` | Creates target directory path |
| Check directory exists | Directory | `db_volume_dir_exists()` | Returns `TRUE`/`FALSE` |
| Delete directory | Directory | `db_volume_dir_delete()` | Use `recursive = TRUE` for non-empty directories |
| Upload directory | Bulk transfer | `db_volume_upload_dir()` | Parallel upload, `recursive = TRUE` includes subdirectories |
| Download directory | Bulk transfer | `db_volume_download_dir()` | Parallel download, `recursive = TRUE` includes subdirectories |

## Example Workflows

### Single File Round-Trip

Use this pattern when you just need to move one file in and out of a volume.
This example goes beyond a bare upload/download to also show directory creation, existence checks, listing, and cleanup.

```{r}
library(brickster)

volume_root <- "/Volumes/zacdav/default/data"
incoming_dir <- file.path(volume_root, "incoming")
incoming_file <- file.path(incoming_dir, "example.csv")

# create local file
local_file <- tempfile(fileext = ".csv")
write.csv(mtcars, local_file, row.names = FALSE)

# ensure target directory exists
db_volume_dir_create(incoming_dir)

# upload file
db_volume_write(
  path = incoming_file,
  file = local_file,
  overwrite = TRUE
)

# verify + inspect
db_volume_file_exists(incoming_file)
db_volume_list(incoming_dir)

# download file back to local path
downloaded_file <- tempfile(fileext = ".csv")
db_volume_read(
  path = incoming_file,
  destination = downloaded_file
)

# verify that file can be read as csv
read.csv(downloaded_file)

# clean up (optional)
db_volume_delete(incoming_file)
db_volume_dir_delete(incoming_dir)
```

### Bulk Directory Transfer (Upload + Download)

This is a compact pattern for a larger transfer: resample a local dataset (with replacement) to 10 million rows, write a 2-level partitioned Arrow dataset, upload it, then download it back.

```{r}
library(brickster)
library(arrow)
library(dplyr)

volume_root <- "/Volumes/zacdav/default/data"
landing_dir <- file.path(volume_root, "sample_10m")
local_dir <- tempfile("arrow_sample_")

# sample to 10M rows
# write partitioned Arrow dataset (2 levels deep: cyl/gear)
mtcars |>
  sample_n(size = 1e+07, replace = TRUE) |>
  write_dataset(
    path = local_dir,
    format = "parquet",
    partitioning = c("cyl", "gear")
  )

# bulk upload
db_volume_upload_dir(
  local_dir = local_dir,
  volume_dir = landing_dir,
  overwrite = TRUE,
  recursive = TRUE
)

# bulk download
local_download <- tempfile("arrow_download_")
db_volume_download_dir(
  volume_dir = landing_dir,
  local_dir = local_download,
  overwrite = TRUE,
  recursive = TRUE
)

list.files(local_download, recursive = TRUE)

# cleanup example directory recursively (optional)
db_volume_dir_delete(
  path = landing_dir,
  recursive = TRUE
)
```

Set `recursive = FALSE` for non-recursive transfer: only files directly under the source directory are transferred, and nested subdirectories are skipped.

## Managing Volume Objects (Optional)

Use `db_uc_volumes_*` when you need to create or manage the **volume object itself** (not files inside it).

| Operation | Function | Notes |
|-----------|----------|-------|
| List volumes in schema | `db_uc_volumes_list()` | Returns volumes under `<catalog>.<schema>` |
| Get one volume | `db_uc_volumes_get()` | Returns metadata for one volume |
| Create volume | `db_uc_volumes_create()` | Supports `MANAGED` and `EXTERNAL` |
| Update volume metadata | `db_uc_volumes_update()` | Rename/comment/owner updates |
| Delete volume | `db_uc_volumes_delete()` | Removes the Unity Catalog volume object |

```{r}
# list volumes in a schema
db_uc_volumes_list(catalog = "<catalog>", schema = "<schema>")

# create a managed volume
db_uc_volumes_create(
  catalog = "<catalog>",
  schema = "<schema>",
  volume = "my_volume",
  volume_type = "MANAGED"
)

# inspect one volume
db_uc_volumes_get(
  catalog = "<catalog>",
  schema = "<schema>",
  volume = "my_volume"
)
```

After a volume exists, use `/Volumes/<catalog>/<schema>/<volume>/...` paths with `db_volume_*` for file operations.
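
If you assemble volume paths in several places, a small helper can keep the `/Volumes/...` layout consistent. The sketch below is an illustration only: `volume_path()` is a hypothetical function, not part of `{brickster}`, and the catalog, schema, and volume names are placeholders. It uses base R only and makes no API calls.

```{r}
# Hypothetical helper (NOT part of {brickster}): build a volume path
# from its Unity Catalog components plus optional sub-path segments.
volume_path <- function(catalog, schema, volume, ...) {
  parts <- c(catalog, schema, volume)
  # require exactly three non-empty, non-missing name components
  stopifnot(
    is.character(parts),
    length(parts) == 3L,
    !anyNA(parts),
    all(nzchar(parts))
  )
  paste(c("/Volumes", parts, ...), collapse = "/")
}

# placeholder names, substitute your own catalog/schema/volume
incoming_file <- volume_path("main", "default", "my_volume", "incoming", "example.csv")
incoming_file
```

The resulting string can then be passed as the `path` argument of functions such as `db_volume_file_exists()` or `db_volume_write()`, as in the workflows above.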