--- title: "Reading and Writing Data Files" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Reading and Writing Data Files} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` While `eg_read()` and `eg_write()` handle raw file transfers, you probably spend most of your time working with data files that you want to load directly into R as data frames. egnyte provides a set of functions that handle the download-read or write-upload workflow in a single step. ## Prerequisites Before you can read or write files, you need to authenticate. See `vignette("configuration")` or `vignette("authorization")` if you haven't set that up yet. ```{r} library(egnyte) eg_auth() ``` ## Reading Data Files egnyte provides read functions for common data formats. Each function: 1. Downloads the file from Egnyte to a temporary location 2. Reads it using the appropriate R package 3. Returns the data as a data frame (or tibble) 4. Cleans up the temporary file ### CSV Files ```{r} # Read a CSV file dat <- eg_read_csv("/Shared/Data/analysis.csv") ``` This uses `readr::read_csv()` under the hood. You can pass any additional arguments that `read_csv()` accepts: ```{r} # Specify column types dat <- eg_read_csv( "/Shared/Data/analysis.csv", col_types = cols( id = col_integer(), name = col_character(), value = col_double() ) ) # Skip rows, select columns, etc. dat <- eg_read_csv( "/Shared/Data/analysis.csv", skip = 2, col_select = c(id, name, value) ) ``` ### Delimited Files For files with delimiters other than commas: ```{r} # Tab-delimited file (default) dat <- eg_read_delim("/Shared/Data/data.tsv") # Pipe-delimited file dat <- eg_read_delim("/Shared/Data/data.txt", delim = "|") # Semicolon-delimited (common in European CSVs) dat <- eg_read_delim("/Shared/Data/european.csv", delim = ";") ``` ### Excel Files ```{r} # Read an Excel file (first sheet by default) dat <- eg_read_excel("/Shared/Data/workbook.xlsx") # Read a specific sheet by name dat <- eg_read_excel("/Shared/Data/workbook.xlsx", sheet = "Summary") # Read a specific sheet by position dat <- eg_read_excel("/Shared/Data/workbook.xlsx", sheet = 3) ``` Additional `readxl::read_excel()` arguments work here too: ```{r} # Specify a range dat <- eg_read_excel( "/Shared/Data/workbook.xlsx", range = "B2:F100" ) # Skip rows dat <- eg_read_excel( "/Shared/Data/workbook.xlsx", skip = 5 ) ``` ### SAS Files ```{r} # Read a SAS7BDAT file dat <- eg_read_sas("/Shared/Data/dataset.sas7bdat") # Read a SAS transport file (.xpt) dat <- eg_read_xpt("/Shared/Data/dataset.xpt") ``` These use `haven::read_sas()` and `haven::read_xpt()`. Haven preserves SAS attributes like variable labels and formats. ### Stata Files ```{r} # Read a Stata file dat <- eg_read_stata("/Shared/Data/dataset.dta") ``` ### SPSS Files ```{r} # Read an SPSS file dat <- eg_read_spss("/Shared/Data/dataset.sav") ``` ### R Objects (RDS) ```{r} # Read an RDS file obj <- eg_read_rds("/Shared/Data/model.rds") ``` RDS files can contain any R object, not just data frames. This is useful for saving fitted models, lists, or other complex objects. ## Writing Data Files The write functions work in reverse - they take an R object, write it to a temporary file, and upload it to Egnyte. ### CSV Files ```{r} # Write a data frame to CSV eg_write_csv(dat, "/Shared/Data/results.csv") ``` By default, this will fail if the file already exists: ```{r} # Overwrite an existing file eg_write_csv(dat, "/Shared/Data/results.csv", overwrite = TRUE) ``` You can pass additional arguments to `readr::write_csv()`: ```{r} # Don't include column names eg_write_csv(dat, "/Shared/Data/results.csv", col_names = FALSE) # Use a different NA representation eg_write_csv(dat, "/Shared/Data/results.csv", na = ".") ``` ### Delimited Files ```{r} # Write a tab-delimited file eg_write_delim(dat, "/Shared/Data/results.tsv") # Write with a different delimiter eg_write_delim(dat, "/Shared/Data/results.txt", delim = "|") ``` ### Excel Files ```{r} # Write a single sheet eg_write_excel(dat, "/Shared/Data/results.xlsx") # Write multiple sheets by passing a named list eg_write_excel( list( "Summary" = summary_df, "Details" = details_df, "Raw" = raw_df ), "/Shared/Data/workbook.xlsx" ) ``` ### SAS Transport Files ```{r} # Write a SAS transport file eg_write_xpt(dat, "/Shared/Data/results.xpt") ``` Note: egnyte can write XPT files but not native SAS7BDAT files. If you need SAS7BDAT, you'll need to use SAS itself or another tool. ### Stata Files ```{r} # Write a Stata file eg_write_stata(dat, "/Shared/Data/results.dta") ``` ### SPSS Files ```{r} # Write an SPSS file eg_write_spss(dat, "/Shared/Data/results.sav") ``` ### R Objects (RDS) ```{r} # Save any R object eg_write_rds(fitted_model, "/Shared/Data/model.rds") # Control compression eg_write_rds(large_data, "/Shared/Data/data.rds", compress = "xz") ``` ## Optional Dependencies The format-specific functions require additional packages that aren't installed by default with egnyte: | Function | Required Package | |----------|-----------------| | `eg_read_csv()`, `eg_write_csv()` | readr | | `eg_read_delim()`, `eg_write_delim()` | readr | | `eg_read_excel()` | readxl | | `eg_write_excel()` | writexl | | `eg_read_sas()`, `eg_read_xpt()`, `eg_write_xpt()` | haven | | `eg_read_stata()`, `eg_write_stata()` | haven | | `eg_read_spss()`, `eg_write_spss()` | haven | | `eg_read_rds()`, `eg_write_rds()` | (base R - no extra package) | If you try to use a function without the required package installed, egnyte will prompt you to install it: ``` The readr package is required for this function. Would you like to install it? (yes/no) ``` ## A Few Tips ### Working with Large Files The read functions download the entire file before reading it. For large files, this can take a while and use significant memory. A few suggestions: - If you only need specific columns, use the `col_select` argument (for CSV/delim files) to avoid loading unnecessary data - Consider whether you really need the whole file, or if you can work with a sample - For very large datasets, you might want to use `eg_read()` to download the file once, then work with it locally ### Passing Arguments Through All the format-specific functions accept `...` arguments that get passed to the underlying read/write function. If you're familiar with `readr::read_csv()` or `haven::read_sas()`, you can use all the same options: ```{r} # readr options dat <- eg_read_csv( "/Shared/Data/data.csv", col_types = "ccdd", locale = locale(decimal_mark = ","), na = c("", "NA", "N/A", ".") ) # haven options dat <- eg_read_sas( "/Shared/Data/data.sas7bdat", encoding = "latin1" ) ``` ### File Extensions Matter egnyte uses the file extension to determine the format when downloading. Make sure your files have the correct extension: - `.csv` for CSV files - `.xlsx` or `.xls` for Excel files - `.sas7bdat` for SAS data files - `.xpt` for SAS transport files - `.dta` for Stata files - `.sav` for SPSS files - `.rds` for R objects ## Comparison with Base File Transfer Here's the difference between using the format-specific functions vs. the base `eg_read()`/`eg_write()`: **Using eg_read_csv() (recommended for data files):** ```{r} # One step - download and read dat <- eg_read_csv("/Shared/Data/data.csv") ``` **Using eg_read() (manual approach):** ```{r} # Two steps - download, then read temp_file <- eg_read("/Shared/Data/data.csv") dat <- readr::read_csv(temp_file) unlink(temp_file) # Clean up ``` The format-specific functions handle all this for you and clean up temporary files automatically. ## Next Steps - Learn about authentication options in `vignette("authorization")` - See raw file transfer in `vignette("file-transfer")`