Title: | Rename and Encode Data Frames Using External Crosswalk Files |
Version: | 0.3.0 |
Description: | A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'. |
Maintainer: | Benjamin Skinner <ben@btskinner.io> |
URL: | https://github.com/btskinner/crosswalkr |
BugReports: | https://github.com/btskinner/crosswalkr/issues |
Depends: | R (≥ 3.5.0) |
Imports: | haven, labelled, methods, readr, readxl, tibble |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | testthat, knitr, rmarkdown, dplyr |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-04-08 15:00:35 UTC; benski |
Author: | Benjamin Skinner |
Repository: | CRAN |
Date/Publication: | 2025-04-09 07:30:36 UTC |
crosswalkr: Rename and Encode Data Frames Using External Crosswalk Files
Description
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on the -renamefrom- and -encodefrom- Stata commands.
Author(s)
Maintainer: Benjamin Skinner <ben@btskinner.io>
See Also
Useful links:
Report bugs at https://github.com/btskinner/crosswalkr/issues
Encode data frame column using external crosswalk file.
Description
Encode data frame column using external crosswalk file.
Usage
encodefrom(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)

encodefrom_(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)
Arguments
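.data | Data frame or tbl_df |
var | Column name of vector to be encoded |
cw_file | Either a data frame object or a string with the path to an external crosswalk file, which has columns representing the raw (current) values, the clean (new) values, and the label for each value |
raw | Name of column in cw_file that contains the values in the current vector |
clean | Name of column in cw_file that contains the new values for the vector |
label | Name of column in cw_file that contains the labels for the new values |
delimiter | String delimiter used to parse cw_file |
sheet | Specify sheet if cw_file is an Excel file |
case_ignore | Ignore case when matching current (raw) values in var to the raw column of cw_file |
ignore_tibble | Ignore .data status as tbl_df and return a factor rather than a labelled vector |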
Value
A vector that is either a factor or a labelled vector, depending on the input data and options.
Functions
- encodefrom_(): Standard evaluation version of encodefrom (var, raw, clean, and label must be strings when using this version)
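Because encodefrom_() takes its column arguments as strings, it can be convenient when column names are held in variables. The following is a minimal sketch, assuming the crosswalkr package is installed; the small data frame, the crosswalk, and the var_name variable are made up for illustration.

```r
library(crosswalkr)

## hypothetical data and crosswalk, made up for this illustration
df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"))
cw <- data.frame(stname = c("Kentucky", "Tennessee", "Virginia"),
                 stfips = c(21, 47, 51),
                 stabbr = c("KY", "TN", "VA"))

## column name held as a string, e.g., set programmatically
var_name <- "state"

## standard evaluation version: var, raw, clean, and label are strings
df$state2 <- encodefrom_(df, var_name, cw, "stname", "stfips", "stabbr")
```

Since df here is a plain data frame rather than a tbl_df, the new column should come back as a factor.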
Examples
## example data frame and tibble
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 stfips = c(21,47,51),
                 cenregnm = c('South','South','South'))
df_tbl <- tibble::as_tibble(df)

## use bundled state crosswalk
cw <- get(data(stcrosswalk))

## encode state names: stfips as values, stabbr as labels
df$state2 <- encodefrom(df, state, cw, stname, stfips, stabbr)
df_tbl$state2 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr)
df_tbl$state3 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr,
                            ignore_tibble = TRUE)

## convert labelled columns to factors, or strip the labels
haven::as_factor(df_tbl)
haven::zap_labels(df_tbl)
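Conceptually, what encodefrom does is a lookup: each current value is matched against the raw column of the crosswalk, and the corresponding clean value and label are attached. The base-R sketch below illustrates that idea only. It is not the package's implementation, and the three-row crosswalk and the object names (state, idx, codes, encoded) are made up for the example.

```r
## hypothetical three-row crosswalk (raw name, clean code, label)
cw <- data.frame(stname = c("Kentucky", "Tennessee", "Virginia"),
                 stfips = c(21, 47, 51),
                 stabbr = c("KY", "TN", "VA"))

## current values, with inconsistent capitalization
state <- c("virginia", "Kentucky", "TENNESSEE")

## case-insensitive match of current values to the raw column
idx <- match(tolower(state), tolower(cw$stname))

## clean codes looked up for each current value
codes <- cw$stfips[idx]

## factor whose levels are the clean codes and whose labels
## come from the crosswalk's label column
encoded <- factor(codes, levels = cw$stfips, labels = cw$stabbr)
```

Lower-casing both sides before matching mirrors the role of the case_ignore argument.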
Rename data frame columns using external crosswalk file.
Description
Rename data frame columns using external crosswalk file.
Usage
renamefrom(
  .data,
  cw_file,
  raw,
  clean,
  label = NULL,
  delimiter = NULL,
  sheet = NULL,
  drop_extra = TRUE,
  case_ignore = TRUE,
  keep_label = FALSE,
  name_label = FALSE
)

renamefrom_(
  .data,
  cw_file,
  raw,
  clean,
  label = NULL,
  delimiter = NULL,
  sheet = NULL,
  drop_extra = TRUE,
  case_ignore = TRUE,
  keep_label = FALSE,
  name_label = FALSE
)
Arguments
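.data | Data frame or tbl_df |
cw_file | Either a data frame object or a string with the path to an external crosswalk file, which has columns representing the raw (current) column names, the clean (new) column names, and (optionally) labels for the columns |
raw | Name of column in cw_file that contains the column names of the current data frame |
clean | Name of column in cw_file that contains the new column names |
label | Name of column in cw_file that contains the labels for the columns |
delimiter | String delimiter used to parse cw_file |
sheet | Specify sheet if cw_file is an Excel file |
drop_extra | Drop extra columns in current data frame if they are not matched in cw_file |
case_ignore | Ignore case when matching current (raw) column names with the names in cw_file |
keep_label | Keep current label, if any, on data frame columns that aren't matched in cw_file |
name_label | Use old (raw) column name as the new column label |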
Value
Data frame or tbl_df with new column names and labels.
Functions
- renamefrom_(): Standard evaluation version of renamefrom (raw, clean, and label must be strings when using this version)
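As an illustration of the standard evaluation version, the sketch below passes the crosswalk column names to renamefrom_() as strings. It assumes the crosswalkr package is installed; the data frame and crosswalk are made up for the example.

```r
library(crosswalkr)

## hypothetical data and crosswalk, made up for this illustration
df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                 fips = c(21, 47, 51))
cw <- data.frame(old_name = c("state", "fips"),
                 new_name = c("stname", "stfips"),
                 label = c("Full state name", "FIPS code"))

## standard evaluation version: raw, clean, and label are strings
df_new <- renamefrom_(df, cw, "old_name", "new_name", "label")

## inspect the new column names
names(df_new)
```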
Examples
## example data frame
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 fips = c(21,47,51),
                 region = c('South','South','South'))

## crosswalk: old names, new names, and labels
cw <- data.frame(old_name = c('state','fips'),
                 new_name = c('stname','stfips'),
                 label = c('Full state name', 'FIPS code'))

## rename with labels; with old names as labels; keeping unmatched columns
df1 <- renamefrom(df, cw, old_name, new_name, label)
df2 <- renamefrom(df, cw, old_name, new_name, name_label = TRUE)
df3 <- renamefrom(df, cw, old_name, new_name, drop_extra = FALSE)
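The cw_file argument may also be a path to an external file rather than a data frame. The sketch below writes a small crosswalk to a temporary CSV file and passes the path to renamefrom. It assumes the crosswalkr package is installed and that a file with a .csv extension is read as comma-delimited without setting delimiter; the temporary path and the object names are made up for the example.

```r
library(crosswalkr)

## hypothetical data and crosswalk, made up for this illustration
df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                 fips = c(21, 47, 51),
                 region = c("South", "South", "South"))
cw <- data.frame(old_name = c("state", "fips"),
                 new_name = c("stname", "stfips"),
                 label = c("Full state name", "FIPS code"))

## write the crosswalk to a temporary CSV file
cw_path <- tempfile(fileext = ".csv")
write.csv(cw, cw_path, row.names = FALSE)

## pass the file path instead of the data frame
## (assumption: the .csv extension is enough to identify the format)
df_new <- renamefrom(df, cw_path, old_name, new_name, label)
names(df_new)

## clean up the temporary file
unlink(cw_path)
```

With the default drop_extra = TRUE, the unmatched region column is dropped from the result.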
State crosswalk data set.
Description
An example state crosswalk. Includes information for all states plus the District of Columbia.
Usage
stcrosswalk
Format
A data frame with 51 rows and 7 variables:
- stfips: Two-digit state FIPS codes
- stabbr: Two-letter state abbreviation
- stname: Full state name
- cenreg: Census region number
- cenregnm: Census region name
- cendiv: Census division number
- cendivnm: Census division name
State and territory crosswalk data set.
Description
An example state and territory crosswalk. Includes information for all states, the District of Columbia, and territories.
Usage
sttercrosswalk
Format
A data frame with 69 rows and 10 variables:
- stfips: Two-digit FIPS codes
- stabbr: Two-letter abbreviation
- stname: Full name
- cenreg: Census region number
- cenregnm: Census region name
- cendiv: Census division number
- cendivnm: Census division name
- is_state: Indicator for status as state
- is_state_dc: Indicator for status as state or DC
- status: 1 := Under U.S. sovereignty; 2 := Minor Outlying Islands; 3 := Independent nation under Compact of Free Association with U.S.; 4 := Individual Minor Outlying Islands (within status 2)