| Type: | Package |
| Title: | Data Frame Joins Leveraging 'data.table' |
| Version: | 0.1.0 |
| Description: | Extends 'data.table' join functionality, lets it work with any data frame class, and provides a familiar 'x'/'y'-style interface, enabling broad use across R. Offers NA-safe matching by default, on-the-fly column selection, multiple match-handling on both sides, 'x' or 'y' row order, and a row origin indicator. Performs inner, left, right, full, semi- and anti-joins with equality and inequality conditions, plus cross joins. Specific support for 'data.table', (grouped) tibble, and 'sf'/'sfc' objects and their attributes; returns a plain data frame otherwise. Avoids data-copying of inputs and outputs. Allows displaying the 'data.table' code instead of (or as well as) executing it. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.3.0) |
| Imports: | data.table |
| Suggests: | dplyr (≥ 1.1.0), sf, testthat (≥ 3.0.0), knitr, rmarkdown, quarto, bench, ggplot2 |
| URL: | https://trobx.github.io/fjoin/ |
| BugReports: | https://github.com/trobx/fjoin/issues |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr, quarto |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2025-12-05 22:00:23 UTC; Toby |
| Author: | Toby Robertson [aut, cre] |
| Maintainer: | Toby Robertson <trobx@proton.me> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-11 13:30:08 UTC |
fjoin
Description
fjoin builds on data.table to provide fast, flexible joins on any data frames. It slots into tidyverse pipelines and general workflows in a single line, and provides NA-safe matching by default, on-the-fly column selection, flexible row-order preservation, multiple-match handling on both sides, and an indicator column for row origin.
Vignette
View the Get started
guide on the package website
or access locally in R with vignette("fjoin").
API
| fjoin_* functions | dtjoin_* functions |
x/y style | Extended DT[i] style |
fjoin_inner(), fjoin_left(), fjoin_right(), fjoin_full() | dtjoin() |
fjoin_left_semi() (alias fjoin_semi()), fjoin_right_semi() | dtjoin_semi() |
fjoin_left_anti() (alias fjoin_anti()), fjoin_right_anti() | dtjoin_anti() |
fjoin_cross() | dtjoin_cross()
|
Author(s)
Maintainer: Toby Robertson trobx@proton.me
See Also
Useful links:
Join data frame-like objects using an extended DT[i]-style interface
to data.table
Description
Write (and optionally run) data.table code for a join
using a generalisation of DT[i] syntax with extended arguments and
enhanced behaviour. Accepts any data.frame-like inputs (not only
data.tables), permits left, right, inner, and full joins, prevents
unwanted matches on NA and NaN by default, does not garble join
columns in non-equality joins, allows mult on both sides of the join,
creates an optional join indicator column, allows specifying which columns to
select from each input, and provides convenience options to control column
order and prefixing.
If run, the join returns a data.frame, data.table, tibble,
sf, or sf-tibble according to context. The generated
data.table code can be printed to the console instead of (or as well
as) being executed. This feature extends to mock joins, where no
inputs are provided, and template code is produced.
dtjoin is the workhorse function for fjoin_inner,
fjoin_left, fjoin_right, and
fjoin_full, which are wrappers providing a more conventional
interface for join operations. These functions are recommended over
dtjoin for most users and cases.
Usage
dtjoin(
.DT = NULL,
.i = NULL,
on,
match.na = FALSE,
mult = "all",
mult.DT = "all",
nomatch = NA,
nomatch.DT = NULL,
indicate = FALSE,
select = NULL,
select.DT = NULL,
select.i = NULL,
both = FALSE,
on.first = FALSE,
i.home = FALSE,
i.first = i.home,
prefix = if (i.home) "x." else "i.",
i.class = i.home,
do = !(is.null(.DT) && is.null(.i)),
show = !do,
verbose = FALSE,
...
)
Arguments
.DT, .i |
|
on |
A character vector of join predicates, e.g. |
match.na |
If |
mult |
(as in |
mult.DT |
Like |
nomatch |
(as in |
nomatch.DT |
Like |
indicate |
Whether to add a column |
select, select.DT, select.i |
Character vectors of columns to be selected
from either input if present ( |
both |
Whether to include equality join columns from the "foreign"
table separately in the output, instead of combining them with those from
the "home" table. Default |
on.first |
Whether to place the join columns from both inputs first in
the join result. Default |
i.home |
Whether to treat |
i.first |
Whether to place |
prefix |
A prefix to attach to column names in the "foreign" table that
are the same as a column name in the "home" table. The default is
|
i.class |
Whether the |
do |
Whether to execute the join. Default is |
show |
Whether to print the code for the join to the console. Default is
the opposite of |
verbose |
(passed to |
... |
Further arguments (for internal use). |
Details
Input and output class
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on .DT by default (but .i with
i.class = TRUE) and is as follows:
a
data.tableif the input is a puredata.tablea tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)an
sfif it is ansfwith its active geometry selected in the joina plain
data.framein all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics. Other classes and attributes are not carried through.
Specifying join conditions with on
on is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
Using select, select.DT, and select.i
Used on its own, select keeps the join columns plus the specified
non-join columns from both inputs if present.
If select.DT is provided (and similarly for select.i) then:
if
selectis also specified, non-join columns of.DTnamed in eitherselectorselect.DTare includedif
selectis not specified, only non-join columns named inselect.DTare included from.DT. Thus e.g.select.DT = ""excludes all of.DT's non-join columns.
Non-existent column names are ignored without warning.
Column order
When select is specified but select.DT and select.i are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from
.DTcome before columns from.iby default (but vice versa ifi.firstisTRUE)within each group of columns, non-join columns are in the order given by
select.DT/select.i, or in their original data order if no selection is providedif
on.firstisTRUE, join columns from both inputs are moved to the front of the overall output.
Using mult and mult.DT
If both of these arguments are not the default "all", mult is
applied first (typically by passing directly to [.data.table) and
mult.DT is applied subsequently to eliminate all but the first or last
occurrence of each row of .DT from the inner part of the join,
producing a 1:1 result. This order of operations can affect the identity of
the rows in the inner join.
Displaying code and 'mock joins'
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If .DT and .i are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
tibble groups
If the relevant input is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
data.table keys
If .i is a keyed data.table and the output is also a
data.table, it inherits .i's key provided
nomatch.DT is NULL (i.e. the non-matching rows of .DT
are not included in the result). This differs from a data.table
DT[i] join, in which the output inherits the key of DT
provided it remains sorted on those columns. If not all of the key columns
are selected in the result, the leading subset is used.
sf objects and sfc-class columns
Joins between two sf objects are supported. The relation-to-geometry
attribute agr is inherited from the input supplying the active
geometry. All sfc-class columns in the output are refreshed after
joining (using sf::st_sfc() with recompute_bbox = TRUE); this
is true regardless of whether or not the inputs and output are sfs.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# An illustration showing:
# - two calls to fjoin_left() (commented out), differing in the `order` argument
# - the resulting calls to dtjoin(), plus `show = TRUE`
# - the generated data.table code and output
# data frames
set.seed(1)
df_x <- data.frame(id_x = 1:3, col_x = paste0("x", 1:3), val = runif(3))
df_y <- data.frame(id_y = rep(4:2, each = 2), col_y = paste0("y", 1:6), val = runif(6))
# ---------------------------------------------------------------------------
# (1) fjoin_left(df_x, df_y, on = "id_x == id_y", mult.x = "first")
dtjoin(
df_y,
df_x,
on = "id_y == id_x",
mult = "first",
i.home = TRUE,
prefix = "R.",
show = TRUE
)
# (2) fjoin_left(df_x, df_y, on = "id_x == id_y", mult.x = "first", order = "right")
dtjoin(
df_x,
df_y,
on = "id_x == id_y",
mult.DT = "first",
nomatch = NULL,
nomatch.DT = NA,
prefix = "R.",
show = TRUE
)
Anti-join of DT in a DT[i]-style join of data frame-like
objects
Description
Write (and optionally run) data.table code to return the anti-join of
DT (the rows of DT not joining with i) using a
generalisation of DT[i] syntax.
The functions fjoin_left_anti and fjoin_right_anti
provide a more conventional interface that is recommended over
dtjoin_anti for most users and cases.
Usage
dtjoin_anti(
.DT = NULL,
.i = NULL,
on,
match.na = FALSE,
mult = "all",
mult.DT = "all",
nomatch = NULL,
nomatch.DT = NULL,
select = NULL,
do = !(is.null(.DT) && is.null(.i)),
show = !do,
verbose = FALSE,
...
)
Arguments
.DT, .i |
|
on |
A character vector of join predicates, e.g. |
match.na |
If |
mult |
(as in |
mult.DT |
Permitted for consistency with |
nomatch, nomatch.DT |
Permitted for consistency with |
select |
Character vector of columns of |
do |
Whether to execute the join. Default is |
show |
Whether to print the code for the join to the console. Default is
the opposite of |
verbose |
(passed to |
... |
Further arguments (for internal use). |
Details
Details are as for dtjoin except for arguments controlling
the order and prefixing of output columns, which do not apply.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# Mock joins
dtjoin_anti(on = "id")
dtjoin_anti(on = c("id", "date <= date"))
dtjoin_anti(on = c("id", "date <= date"), mult = "last")
Cross join of data frame-like objects DT and i using
a DT[i]-style interface to data.table
Description
Write (and optionally run) data.table code to return the cross join of
two data.frame-like objects using a generalisation of DT[i]
syntax.
The function fjoin_cross provides a more conventional interface
that is recommended over dtjoin_cross for most users and cases.
Usage
dtjoin_cross(
.DT = NULL,
.i = NULL,
select = NULL,
select.DT = NULL,
select.i = NULL,
i.home = FALSE,
i.first = i.home,
prefix = if (i.home) "x." else "i.",
i.class = i.home,
do = !(is.null(.DT) && is.null(.i)),
show = !do,
...
)
Arguments
.DT, .i |
|
select, select.DT, select.i |
Character vectors of columns to be selected
from either input if present ( |
i.home |
Whether to treat |
i.first |
Whether to place |
prefix |
A prefix to attach to column names in the "foreign" table that
are the same as a column name in the "home" table. The default is
|
i.class |
Whether the |
do |
Whether to execute the join. Default is |
show |
Whether to print the code for the join to the console. Default is
the opposite of |
... |
Further arguments (for internal use). |
Details
Details are as for dtjoin except for remarks about join
columns and matching logic, which do not apply.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# data frames
df1 <- data.table::fread(data.table = FALSE, input = "
bread kcal
Brown 150
White 180
Baguette 250
")
df2 <- data.table::fread(data.table = FALSE, input = "
filling kcal
Cheese 200
Pâté 160
")
dtjoin_cross(df1, df2)
Semi-join of DT in a DT[i]-style join of data frame-like
objects
Description
Write (and optionally run) data.table code to return the semi-join of
DT (the rows of DT that join with i) using a
generalisation of DT[i] syntax.
The functions fjoin_left_semi and fjoin_right_semi
provide a more conventional interface that is recommended over
dtjoin_semi for most users and cases.
Usage
dtjoin_semi(
.DT = NULL,
.i = NULL,
on,
match.na = FALSE,
mult = "all",
mult.DT = "all",
nomatch = NULL,
nomatch.DT = NULL,
select = NULL,
do = !(is.null(.DT) && is.null(.i)),
show = !do,
verbose = FALSE,
...
)
Arguments
.DT, .i |
|
on |
A character vector of join predicates, e.g. |
match.na |
If |
mult |
(as in |
mult.DT |
Permitted for consistency with |
nomatch, nomatch.DT |
Permitted for consistency with |
select |
Character vector of columns of |
do |
Whether to execute the join. Default is |
show |
Whether to print the code for the join to the console. Default is
the opposite of |
verbose |
(passed to |
... |
Further arguments (for internal use). |
Details
Details are as for dtjoin except for arguments controlling
the order and prefixing of output columns, which do not apply.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# Mock joins
dtjoin_semi(on = "id")
dtjoin_semi(on = c("id", "date <= date"))
dtjoin_semi(on = c("id", "date <= date"), mult = "last")
Cross join
Description
Cross join of x and y
Usage
fjoin_cross(
x = NULL,
y = NULL,
order = "left",
select = NULL,
select.x = NULL,
select.y = NULL,
prefix.y = "R.",
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
order |
Whether the row order of the result should reflect |
select, select.x, select.y |
Character vectors of columns to be selected
from either input if present ( |
prefix.y |
A prefix to attach to column names in |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Details are as for e.g. fjoin_inner except for remarks
about join columns and matching logic, which do not apply.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# data frames
df1 <- data.table::fread(data.table = FALSE, input = "
bread kcal
Brown 150
White 180
Baguette 250
")
df2 <- data.table::fread(data.table = FALSE, input = "
filling kcal
Cheese 200
Pâté 160
")
fjoin_cross(df1, df2)
fjoin_cross(df1, df2, order = "right")
Full join
Description
Full join of x and y
Usage
fjoin_full(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
on.first = FALSE,
order = "left",
select = NULL,
select.x = NULL,
select.y = NULL,
indicate = FALSE,
prefix.y = "R.",
both = FALSE,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
on.first |
Whether to place the join columns first in the join result.
Default |
order |
Whether the row order of the result should reflect |
select, select.x, select.y |
Character vectors of columns to be selected
from either input if present ( |
indicate |
Whether to add a column |
prefix.y |
A prefix to attach to column names in |
both |
Whether to include |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Input and output class
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on x as follows:
a
data.tableifxis a puredata.tablea tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)an
sfif it is ansfwith its active geometry selected in the outputa plain
data.framein all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics.
Specifying join conditions with on
on is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
Using select, select.x, and select.y
Used on its own, select keeps the join columns plus the
specified non-join columns from both inputs if present.
If select.x is provided (and similarly for select.y) then:
if
selectis also specified, non-join columns ofxnamed in eitherselectorselect.xare includedif
selectis not specified, only non-join columns named inselect.xare included fromx. Thus e.g.select.x = ""excludes all ofx's non-join columns.
Non-existent column names are ignored without warning.
Column order
When select is specified but select.x and select.y are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from
xcome before columns fromywithin each group of columns, non-join columns are in the order given by
select.x/select.y, or in their original data order if no selection is providedif
on.firstisTRUE, join columns from both inputs are moved to the front of the overall output.
Using mult.x and mult.y
See the Examples for an application of using mult.x and mult.y
together. Note that mult.y is applied after mult.x except with
order = "right".
Displaying code and 'mock joins'
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If x and y are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
tibble groups
If x is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
data.table keys
If the output is a data.table, it inherits a key as follows:
-
fjoin_innerorfjoin_leftwithorder = "left"(default):x'skeyif present -
fjoin_innerorfjoin_rightwithorder = "right":y'skeyif present
If not all of the key columns are selected in the result, the leading subset is used.
sf objects and sfc-class columns
Joins between two sf objects are supported. The active geometry and
relation-to-geometry attribute agr are determined by x. All
sfc-class columns in the output are refreshed after joining (using
sf::st_sfc() with recompute_bbox = TRUE); this is true
regardless of whether or not the inputs and output are sfs.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# True joins (inner/left/right/full): basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# ---------------------------------------------------------------------------
# `indicate = TRUE` adds a front column ".join" indicating whether a row is
# from `x` only (1L), from `y` only (2L), or joined from both (3L)
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_left(x, y, on = "country", indicate = TRUE)
fjoin_right(x, y, on = "country", indicate = TRUE)
fjoin_inner(x, y, on = "country", indicate = TRUE)
# ---------------------------------------------------------------------------
# Core options and arguments (in a 1:1 equality join with fjoin_full())
# ---------------------------------------------------------------------------
# data frames
dfQ <- data.table::fread(data.table = FALSE, quote ="'", input = "
id quantity notes other_cols
2 5 '' ...
1 6 '' ...
3 7 '' ...
NA 8 'oranges (not listed)' ...
")
dfP <- data.table::fread(data.table = FALSE, input = "
id item price other_cols
NA apples 10 ...
3 bananas 20 ...
2 cherries 30 ...
1 dates 40 ...
")
# ---------------------------------------------------------------------------
# (1) basic syntax
# cf. dplyr: full_join(dfQ, dfP, join_by(id), na.matches = "never")
fjoin_full(dfQ, dfP, on = "id")
# (2) join-select in one line
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity"))
# equivalent operation in dplyr
# x <- dfQ |> select(id, quantity)
# y <- dfP |> select(id, item, price)
# full_join(x, y, join_by(id), na.matches = "never") |>
# select(id, item, price, quantity)
# ---------------------------------------------------------------------------
# (an aside) equality matches on NA if you insist
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity", "notes"), match.na = TRUE)
# (3) indicator column (in Stata since 1984)
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE
)
# (4) order rows by y then x
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right"
)
# (5) display code instead
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right",
do = FALSE
)
# ---------------------------------------------------------------------------
# M:M inequality join reduced to 1:1 using `mult.x` and `mult.y`
# ---------------------------------------------------------------------------
# data.table (`mult`) and dplyr (`multiple`) have options for reducing the
# cardinality on one side of the join from many ("all") to one ("first" or
# "last"). fjoin (`mult.x`, `mult.y`) permits this on either side of the
# join, or on both sides at once.
# This example (using `fjoin_left()`) shows an application to temporally
# ordered data frames of "events" and "reactions".
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# (1) for each event, all subsequent reactions (M:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
)
# (2) for each event, the next reaction (1:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first"
)
# (3) for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_inner(x, y, on = NA) # note `NA` not `NULL`/omitted
try(fjoin_left(x, y)) # to prevent accidental natural joins
# ---------------------------------------------------------------------------
# Mock join (code "ghostwriter" for data.table users)
# ---------------------------------------------------------------------------
fjoin_inner(on = c("id"))
Inner join
Description
Inner join of x and y
Usage
fjoin_inner(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
order = "left",
select = NULL,
select.x = NULL,
select.y = NULL,
indicate = FALSE,
prefix.y = "R.",
on.first = FALSE,
both = FALSE,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
order |
Whether the row order of the result should reflect |
select, select.x, select.y |
Character vectors of columns to be selected
from either input if present ( |
indicate |
Whether to add a column |
prefix.y |
A prefix to attach to column names in |
on.first |
Whether to place the join columns first in the join result.
Default |
both |
Whether to include |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Input and output class
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on x as follows:
a
data.tableifxis a puredata.tablea tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)an
sfif it is ansfwith its active geometry selected in the outputa plain
data.framein all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics.
Specifying join conditions with on
on is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
Using select, select.x, and select.y
Used on its own, select keeps the join columns plus the
specified non-join columns from both inputs if present.
If select.x is provided (and similarly for select.y) then:
if
selectis also specified, non-join columns ofxnamed in eitherselectorselect.xare includedif
selectis not specified, only non-join columns named inselect.xare included fromx. Thus e.g.select.x = ""excludes all ofx's non-join columns.
Non-existent column names are ignored without warning.
Column order
When select is specified but select.x and select.y are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from
xcome before columns fromywithin each group of columns, non-join columns are in the order given by
select.x/select.y, or in their original data order if no selection is providedif
on.firstisTRUE, join columns from both inputs are moved to the front of the overall output.
Using mult.x and mult.y
See the Examples for an application of using mult.x and mult.y
together. Note that mult.y is applied after mult.x except with
order = "right".
Displaying code and 'mock joins'
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If x and y are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
tibble groups
If x is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
data.table keys
If the output is a data.table, it inherits a key as follows:
-
fjoin_innerorfjoin_leftwithorder = "left"(default):x'skeyif present -
fjoin_innerorfjoin_rightwithorder = "right":y'skeyif present
If not all of the key columns are selected in the result, the leading subset is used.
sf objects and sfc-class columns
Joins between two sf objects are supported. The active geometry and
relation-to-geometry attribute agr are determined by x. All
sfc-class columns in the output are refreshed after joining (using
sf::st_sfc() with recompute_bbox = TRUE); this is true
regardless of whether or not the inputs and output are sfs.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# True joins (inner/left/right/full): basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# ---------------------------------------------------------------------------
# `indicate = TRUE` adds a front column ".join" indicating whether a row is
# from `x` only (1L), from `y` only (2L), or joined from both (3L)
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_left(x, y, on = "country", indicate = TRUE)
fjoin_right(x, y, on = "country", indicate = TRUE)
fjoin_inner(x, y, on = "country", indicate = TRUE)
# ---------------------------------------------------------------------------
# Core options and arguments (in a 1:1 equality join with fjoin_full())
# ---------------------------------------------------------------------------
# data frames
dfQ <- data.table::fread(data.table = FALSE, quote ="'", input = "
id quantity notes other_cols
2 5 '' ...
1 6 '' ...
3 7 '' ...
NA 8 'oranges (not listed)' ...
")
dfP <- data.table::fread(data.table = FALSE, input = "
id item price other_cols
NA apples 10 ...
3 bananas 20 ...
2 cherries 30 ...
1 dates 40 ...
")
# ---------------------------------------------------------------------------
# (1) basic syntax
# cf. dplyr: full_join(dfQ, dfP, join_by(id), na.matches = "never")
fjoin_full(dfQ, dfP, on = "id")
# (2) join-select in one line
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity"))
# equivalent operation in dplyr
# x <- dfQ |> select(id, quantity)
# y <- dfP |> select(id, item, price)
# full_join(x, y, join_by(id), na.matches = "never") |>
# select(id, item, price, quantity)
# ---------------------------------------------------------------------------
# (an aside) equality matches on NA if you insist
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity", "notes"), match.na = TRUE)
# (3) indicator column (in Stata since 1984)
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE
)
# (4) order rows by y then x
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right"
)
# (5) display code instead
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right",
do = FALSE
)
# ---------------------------------------------------------------------------
# M:M inequality join reduced to 1:1 using `mult.x` and `mult.y`
# ---------------------------------------------------------------------------
# data.table (`mult`) and dplyr (`multiple`) have options for reducing the
# cardinality on one side of the join from many ("all") to one ("first" or
# "last"). fjoin (`mult.x`, `mult.y`) permits this on either side of the
# join, or on both sides at once.
# This example (using `fjoin_left()`) shows an application to temporally
# ordered data frames of "events" and "reactions".
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# (1) for each event, all subsequent reactions (M:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
)
# (2) for each event, the next reaction (1:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first"
)
# (3) for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_inner(x, y, on = NA) # note `NA` not `NULL`/omitted
try(fjoin_left(x, y)) # to prevent accidental natural joins
# ---------------------------------------------------------------------------
# Mock join (code "ghostwriter" for data.table users)
# ---------------------------------------------------------------------------
fjoin_inner(on = c("id"))
Left join
Description
Left join of x and y
Usage
fjoin_left(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
order = "left",
select = NULL,
select.x = NULL,
select.y = NULL,
indicate = FALSE,
prefix.y = "R.",
on.first = FALSE,
both = FALSE,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
order |
Whether the row order of the result should reflect |
select, select.x, select.y |
Character vectors of columns to be selected
from either input if present ( |
indicate |
Whether to add a column |
prefix.y |
A prefix to attach to column names in |
on.first |
Whether to place the join columns first in the join result.
Default |
both |
Whether to include |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Input and output class
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on x as follows:
a
data.tableifxis a puredata.tablea tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)an
sfif it is ansfwith its active geometry selected in the outputa plain
data.framein all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics.
Specifying join conditions with on
on is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
Using select, select.x, and select.y
Used on its own, select keeps the join columns plus the
specified non-join columns from both inputs if present.
If select.x is provided (and similarly for select.y) then:
if
selectis also specified, non-join columns ofxnamed in eitherselectorselect.xare includedif
selectis not specified, only non-join columns named inselect.xare included fromx. Thus e.g.select.x = ""excludes all ofx's non-join columns.
Non-existent column names are ignored without warning.
Column order
When select is specified but select.x and select.y are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from
xcome before columns fromywithin each group of columns, non-join columns are in the order given by
select.x/select.y, or in their original data order if no selection is providedif
on.firstisTRUE, join columns from both inputs are moved to the front of the overall output.
Using mult.x and mult.y
See the Examples for an application of using mult.x and mult.y
together. Note that mult.y is applied after mult.x except with
order = "right".
Displaying code and 'mock joins'
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If x and y are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
tibble groups
If x is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
data.table keys
If the output is a data.table, it inherits a key as follows:
-
fjoin_innerorfjoin_leftwithorder = "left"(default):x'skeyif present -
fjoin_innerorfjoin_rightwithorder = "right":y'skeyif present
If not all of the key columns are selected in the result, the leading subset is used.
sf objects and sfc-class columns
Joins between two sf objects are supported. The active geometry and
relation-to-geometry attribute agr are determined by x. All
sfc-class columns in the output are refreshed after joining (using
sf::st_sfc() with recompute_bbox = TRUE); this is true
regardless of whether or not the inputs and output are sfs.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# True joins (inner/left/right/full): basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# ---------------------------------------------------------------------------
# `indicate = TRUE` adds a front column ".join" indicating whether a row is
# from `x` only (1L), from `y` only (2L), or joined from both (3L)
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_left(x, y, on = "country", indicate = TRUE)
fjoin_right(x, y, on = "country", indicate = TRUE)
fjoin_inner(x, y, on = "country", indicate = TRUE)
# ---------------------------------------------------------------------------
# Core options and arguments (in a 1:1 equality join with fjoin_full())
# ---------------------------------------------------------------------------
# data frames
dfQ <- data.table::fread(data.table = FALSE, quote ="'", input = "
id quantity notes other_cols
2 5 '' ...
1 6 '' ...
3 7 '' ...
NA 8 'oranges (not listed)' ...
")
dfP <- data.table::fread(data.table = FALSE, input = "
id item price other_cols
NA apples 10 ...
3 bananas 20 ...
2 cherries 30 ...
1 dates 40 ...
")
# ---------------------------------------------------------------------------
# (1) basic syntax
# cf. dplyr: full_join(dfQ, dfP, join_by(id), na.matches = "never")
fjoin_full(dfQ, dfP, on = "id")
# (2) join-select in one line
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity"))
# equivalent operation in dplyr
# x <- dfQ |> select(id, quantity)
# y <- dfP |> select(id, item, price)
# full_join(x, y, join_by(id), na.matches = "never") |>
# select(id, item, price, quantity)
# ---------------------------------------------------------------------------
# (an aside) equality matches on NA if you insist
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity", "notes"), match.na = TRUE)
# (3) indicator column (in Stata since 1984)
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE
)
# (4) order rows by y then x
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right"
)
# (5) display code instead
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right",
do = FALSE
)
# ---------------------------------------------------------------------------
# M:M inequality join reduced to 1:1 using `mult.x` and `mult.y`
# ---------------------------------------------------------------------------
# data.table (`mult`) and dplyr (`multiple`) have options for reducing the
# cardinality on one side of the join from many ("all") to one ("first" or
# "last"). fjoin (`mult.x`, `mult.y`) permits this on either side of the
# join, or on both sides at once.
# This example (using `fjoin_left()`) shows an application to temporally
# ordered data frames of "events" and "reactions".
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# (1) for each event, all subsequent reactions (M:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
)
# (2) for each event, the next reaction (1:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first"
)
# (3) for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_inner(x, y, on = NA) # note `NA` not `NULL`/omitted
try(fjoin_left(x, y)) # to prevent accidental natural joins
# ---------------------------------------------------------------------------
# Mock join (code "ghostwriter" for data.table users)
# ---------------------------------------------------------------------------
fjoin_inner(on = c("id"))
Left anti-join
Description
The anti-join of x in a join of x and y, i.e. the rows
of x that do not join. The alias fjoin_anti can be used
instead.
Usage
fjoin_left_anti(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
fjoin_anti(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
select |
Character vector of non-join columns to be selected from
|
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Details are as for fjoin_inner except for arguments controlling
the order and prefixing of output columns, which do not apply. Output class
is determined by x.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# Semi- and anti-joins: basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# full join with `indicate = TRUE` for comparison
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_semi(x, y, on = "country")
fjoin_anti(x, y, on = "country")
fjoin_right_semi(x, y, on = "country")
fjoin_right_anti(x, y, on = "country")
# ---------------------------------------------------------------------------
# `mult.x` and `mult.y` support
# ---------------------------------------------------------------------------
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_full(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last",
indicate = TRUE
)
fjoin_semi(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
fjoin_anti(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_semi(x, y, on = NA)
fjoin_anti(x, y, on = NA)
# ---------------------------------------------------------------------------
# Mock join
# ---------------------------------------------------------------------------
fjoin_semi(on="id")
fjoin_semi(on=c("id", "date"))
fjoin_semi(on=c("id"), mult.y = "last")
Left semi-join
Description
The semi-join of x in a join of x and y, i.e. the rows
of x that join at least once. The alias fjoin_semi can be
used instead.
Usage
fjoin_left_semi(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
fjoin_semi(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
select |
Character vector of non-join columns to be selected from
|
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Details are as for e.g. fjoin_inner except for arguments
controlling the order and prefixing of output columns, which do not apply.
Output class is determined by x.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# Semi- and anti-joins: basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# full join with `indicate = TRUE` for comparison
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_semi(x, y, on = "country")
fjoin_anti(x, y, on = "country")
fjoin_right_semi(x, y, on = "country")
fjoin_right_anti(x, y, on = "country")
# ---------------------------------------------------------------------------
# `mult.x` and `mult.y` support
# ---------------------------------------------------------------------------
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_full(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last",
indicate = TRUE
)
fjoin_semi(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
fjoin_anti(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_semi(x, y, on = NA)
fjoin_anti(x, y, on = NA)
# ---------------------------------------------------------------------------
# Mock join
# ---------------------------------------------------------------------------
fjoin_semi(on="id")
fjoin_semi(on=c("id", "date"))
fjoin_semi(on=c("id"), mult.y = "last")
Right join
Description
Right join of x and y
Usage
fjoin_right(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
indicate = FALSE,
order = "left",
select = NULL,
select.x = NULL,
select.y = NULL,
prefix.y = "R.",
on.first = FALSE,
both = FALSE,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
indicate |
Whether to add a column |
order |
Whether the row order of the result should reflect |
select, select.x, select.y |
Character vectors of columns to be selected
from either input if present ( |
prefix.y |
A prefix to attach to column names in |
on.first |
Whether to place the join columns first in the join result.
Default |
both |
Whether to include |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Input and output class
Each input can be any object with class data.frame, or a plain
list of same-length vectors.
The output class depends on x as follows:
a
data.tableifxis a puredata.tablea tibble if it is a tibble (and a grouped tibble if it has class
grouped_df)an
sfif it is ansfwith its active geometry selected in the outputa plain
data.framein all other cases
The following attributes are carried through and refreshed: data.table
key, tibble groups, sf agr (and bbox etc. of all
individual sfc-class columns regardless of output class). See below
for specifics.
Specifying join conditions with on
on is a required argument. For a natural join (a join by equality on
all same-named column pairs), you must specify on = NA; you can't just
omit on as in other packages. This is to prevent a natural join being
specified by mistake, which may then go unnoticed.
Using select, select.x, and select.y
Used on its own, select keeps the join columns plus the
specified non-join columns from both inputs if present.
If select.x is provided (and similarly for select.y) then:
if
selectis also specified, non-join columns ofxnamed in eitherselectorselect.xare includedif
selectis not specified, only non-join columns named inselect.xare included fromx. Thus e.g.select.x = ""excludes all ofx's non-join columns.
Non-existent column names are ignored without warning.
Column order
When select is specified but select.x and select.y are
not, the output consists of all join columns followed by the selected
non-join columns from either input in the order given in select.
In all other cases:
columns from
xcome before columns fromywithin each group of columns, non-join columns are in the order given by
select.x/select.y, or in their original data order if no selection is providedif
on.firstisTRUE, join columns from both inputs are moved to the front of the overall output.
Using mult.x and mult.y
See the Examples for an application of using mult.x and mult.y
together. Note that mult.y is applied after mult.x except with
order = "right".
Displaying code and 'mock joins'
The option of displaying the join code with show = TRUE or by passing
null inputs is aimed at data.table users wanting to use the package as
a cookbook of recipes for adaptation. If x and y are both
NULL, template code is displayed based on join column names implied by
on, plus sample non-join column names. select arguments are
ignored in this case.
The code displayed is for the join operation after casting the inputs as
data.tables if necessary, and before casting the result as a tibble
and/or sf if applicable. Note that fjoin departs from the usual
j = list() idiom in order to avoid a deep copy of the output made by
as.data.table.list. (Likewise, internally it takes only shallow copies
of columns when casting inputs or outputs to different classes.)
tibble groups
If x is a grouped tibble (class grouped_df), the
output is grouped by the grouping columns that are selected in the result.
data.table keys
If the output is a data.table, it inherits a key as follows:
-
fjoin_innerorfjoin_leftwithorder = "left"(default):x'skeyif present -
fjoin_innerorfjoin_rightwithorder = "right":y'skeyif present
If not all of the key columns are selected in the result, the leading subset is used.
sf objects and sfc-class columns
Joins between two sf objects are supported. The active geometry and
relation-to-geometry attribute agr are determined by x. All
sfc-class columns in the output are refreshed after joining (using
sf::st_sfc() with recompute_bbox = TRUE); this is true
regardless of whether or not the inputs and output are sfs.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# True joins (inner/left/right/full): basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# ---------------------------------------------------------------------------
# `indicate = TRUE` adds a front column ".join" indicating whether a row is
# from `x` only (1L), from `y` only (2L), or joined from both (3L)
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_left(x, y, on = "country", indicate = TRUE)
fjoin_right(x, y, on = "country", indicate = TRUE)
fjoin_inner(x, y, on = "country", indicate = TRUE)
# ---------------------------------------------------------------------------
# Core options and arguments (in a 1:1 equality join with fjoin_full())
# ---------------------------------------------------------------------------
# data frames
dfQ <- data.table::fread(data.table = FALSE, quote ="'", input = "
id quantity notes other_cols
2 5 '' ...
1 6 '' ...
3 7 '' ...
NA 8 'oranges (not listed)' ...
")
dfP <- data.table::fread(data.table = FALSE, input = "
id item price other_cols
NA apples 10 ...
3 bananas 20 ...
2 cherries 30 ...
1 dates 40 ...
")
# ---------------------------------------------------------------------------
# (1) basic syntax
# cf. dplyr: full_join(dfQ, dfP, join_by(id), na.matches = "never")
fjoin_full(dfQ, dfP, on = "id")
# (2) join-select in one line
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity"))
# equivalent operation in dplyr
# x <- dfQ |> select(id, quantity)
# y <- dfP |> select(id, item, price)
# full_join(x, y, join_by(id), na.matches = "never") |>
# select(id, item, price, quantity)
# ---------------------------------------------------------------------------
# (an aside) equality matches on NA if you insist
fjoin_full(dfQ, dfP, on = "id", select = c("item", "price", "quantity", "notes"), match.na = TRUE)
# (3) indicator column (in Stata since 1984)
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE
)
# (4) order rows by y then x
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right"
)
# (5) display code instead
fjoin_full(
dfQ,
dfP,
on = "id",
select = c("item", "price", "quantity"),
indicate = TRUE,
order = "right",
do = FALSE
)
# ---------------------------------------------------------------------------
# M:M inequality join reduced to 1:1 using `mult.x` and `mult.y`
# ---------------------------------------------------------------------------
# data.table (`mult`) and dplyr (`multiple`) have options for reducing the
# cardinality on one side of the join from many ("all") to one ("first" or
# "last"). fjoin (`mult.x`, `mult.y`) permits this on either side of the
# join, or on both sides at once.
# This example (using `fjoin_left()`) shows an application to temporally
# ordered data frames of "events" and "reactions".
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# (1) for each event, all subsequent reactions (M:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
)
# (2) for each event, the next reaction (1:M)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first"
)
# (3) for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_left(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_inner(x, y, on = NA) # note `NA` not `NULL`/omitted
try(fjoin_left(x, y)) # to prevent accidental natural joins
# ---------------------------------------------------------------------------
# Mock join (code "ghostwriter" for data.table users)
# ---------------------------------------------------------------------------
fjoin_inner(on = c("id"))
Right anti-join
Description
The anti-join of y in a join of x and y, i.e. the rows
of y that do not join.
Usage
fjoin_right_anti(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
select |
Character vector of columns to be selected from |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Details are as for e.g. fjoin_inner except for arguments
controlling the order and prefixing of output columns, which do not apply.
Output class is determined by y.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# Semi- and anti-joins: basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# full join with `indicate = TRUE` for comparison
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_semi(x, y, on = "country")
fjoin_anti(x, y, on = "country")
fjoin_right_semi(x, y, on = "country")
fjoin_right_anti(x, y, on = "country")
# ---------------------------------------------------------------------------
# `mult.x` and `mult.y` support
# ---------------------------------------------------------------------------
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_full(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last",
indicate = TRUE
)
fjoin_semi(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
fjoin_anti(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_semi(x, y, on = NA)
fjoin_anti(x, y, on = NA)
# ---------------------------------------------------------------------------
# Mock join
# ---------------------------------------------------------------------------
fjoin_semi(on="id")
fjoin_semi(on=c("id", "date"))
fjoin_semi(on=c("id"), mult.y = "last")
Right semi-join
Description
The semi-join of y in a join of x and y, i.e. the rows
of y that join at least once.
Usage
fjoin_right_semi(
x = NULL,
y = NULL,
on,
match.na = FALSE,
mult.x = "all",
mult.y = "all",
select = NULL,
do = !(is.null(x) && is.null(y)),
show = !do
)
Arguments
x, y |
|
on |
A character vector of join predicates, e.g. |
match.na |
Whether to allow equality matches between |
mult.x, mult.y |
When a row of |
select |
Character vector of columns to be selected from |
do |
Whether to execute the join. If |
show |
Whether to print the data.table code for the join to the
console. Default is the opposite of |
Details
Details are as for e.g. fjoin_inner except for arguments
controlling the order and prefixing of output columns, which do not apply.
Output class is determined by y.
Value
A data.frame, data.table, (grouped) tibble, sf,
or sf-tibble, or else NULL if do is FALSE. See
Details.
See Also
See the package-level documentation fjoin for related
functions.
Examples
# ---------------------------------------------------------------------------
# Semi- and anti-joins: basic usage
# ---------------------------------------------------------------------------
# data frames
x <- data.table::fread(data.table = FALSE, input = "
country pop_m
Australia 27.2
Brazil 212.0
Chad 3.0
")
y <- data.table::fread(data.table = FALSE, input = "
country forest_pc
Brazil 59.1
Chad 3.2
Denmark 15.8
")
# full join with `indicate = TRUE` for comparison
fjoin_full(x, y, on = "country", indicate = TRUE)
fjoin_semi(x, y, on = "country")
fjoin_anti(x, y, on = "country")
fjoin_right_semi(x, y, on = "country")
fjoin_right_anti(x, y, on = "country")
# ---------------------------------------------------------------------------
# `mult.x` and `mult.y` support
# ---------------------------------------------------------------------------
# data frames
events <- data.table::fread(data.table = FALSE, input = "
event_id event_ts
1 10
2 20
3 40
")
reactions <- data.table::fread(data.table = FALSE, input = "
reaction_id reaction_ts
1 30
2 50
3 60
")
# ---------------------------------------------------------------------------
# for each event, the next reaction, provided there was no intervening event (1:1)
fjoin_full(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last",
indicate = TRUE
)
fjoin_semi(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
fjoin_anti(
events,
reactions,
on = c("event_ts < reaction_ts"),
mult.x = "first",
mult.y = "last"
)
# ---------------------------------------------------------------------------
# Natural join
# ---------------------------------------------------------------------------
fjoin_semi(x, y, on = NA)
fjoin_anti(x, y, on = NA)
# ---------------------------------------------------------------------------
# Mock join
# ---------------------------------------------------------------------------
fjoin_semi(on="id")
fjoin_semi(on=c("id", "date"))
fjoin_semi(on=c("id"), mult.y = "last")