This article describes how to use the full power of DuckDB with duckplyr. Two options are discussed: interoperability with dbplyr and the use of DuckDB’s functions in duckplyr.
The duckplyr package is a drop-in replacement for dplyr, designed to work with DuckDB as the backend. There is a translation layer that converts R function calls to DuckDB functions or macros, aiming at full compatibility with R. Many functions are translated already, and many more are not. For functions that cannot be translated, duckplyr falls back to the original R implementation, disrupting the DuckDB pipeline and materializing intermediate results.
Furthermore, DuckDB has functions with no R equivalent. These might
be used already by code that interacts with DuckDB through dbplyr,
either making use of its passthrough feature (unknown functions are
translated to SQL verbatim), or by using the
mutate(x = sql(...)) pattern. When working with duckplyr,
this functionality is still accessible, albeit through experimental
interfaces:
as_tbl() converts a duckplyr table to a duckdb
tbl objectdd$fun(...) can
be used to call arbitrary DuckDB functionsThe experimental as_tbl() function, introduced in
duckplyr 1.1.0, transparently converts a duckplyr frame to a dbplyr
tbl object:
It achieves this by creating a temporary view that points to the
relational object created internally by duckplyr, in the same DBI
connection as the duckplyr object. No data is copied in this operation.
The view is discarded when the tbl object goes out of
scope.
This allows using arbitrary SQL code, either through
sql() or by relying on dbplyr’s passthrough feature.
There is no R function called least_common_multiple(),
it is interpreted as a SQL function.
To continue processing with duckplyr, use
as_duckdb_tibble():
The escape hatch, also introduced in duckplyr 1.1.0, allows calling arbitrary DuckDB functions directly from duckplyr, without going through SQL:
The dd prefix has been picked for the following
reasons:
dd in RA prefix is necessary to avoid name clashes with existing R functions. If this is used widely, large-scale code analysis may help prioritize the translation of functions that are not yet supported by duckplyr.
The dd package, when
attached, will provide a dd object containing many known
DuckDB functions. This adds support for autocomplete:
This package is not necessary to use duckplyr, and the list of functions is incomplete and growing. In case you’re wondering:
While duckplyr is designed to be a drop-in replacement for dplyr, it still allows to harness most if not all of the power of DuckDB.
See vignette("limits") for limitations in the
translation employed by duckplyr, vignette("fallback") for
more information on fallback, and vignette("telemetry") for
existing attempts to prioritize work on the translation layer.