igraph >= 2.1.2.
memo_expr() because it causes errors on R-devel.
is.R().
clustermq 0.9.0 (@mschubert).
rm() and remove().
rlang PR 1255.
batchtools template file can be brewed (#1359, @pat-s).
targets.
NOTICE and inst/NOTICE to more
explicitly credit code included from other open source projects.
(Previously drake just had comments in the source with
links to the various projects.)
dsl_sym() instead of as.symbol() when
constructing commands for combine() (#1340, @vkehayas).
level_separation argument to
vis_drake_graph() and render_drake_graph() to
control the aspect ratio of visNetwork graphs (#1303, @matthewstrasiotto, @matthiasgomolka,
@robitalec).
Deprecate caching = "master" in favor of
caching = "main".
.data in DSL (#1323, @shirdekel).
identical() to compare file hashes (#1324, @shirdekel).
seed = TRUE in future::future().
parallelism = "clustermq"
and caching = "worker" (@richardbayes).
NROW() throws an error (#1300,
julian-tagell on Stack Overflow).
lifecycle that does not require
badges to be in man/figures.
log_worker argument of
clustermq::workers() to make() and
drake_config() (#1305, @billdenney, @mschubert).
as.is to TRUE in
utils::type.convert() (#1309, @bbolker).
cached_planned() and cached_unplanned()
now work with non-standard cache locations (#1268, @Plebejer).
use_cache to FALSE more often (#1257, @Plebejer).
iris dataset with the
airquality dataset in all documentation, examples, and
tests (#1271).
code_to_function() to the
proper environment (#1275, @robitalec).
tidyselect (#1274, @dernst).
txtq lockfiles (#1232, #1239, #1280, @danwwilson, @pydupont, @mattwarkentin).
drake_script() function to write
_drake.R files for r_make() (#1282).
Deprecate expose_imports() in favor of
make(envir = getNamespace("yourPackage")) (#1286, @mvarewyck).
r_make() if
getOption("drake_r_make_message") is FALSE
(#1238, @januz).
visNetwork graph by using
the hierarchical layout with
visEdges(smooth = list(type = "cubicBezier", forceDirection = TRUE))
(#1289, @mstr3336).
splice_inner() from dropping formal arguments
shared by c() (#1262, @bart1).
subtarget_hashes.cross() for crosses on a single
grouping variable.
group() used with specialized formats
(#1236, @adamaltmejd).
tidyselect >= 1.0.0.
.names argument (#1240, @maciejmotyka, @januz).
drake_plan() (#1237, @januz).
cross()
sub-targets (#1204, @psadil). Expansion order is the same, but
names are correctly matched now.
file_out() files in
clean(), even when garbage_collection is
TRUE (#521, @the-Hull).
keep_going = TRUE for formatted targets
(#1206).
progress_bar instead of progress) so that
drake works without the progress package
(#1208, @mbaccou).
config$settings (#965).
drake_done() and
drake_cancelled() (#1205).
drake_graph_info() (#1207).
verbose is 2 (#1203, @kendonB).
jobs argument of
clean().
drake_build() or drake_debug() (#1214, @kendonB).
hasty_build (#1222).
config$settings (#965).
file_in()/file_out()/knitr_in()
files are not literal strings (#1229).
file_out() and knitr_in() in
imported functions (#1229).
knitr_in() in dynamic branching (#1229).
target().
progress() => drake_progress(),
running() => drake_running(),
failed() => drake_failed()) (#1205).
digest version to require 0.6.21 (#1166, @boshek).
depend trigger to toggle invalidation from
dynamic-only dependencies, including the max_expand
argument of make().
session_info argument parsing (and reduce calls
to utils::sessionInfo() in tests).
tibble 3.0.0.
target(format = "file") (#1168, #1127).
max_expand on a target-by-target
basis via target() (#1175, @kendonB).
make(), not in drake_config() (#1156).
make(verbose = 2), remove the spinner and use a
progress bar to track how many targets are done so far.
cli (optional package).
Deprecate console_log_file in favor of
log_make as an argument to make() and
drake_config().
"loop" and
"future" parallel backends (#400).
loadd() RStudio addin
through the new rstudio_drake_cache global option (#1169,
@joelnitta).
recoverable(), e.g. dynamic
branching + dynamic files.
drake_plan() if
a grouping variable is undefined or invalid (#1182, @kendonB).
drake_deps and drake_deps_ht (#1183).
rlang::trace_back() to make
diagnose()$error$calls nicer (#1198).

These changes invalidate some targets in some workflows, but they are necessary bug fixes.

$<-() and @<-() (#1144).
bind_plans() (#1136, @jennysjaarda).
analyze_assign() (#1119, @jennysjaarda).
"running" progress of dynamic targets.
"fst_tbl" format for large
tibble targets (#1154, @kendonB).
format argument to make(), an
optional custom storage format for targets without an explicit
target(format = ...) in the plan (#1124).
lock_cache argument to make() to
optionally suppress cache locking (#1129). (It can be annoying to
interrupt make() repeatedly and unlock the cache manually
every time.)
cancel() and cancel_if()
functions to cancel targets mid-build (#1131).
subtarget_list argument to
loadd() and readd() to optionally load a
dynamic target as a list of sub-targets (#1139, @MilesMcBain).
file_out() (#1141).
drake_config() level (#1156, @MilesMcBain).
config argument in all user-side
functions (#1118, @vkehayas). Users can now supply the plan
and other make() arguments directly, without bothering with
drake_config(). Now, you only need to call
drake_config() in the _drake.R file for
r_make() and friends. Old code with config
objects should still work. Affected functions:
make()
outdated()
drake_build()
drake_debug()
recoverable()
missed()
deps_target()
deps_profile()
drake_graph_info()
vis_drake_graph()
sankey_drake_graph()
drake_graph()
text_drake_graph()
predict_runtime(). Needed to rename the
targets argument to targets_predict and
jobs to jobs_predict.
predict_workers(). Same argument name changes as
predict_runtime().
drake_config() is to serve functions r_make()
and friends.
@ operator. For example, in the static code
analysis of x@y, do not register y as a
dependency (#1130, @famuvie).
deps_profile() (#1134, @kendonB).
deps_target() output (#1134, @kendonB).
drake_meta_() objects.
drake_envir() and id_chr() (#1132).
drake_envir() to select the environment with
imports (#882).
vctrs paradigm and its type stability for
dynamic branching (#1105, #1106).
target as a symbol by default in
read_trace(). Required for the trace to make sense in
#1107.
"future" backend (#1083, @jennysjaarda).
log_build_times argument to
make() and drake_config(). Allows users to
disable the recording of build times. Produces a speedup of up to 20% on
Macs (#1078).
make(), outdated(make_imports = TRUE),
recoverable(make_imports = TRUE),
vis_drake_graph(make_imports = TRUE), clean(),
etc. on the same cache.
format trigger to invalidate targets when the
specialized data format changes (#1104, @kendonB).
cache_planned() and
cache_unplanned() to help selectively clean workflows with
dynamic targets (#1110, @kendonB).
drake_config() objects and analyze_code()
objects.
"qs" format (#1121, @kendonB).
%||% (%|||% is
faster). (#1089, @billdenney)
%||NA due to slowness (#1089, @billdenney).
is_dynamic() and
is_subtarget() (#1089, @billdenney).
getVDigest() instead of digest()
(#1089, #1092,
https://github.com/eddelbuettel/digest/issues/139#issuecomment-561870289,
@eddelbuettel,
@billdenney).
backtick and .deparseOpts() to
speed up deparse() (#1086,
https://stackoverflow.com/users/516548/g-grothendieck,
@adamkski).
build_times() (#1098).
mget_hash() in progress() (#1098).
drake_graph_info() (#1098).
outdated() (#1098).
make(), avoid checking for nonexistent metadata for
missing targets.
drake_config().
use_drake()
(#1097, @lorenzwalthert, @tjmahr).
drake’s
interpretation of the plan. In the plan, all the dependency
relationships among targets and files are implicit. In the
spec, they are all explicit. We get from the plan to the spec
using static code analysis, e.g. analyze_code().
drake::drake_plan(x = target(...)) from
throwing an error if drake is not loaded (#1039, @mstr3336).
transformations lifecycle badge to the proper
location in the docstring (#1040, @jeroen).
readd() / loadd() from turning an
imported function into a target (#1067).
disk.frame targets with their stored
values (#1077, @brendanf).
subtargets() function to get the cached names
of the sub-targets of a dynamic target.
subtargets arguments to loadd()
and readd() to retrieve specific sub-targets from a parent
dynamic target.
get_trace() and read_trace()
functions to help track which values of grouping variables go into the
making of dynamic sub-targets.
id_chr() function to get the name of the
target while make() is running.
plot(plan) (#1036).
vis_drake_graph(), drake_graph_info(), and
render_drake_graph() now take arguments that allow behavior
to be defined upon selection of nodes. (#1031, @mstr3336).
max_expand argument to make()
and drake_config() to scale down dynamic branching (#1050,
@hansvancalster).
drake_config() objects.
prework is a language object, list of
language objects, or character vector (#1 at pat-s/multicore-debugging
on GitHub, @pat-s).
config$layout.
Supports internal modifications by reference. Required for #685.
dynamic a formal argument of
target().
storrs and decorated
storrs (#1071).
setdiff() and avoiding
names(config$envir_targets).
dir_size(). Incurs
rehashing for some workflows, but should not invalidate any
targets.
which_clean() function to preview which
targets will be invalidated by clean() (#1014, @pat-s).
storr (#1015, @billdenney, @noamross).
"diskframe" format for larger-than-memory
data (#1004, @xiaodaigh).
drake_tempfile() function to help with
"diskframe" format. It makes sure we are not copying large
datasets across different physical storage media (#1004, @xiaodaigh).
code_to_function() to allow for
parsing script-based workflows into functions so
drake_plan() can begin to manage the workflow and track
dependencies. (#994, @thebioengineer)
seed_trigger() (#1013,
@CreRecombinase).
txtq API inside decorated
storr API (#1020).
max_expand in
drake_plan(). max_expand is now the maximum
number of targets produced by map(), split(),
and cross(). For cross(), this reduces the
number of targets (less cumbersome) and makes the subsample of targets
more representative of the complete grid. It also ensures consistent
target naming when .id is FALSE (#1002). Note:
max_expand is not for production workflows anyway, so this
change does not break anything important. Unfortunately, we do lose the
speed boost in drake_plan() originally due to
max_expand, but drake_plan() is still fast, so
that is not so bad.
NULL targets (#998).
cross() (#1009). The same fix should apply to
map() and split() too.
map() (#1010).
fst-powered saving of
data.table objects.
transform a formal argument of
target() so that users do not have to type “transform =”
all the time in drake_plan() (#993).
ropensci.github.io/drake to
docs.ropensci.org/drake.
target(format = "keras") (#989).
verbose argument in various caching
functions. The location of the cache is now only printed in
make(). This made the previous feature easier to
implement.
combine() (#1008).
storr (#968).
drake_plan(transform = slice()) understand
.id and grouping variables (#963).
clean(garbage_collection = TRUE, destroy = TRUE).
Previously it destroyed the cache before trying to collect garbage.
r_make() passes informative error messages
back to the calling process (#969).
map() and
cross() on topologically side-by-side targets (#983).
dsl_left_outer_join() so cross() selects the
right combinations of existing targets (#986). This bug was probably
introduced in the solution to #983.
progress() more consistent, less
dependent on whether tidyselect is installed.
format argument of target() (#971). This
allows users to leverage faster ways to save and load targets, such as
write_fst() for data frames and
save_model_hdf5() for Keras models. It also improves memory
because it prevents storr from making a serialized
in-memory copy of large data objects.
tidyselect functionality for ... in
progress(), analogous to loadd(),
build_times(), and clean().
do_stuff() and the method stuff.your_class()
are defined in envir, and if do_stuff() has a
call to UseMethod("stuff"), then drake’s code
analysis will detect stuff.your_class() as a dependency of
do_stuff().
file_in() URLs. Requires
the new curl_handles argument of make() and
drake_config() (#981).
target(), map(), split(),
cross(), and combine() (#979).
file_out() files in clean() unless
garbage_collection is TRUE. That way,
make(recover = TRUE) is a true “undo button” for
clean(). clean(garbage_collection = TRUE)
still removes data in the cache, as well as any file_out()
files from targets currently being cleaned.
clean() only appears if
garbage_collection is TRUE. Also, this menu is
added to rescue_cache(garbage_collection = TRUE).
.drake/. The
old .drake_history/ folder was awkward. Old histories are
migrated during drake_config(), and
drake_history().
.drake_history in
plan_to_code(), plan_to_notebook(), and the
examples in the help files.
make(recover = TRUE).
recoverable() and
r_recoverable() to show targets that are outdated but
recoverable via make(recover = TRUE).
drake_history(). Powered by txtq (#918, #920).
no_deps() function, similar to
ignore(). no_deps() suppresses dependency
detection but still tracks changes to the literal code (#910).
transform_plan().
seed column of drake plans
to set custom seeds (#947).
seed trigger to optionally ignore changes to
the target seed (#947).
drake_plan(), interpret custom columns as
non-language objects (#942).
clustermq >= 0.8.8.
ensure_workers in drake_config()
and make().
make() after config is already supplied.
make() from inside the cache (#927).
CITATION file with JOSS paper.
deps_profile(), include the seed and change the
names.
make(). All
this does is invalidate old targets.
set_hash() and get_hash() in
storr to double the speed of progress tracking.
$ (#938).
xxhash64 as the default hash algorithm
for non-storr hashing if the driver does not have a hash
algorithm.

These changes are technically breaking changes, but they should only affect advanced users.

rescue_cache() no longer returns a value.
clustermq (#898). Suggest
version >= 0.8.8 but allow 0.8.7 as well.
drake recomputes config$layout when
knitr reports change (#887).
make() (#878).
r_drake_build().
r_make() (#889).
expose_imports(): do not do the
environment<- trick unless the object is a non-primitive
function.
assign() vs delayedAssign().
file_in() files and other strings (#896).
ignore() work inside loadd(),
readd(), file_in(), file_out(),
and knitr_in().
file_in() and
file_out(). drake now treats
file_in()/file_out() files as URLs if they
begin with “http://”, “https://”, or “ftp://”. The fingerprint is a
concatenation of the ETag and last-modified timestamp. If neither can be
found or if there is no internet connection, drake throws
an error.
"unload" and
"none", which do not attempt to load a target’s
dependencies from memory (#897).
drake_slice() to help split data across multiple
targets. Related: #77, #685, #833.
drake_cache() function, which is now
recommended instead of get_cache() (#883).
r_deps_target() function.
r_make(),
r_vis_drake_graph(), and r_outdated()
(#892).
Deprecate get_cache() in favor of
drake_cache().
clean() menu prompt.
drake_config().
config argument.
use_cache to FALSE
in storr function calls for saving and loading targets.
Also, at the end of make(), call flush_cache()
(and then gc() if garbage collection is enabled).
callr::r() within commands as a safe
alternative to lock_envir = FALSE in the self-invalidation
section of the make() help file.
file_in()/file_out()/knitr_in()
files. We now rehash files if the file is less than 100 KB or the time
stamp changed or the file size changed.
rlang’s new interpolation operator
{{, which was causing make() to fail when
drake_plan() commands are enclosed in curly braces
(#864).
“config$lock_envir <- FALSE” from
loop_build() to backend_loop(). This makes
sure config$envir is correctly locked in
make(parallelism = "clustermq").
.data
argument of map() and cross() in the DSL.
drake_plan(), repair
cross(.data = !!args), where args is an
optional data frame of grouping variables.
file_in()/file_out() directories for Windows
(#855).
.id_chr work with combine() in the
DSL (#867).
make_spinner() unless the version of
cli is at least 1.1.0.
text_drake_graph() (and
r_text_drake_graph() and
render_text_drake_graph()). Uses text art to print a
dependency graph to the terminal window. Handy for when users SSH into
remote machines without X Window support.
max_expand argument to
drake_plan(), an optional upper bound on the lengths of
grouping variables for map() and cross() in
the DSL. Comes in handy when you have a massive number of targets and
you want to test on a miniature version of your workflow before you
scale up to production.
clustermq workers for as
long as possible. Before launching them, build/check targets locally
until we reach an outdated target with hpc equal to
FALSE. In other words, if no targets actually require
clustermq workers, no workers get created.
make(parallelism = "future"), reset the
config$sleep() backoff interval whenever a new target gets
checked.
CodeDepends with a base R solution in
code_to_plan(). Fixes a CRAN note.
drake_plan()) is no longer
experimental.
callr API (r_make() and friends) is no
longer experimental.
evaluate_plan(), expand_plan(),
map_plan(), gather_plan(),
gather_by(), reduce_plan(),
reduce_by().
deps(),
max_useful_jobs(), and
migrate_drake_project().
drake_plan(x = target(..., transform = map(...)))
avoid inserting extra dots in target names when the grouping variables
are character vectors (#847). Target names come out much nicer this way,
but those name changes will invalidate some targets (i.e. they need to
be rebuilt with make()).
config$jobs_preprocess (local jobs) in several
places where drake was incorrectly using
config$jobs (meant for targets).
loadd(x, deps = TRUE, config = your_config) to
work even if x is not cached (#830). Required disabling
tidyselect functionality when deps is
TRUE. There is a new note in the help file about this, and
an informative console message prints out on
loadd(deps = TRUE, tidyselect = TRUE). The default value of
tidyselect is now !deps.
testthat >= 2.0.1.9000.
drake_plan() transformations, allow the user to
refer to a target’s own name using a special .id_chr
symbol, which is treated like a character string.
transparency argument to
drake_ggraph() and render_drake_ggraph() to
disable transparency in the rendered graph. Useful for R installations
without transparency support.
vis_drake_graph() and drake_ggraph() displays.
Only activated in vis_drake_graph() when there are at least
10 nodes distributed in both the vertical and horizontal
directions.
vis_drake_graph() and
render_drake_graph().
drake_plan() (#847).
drake plans (drake_plan())
inside drake_config() objects. When other bottlenecks are
removed, this will reduce the burden on memory (re #800).
targets argument inside
drake_config() objects. This is to reduce memory
consumption.
layout and direction
arguments of vis_drake_graph() and
render_drake_graph(). Direction is now always left to right
and the layout is always Sugiyama.
drake_cache.csv by default) to avoid issues with spaces
(e.g. entry names with spaces in them, such as “file report.Rmd”).
drake 7.0.0, if you run make() in
interactive mode and respond to the menu prompt with an option other
than 1 or 2, targets will still build.
drake_graph(). The
bug came from append_output_file_nodes(), a utility
function of drake_graph_info().
r_make(r_fn = callr::r_bg()) re #799.
drake_ggraph() and
sankey_drake_graph() to work when the graph has no
edges.
use_drake() function to write the
make.R and _drake.R files from the “main
example”. Does not write other supporting scripts.
hpc column in your
drake_plan(), you can now select which targets to deploy to
HPC and which to run locally.
list argument to build_times(), just
like loadd().
file_in() and file_out() can now handle
entire directories,
e.g. file_in("your_folder_of_input_data_files") and
file_out("directory_with_a_bunch_of_output_files").
config to HPC workers.
drake_ggraph()
drake plan to the config argument of a
function.
map() and cross() transformations
in the DSL, prevent the accidental sorting of targets by name (#786).
Needed merge(sort = FALSE) in
dsl_left_outer_join().
verbose argument of
make() now takes values 0, 1, and 2, and maximum verbosity
in the console prints targets, retries, failures, and a spinner. The
console log file, on the other hand, dumps maximally verbose runtime
info regardless of the verbose argument.
f <- Rcpp::cppFunction(...) did not stay up to date from
session to session because the addresses corresponding to anonymous
pointers were showing up in deparse(f). Now,
drake ignores those pointers, and Rcpp
functions compiled inline appear to stay up to date. This problem was
more of an edge case than a bug.
drake_plan(), deprecate the
tidy_evaluation argument in favor of the new and more
concise tidy_eval. To preserve back compatibility for now,
if you supply a non-NULL value to
tidy_evaluation, it overwrites tidy_eval.
drake_config() objects by
assigning closure of config$sleep to
baseenv().
drake plans, the command and
trigger columns are now lists of language objects instead
of character vectors. make() and friends still work if you
have character columns, but the default output of
drake_plan() has changed to this new format.
parallelism argument of
make()) except “clustermq” and “future” are removed. A new
“loop” backend covers local serial execution.
built(), find_project(),
imported(), and parallel_stages(); full list
at #564) and the single-quoted file API.
lock_envir to
TRUE in make() and
drake_config(). So make() will automatically
quit in error if the act of building a target tries to change upstream
dependencies.
make() no longer returns a value. Users will need to
call drake_config() separately to get the old return value
of make().
jobs argument to be of length 1
(make() and drake_config()). To parallelize
the imports and other preprocessing steps, use
jobs_preprocess, also of length 1.
storr namespace. As a result,
drake is faster, but users will no longer be able to load
imported functions using loadd() or
readd().
target(), users must now explicitly name all the
arguments except command,
e.g. target(f(x), trigger = trigger(condition = TRUE))
instead of target(f(x), trigger(condition = TRUE)).
bind_plans() when the result has
duplicated target names. This makes drake’s API more
predictable and helps users catch malformed workflows earlier.
loadd() only loads targets listed in the plan. It no
longer loads imports or file hashes.
progress(),
deps_code(), deps_target(), and
predict_workers() are now data frames.
hover to FALSE
in visualization functions. Improves speed.
bind_plans() to work with lists of plans
(bind_plans(list(plan1, plan2)) was returning
NULL in drake 6.2.0 and 6.2.1).
get_cache(path = "non/default/path", search = FALSE) looks
for the cache in "non/default/path" instead of
getwd().
tibble.
ensure_loaded() in
meta.R and triggers.R when ensuring the
dependencies of the condition and change
triggers are loaded.
config argument to drake_build()
and loadd(deps = TRUE).
lock_envir argument to safeguard
reproducibility. More discussion: #619, #620.
from_plan() function allows the users to
reference custom plan columns from within commands. Changes to values in
these columns do not invalidate targets.
make()
pitfalls in interactive mode (#761). Appears once per session. Disable
with options(drake_make_menu = FALSE).
r_make(),
r_outdated(), etc. to run drake functions more
reproducibly in a clean session. See the help file of
r_make() for details.
progress() gains a progress argument for
filtering results. For example,
progress(progress = "failed") will report targets that
failed.
storr’s key mangling in favor of drake’s own
encoding of file paths and namespaced functions for storr
keys.
., .., and
.gitignore from being target names (consequence of the
above).
drake cache, which the
user can set with the hash_algorithm argument of
new_cache(), storr::storr_rds(), and various
other cache functions. Thus, the concepts of a “short hash algorithm”
and “long hash algorithm” are deprecated, and the functions
long_hash(), short_hash(),
default_long_hash_algo(),
default_short_hash_algo(), and
available_hash_algos() are deprecated. Caches are still
back-compatible with drake > 5.4.0 and <= 6.2.1.
magrittr dot symbol to appear in some
commands sometimes.
fetch_cache argument in all
functions.
DBI and RSQLite from
“Suggests”.
config$eval <- new.env(parent = config$envir) for
storing built targets and evaluating commands in the plan. Now,
make() no longer modifies the user’s environment. This move
is a long-overdue step toward purity.
codetools package.
session argument of
make() and drake_config(). Details: in
#623.
graph and layout arguments
to make() and drake_config(). The change
simplifies the internals, and memoization allows us to do this.
make() in a subdirectory of
the drake project root (determined by the location of the
.drake folder in relation to the working directory).
verbose argument, including
the option to print execution and total build times.
mclapply() or parLapply(), depending on the
operating system).
build_times(), predict_runtime(), etc. focus
on only the targets.
plan_analyses(), plan_summaries(),
analysis_wildcard(), cache_namespaces(),
cache_path(), check_plan(),
dataset_wildcard(), drake_meta(),
drake_palette(), drake_tip(),
recover_cache(), cleaned_namespaces(),
target_namespaces(), read_drake_config(),
read_drake_graph(), and
read_drake_plan().
target() as a user-side function. From now
on, it should only be called from within drake_plan().
drake_envir() now throws an error, not a warning, if
called in the incorrect context. Should be called only inside commands
in the user’s drake plan.
*expr*() rlang functions with
their *quo*() counterparts. We still keep
rlang::expr() in the few places where we know the
expressions need to be evaluated in config$eval.
prework argument to make() and
drake_config() can now be an expression (language object)
or list of expressions. Character vectors are still acceptable.
make(), print messages about triggers
etc. only if verbose >= 2L.
Rename in_progress() to running().
Rename knitr_deps() to deps_knitr().
Rename dependency_profile() to deps_profile().
Rename predict_load_balancing() to predict_workers().
this_cache() and defer to
get_cache() and storr::storr_rds() for
simplicity.
hover to FALSE
in visualization functions. Improves speed. Also a breaking change.
drake_cache_log_file(). We recommend using
make() with the cache_log_file argument to
create the cache log. This ensures that the log is always up to date
with make() results.

Version 6.2.1 is a hotfix to address the failing automated CRAN
checks for 6.2.0. Chiefly, in CRAN’s Debian R-devel (2018-12-10) check
platform, errors of the form “length > 1 in coercion to logical”
occurred when either argument to && or
|| was not of length 1
(e.g. nzchar(letters) && length(letters)). In
addition to fixing these errors, version 6.2.1 also removes a
problematic link from the vignette.
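
The coercion error described above can be reproduced and avoided in base R alone. The snippet below is a minimal sketch, not the actual patch applied in 6.2.1: it reduces each operand to a single logical value before applying &&. The variable names has_chars and ok are illustrative only.

```r
# nzchar() is vectorized: it returns one logical value per element of letters.
has_chars <- nzchar(letters)

# Passing has_chars straight to && is the pattern that triggered
# "length > 1 in coercion to logical", so it is left commented out here:
# has_chars && length(letters)

# Reduce each side to a single TRUE/FALSE first, then combine.
ok <- all(has_chars) && length(letters) > 0
print(ok)
```

all() states the intent that every element must be non-empty; any() would be the right reduction if a single non-empty string were enough.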
sep argument to gather_by(),
reduce_by(), reduce_plan(),
evaluate_plan(), expand_plan(),
plan_analyses(), and plan_summaries(). Allows
the user to set the delimiter for generating new target names.
hasty_build argument to make()
and drake_config(). Here, the user can set the function
that builds targets in “hasty mode”
(make(parallelism = "hasty")).
drake_envir() function that returns the
environment where drake builds targets. Can only be
accessed from inside the commands in the workflow plan data frame. The
primary use case is to allow users to remove individual targets from
memory at predetermined build steps.
tibble 2.0.0.
0s from
predict_runtime(targets_only = TRUE) when some targets are
outdated and others are not.
sort(NULL) warnings from
create_drake_layout(). (Affects R-3.3.x.)
evaluate,
formatR, fs, future,
parallel, R.utils, stats, and
stringi.
parse() in code_dependencies().
memory_strategy (previously pruning_strategy)
to "speed" (previously "lookahead").
drake_config()
(config$layout) just to store the code analysis results.
This is an intermediate structure between the workflow plan data frame
and the graph. It will help clean up the internals in future
development.
label argument to future() inside
make(parallelism = "future"). That way, job names are
target names by default if job.name is used correctly in
the batchtools template file.
dplyr,
evaluate, fs, future,
magrittr, parallel, R.utils,
stats, stringi, tidyselect, and
withr.
rprojroot from “Suggests”.
force argument in all functions except
make() and drake_config().
Rename prune_envir() to manage_memory().
Rename the pruning_strategy argument to
memory_strategy (make() and
drake_config()).
console_log_file in
real time (#588).
vis_drake_graph() hover text to
display commands in the drake plan more elegantly.
predict_load_balancing() and remove its
reliance on internals that will go away in 2019 via #561.
worker column of
config$plan in predict_runtime() and
predict_load_balancing(). This functionality will go away
in 2019 via #561.
predict_load_balancing() to time and
workers.
predict_runtime() and
predict_load_balancing() up to date.
drake_session() and rename to
drake_get_session_info().
timeout argument in the API of
make() and drake_config(). A value of
timeout can be still passed to these functions without
error, but only the elapsed and cpu arguments
impose actual timeouts now.
map_plan() function to easily create
a workflow plan data frame to execute a function call over a grid of
arguments.
plan_to_code() function to turn
drake plans into generic R scripts. New users can use this
function to better understand the relationship between plans and code,
and unsatisfied customers can use it to disentangle their projects from
drake altogether. Similarly,
plan_to_notebook() generates an R notebook from a
drake plan.
drake_debug() function to run a target’s
command in debug mode. Analogous to drake_build().
mode argument to trigger() to
control how the condition trigger factors into the decision
to build or skip a target. See ?trigger for
details.
sleep argument to make() and
drake_config() to help the main process consume fewer
resources during parallel processing.
caching argument for the
"clustermq" and "clustermq_staged" parallel
backends. Now,
make(parallelism = "clustermq", caching = "main") will do
all the caching with the main process, and
make(parallelism = "clustermq", caching = "worker") will do
all the caching with the workers. The same is true for
parallelism = "clustermq_staged".
append argument to
gather_plan(), gather_by(),
reduce_plan(), and reduce_by(). The
append argument controls whether the output includes the
original plan in addition to the newly generated rows.
load_main_example(),
clean_main_example(), and
clean_mtcars_example().
filter argument to gather_by() and
reduce_by() in order to restrict what we gather even when
append is TRUE.
make(parallelism = "hasty") skips all
of drake’s expensive caching and checking. All targets run
every single time and you are responsible for saving results to custom
output files, but almost all the by-target overhead is gone.
path.expand() on the file argument to
render_drake_graph() and
render_sankey_drake_graph(). That way, tildes in file paths
no longer interfere with the rendering of static image files.
evaluate_plan(trace = TRUE) followed by
expand_plan(), gather_plan(),
reduce_plan(), gather_by(), or
reduce_by(). The more relaxed behavior also gives users
more options about how to construct and maintain their workflow plan
data frames.
"future" parallelism to make sure
files travel over network file systems before proceeding to downstream
targets.
visNetwork
package is not installed.
make_targets() if all the targets are
already up to date.
seed argument in
make() and drake_config().
caching argument of make()
and drake_config() to "main" rather than
"worker". The default option should be the lower-overhead
option for small workflows. Users have the option to make a different
set of tradeoffs for larger workflows.
condition trigger to evaluate to non-logical
values as long as those values can be coerced to logicals.
condition trigger evaluate to a vector
of length 1.
drake_plan_source().
make(verbose = 4) now prints to the console when a
target is stored.
gather_by() and reduce_by() now
gather/reduce everything if no columns are specified.
make(jobs = 4) was equivalent to
make(jobs = c(imports = 4, targets = 4)). Now,
make(jobs = 4) is equivalent to
make(jobs = c(imports = 1, targets = 4)). See issue #553
for details.verbose is at least 2.load_mtcars_example().hook argument of make() and
drake_config().gather_by() and reduce_by(), do not
exclude targets with all NA gathering variables.digest() wherever possible. This
puts old drake projects out of date, but it improves
speed.stringi package no longer compiles on 3.2.0.code_dependencies(), restrict the possible global variables
to the ones mentioned in the new globals argument (turned
off when NULL). In practical workflows, global dependencies
are restricted to items in envir and proper targets in the
plan. In deps_code(), the globals slot of the
output list is now a list of candidate globals, not necessarily
actual globals (some may not be targets or variables in
envir).unlink() in clean(), set
recursive and force to FALSE.
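The effect of these unlink() settings can be sketched in base R. This is only an illustration, not drake's internal clean() code, and the scratch paths are hypothetical temporary files created for the demonstration.

```r
# Sketch only: base R, not drake's internal clean() implementation.
scratch <- tempfile("unlink_demo_")  # hypothetical scratch directory
dir.create(scratch)
file.create(file.path(scratch, "keep_me.txt"))

# With recursive = FALSE, unlink() does not delete directories,
# so passing a directory path by mistake leaves it intact.
unlink(scratch, recursive = FALSE, force = FALSE)
dir_survived <- dir.exists(scratch)
file_survived <- file.exists(file.path(scratch, "keep_me.txt"))

# A regular file is still removed as usual.
single_file <- tempfile("unlink_demo_file_")
file.create(single_file)
unlink(single_file, recursive = FALSE, force = FALSE)
file_removed <- !file.exists(single_file)

# Remove the scratch directory explicitly once the demonstration is done.
unlink(scratch, recursive = TRUE)
```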
This should prevent the accidental deletion of whole directories.clean() deleted input-only files if no
targets from the plan were cached. A patch and a unit test are included
in this release.loadd(not_a_target) no longer loads every target in the
cache.igraph vertex attribute (fixes #503).knitr_in() file code
chunks.sort(NULL) that caused warnings in
R 3.3.3.analyze_loadd() was
sometimes quitting with “Error: attempt to set an attribute on
NULL”.digest::digest(file = TRUE) on directories.
Instead, set hashes of directories to NA. Users should
still not use directories as file dependencies.vis_drake_graph(). Previously, these files were missing
from the visualization, but actual workflows worked just fine.codetools failures in R 3.3 (add
a tryCatch() statement in
find_globals()).clustermq-based parallel backend:
make(parallelism = "clustermq").evaluate_plan(trace = TRUE) now adds a
*_from column to show the origins of the evaluated targets.
Try
evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE).gather_by() and reduce_by(),
which gather on custom columns in the plan (or columns generated by
evaluate_plan(trace = TRUE)) and append the new targets to
the previous plan.template argument of clustermq
functions (e.g. Q() and workers()) as an
argument of make() and drake_config().code_to_plan() function to turn R scripts and
R Markdown reports into workflow plan data frames.drake_plan_source() function, which generates
lines of code for a drake_plan() call. This
drake_plan() call produces the plan passed to
drake_plan_source(). The main purpose is visual inspection
(we even have syntax highlighting via prettycode) but users
may also save the output to a script file for the sake of
reproducibility or simple reference.deps_targets() in favor of a new
deps_target() function (singular) that behaves more like
deps_code().vis_drake_graph() and
render_drake_graph().vis_drake_graph() and
render_drake_graph().vis_drake_graph() using
the “title” node column.vis_drake_graph(collapse = TRUE).dependency_profile() show major trigger hashes
side-by-side to tell the user if the command, a dependency, an input
file, or an output file changed since the last make().txtq
package is installed.loadd() and
readd(), giving specific usage guidance in prose.build_drake_graph() and print
to the console the ones that execute.txtq is not installed.drake’s code examples to the
drake-examples GitHub repository and make
drake_example() and drake_examples() download
examples from there.show_output_files argument to
vis_drake_graph() and friends."clustermq_staged" and "future_lapply".igraph attributes of the
dependency graph to allow for smarter dependency/memory management
during make().vis_drake_graph() and
sankey_drake_graph() to save static image files via
webshot.static_drake_graph() and
render_static_drake_graph() in favor of
drake_ggraph() and render_drake_ggraph().columns argument to evaluate_plan()
so users can evaluate wildcards in columns other than the
command column of plan.target() so users do not have to
(explicitly).sankey_drake_graph() and
render_sankey_drake_graph().static_drake_graph() and
render_static_drake_graph() for
ggplot2/ggraph static graph
visualizations.group and clusters arguments to
vis_drake_graph(), static_drake_graph(), and
drake_graph_info() to optionally condense nodes into
clusters.trace argument to
evaluate_plan() to optionally add indicator columns to show
which targets got expanded/evaluated with which wildcard values.always_rename argument to
rename in evaluate_plan().rename argument to
expand_plan().make(parallelism = "clustermq_staged"), a
clustermq-based staged parallelism backend (see #452).make(parallelism = "future_lapply_staged"), a
future-based staged parallelism backend (see #450).codetools rather than
CodeDepends for finding global variables.loadd() and readd() dependencies in
knitr reports referenced with knitr_in()
inside imported functions. Previously, this feature was only available
in explicit knitr_in() calls in commands.drake_plan()s.inst/hpc_template_files.drake_batchtools_tmpl_file() in favor of
drake_hpc_template_file() and
drake_hpc_template_files().garbage_collection argument to
make(). If TRUE, gc() is called
after every new build of a target.sanitize_plan() in
make().tracked() to accept only a
drake_config() object as an argument. Yes, it is
technically a breaking change, but it is only a small break, and it is
the correct API choice.DESCRIPTION file.knitr reports without
warnings.lapply-like backends,
drake uses persistent workers and a main process. In the
case of "future_lapply" parallelism, the main process is a
separate background process called by Rscript.make()’s. (Previously, there were “check” messages and a
call to staged_parallelism().)make(parallelism = c(imports = "mclapply_staged", targets = "mclapply")).make(jobs = 1). Now, they are kept in memory until no
downstream target needs them (for make(jobs = 1)).predict_runtime(). It is a more sensible way to
go about predicting runtimes with multiple jobs. Likely to be more
accurate.make() no longer leave targets in the user’s
environment.imports_only argument to
make() and drake_config() in favor of
skip_targets.migrate_drake_project().max_useful_jobs().upstream_only argument to failed()
so users can list failed targets that do not have any failed
dependencies. Naturally accompanies
make(keep_going = TRUE).plyr as a dependency.drake_plan() and
bind_plans().target() to help create drake plans
with custom columns.drake_gc(), clean out disruptive files in
storrs with mangled keys (re: #198).load_basic_example() in favor of
load_mtcars_example().README.md file on the main example rather
than the mtcars example.README.Rmd file to generate
README.md.deps_targets().deps() in favor of
deps_code()pruning_strategy argument to make()
and drake_config() so the user can decide how
drake keeps non-import dependencies in memory when it
builds a target.drake plans to help users customize
scheduling.makefile_path argument to make() and
drake_config() to avoid potential conflicts between
user-side custom Makefiles and the one written by
make(parallelism = "Makefile").console argument to make() and
drake_config() so users can redirect console output to a
file.show_source(),
readd(show_source = TRUE),
loadd(show_source = TRUE).!! operator from tidyeval and
rlang is parsed differently than in R <= 3.4.4. This
change broke one of the tests in tests/testthat/tidy-eval.R.
The main purpose of drake’s 5.1.2 release is to fix the
broken test.R CMD check error from building the pdf
manual with LaTeX.drake_plan(), allow users to customize target-level
columns using target() inside the commands.bind_plans() function to concatenate the rows
of drake plans and then sanitize the aggregate plan.session argument to tell
make() to build targets in a separate, isolated main R
session. For example,
make(session = callr::r_vanilla).reduce_plan() function to do pairwise reductions
on collections of targets..) from being a dependency of
any target or import. This enforces more consistent behavior in the face
of the current static code analysis functionality, which sometimes
detects . and sometimes does not.ignore() to optionally ignore pieces of workflow
plan commands and/or imported functions. Use
ignore(some_code) to (1) force
drake to not track dependencies in
some_code, and (2) ignore any changes in some_code when it comes to
deciding which targets are out of date.drake to only look for imports in environments
inheriting from envir in make() (plus
explicitly namespaced functions).loadd() to ignore foreign imports (imports not
explicitly found in envir when make() last
imported them).loadd() so that only targets (not imports) are
loaded if the ... and list arguments are
empty..gitignore file containing "*" to
the default .drake/ cache folder every time
new_cache() is called. This means the cache will not be
automatically committed to git. Users need to remove the
.gitignore file to allow unforced commits, and then
subsequent make()s on the same cache will respect the
user’s wishes and not add another .gitignore. This only
works for the default cache. Not supported for manual
storrs."future" backend with a manual
scheduler.dplyr-style tidyselect
functionality in loadd(), clean(), and
build_times(). For build_times(), there is an
API change: for tidyselect to work, we needed to insert a
new ... argument as the first argument of
build_times().file_in() for file inputs to commands or imported
functions (for imported functions, the input file needs to be an
imported file, not a target).file_out() for output file targets (ignored if used in
imported functions).knitr_in() for
knitr/rmarkdown reports. This tells
drake to look inside the source file for target
dependencies in code chunks (explicitly referenced with
loadd() and readd()). Treated as a
file_in() if used in imported functions.drake_plan() so that it automatically fills in
any target names that the user does not supply. Also, any
file_out()s become the target names automatically
(double-quoted internally).read_drake_plan() (rather than an empty
drake_plan()) the default plan argument in all
functions that accept a plan.loadd(..., lazy = "bind"). That way, when you have a target
loaded in one R session and hit make() in another R
session, the target in your first session will automatically
update.dataframes_graph().diagnose() will take on
the role of returning this metadata.read_drake_meta() function in favor of
diagnose().expose_imports() function to optionally force
drake to detect deeply nested functions inside specific
packages.drake_build() to be an exclusively user-side
function.replace argument to loadd() so that
objects already in the user’s environment need not be replaced.seed argument to make(),
drake_config(), and load_basic_example(). Also
hard-code a default seed of 0. That way, the
pseudo-randomness in projects should be reproducible across R
sessions.drake_read_seed() function to read the seed
from the cache. Its examples illustrate what drake is doing
to try to ensure reproducible random numbers.!! for the
... argument to drake_plan(). Suppress this
behavior using tidy_evaluation = FALSE or by passing in
commands passed through the list argument.rlang::expr()
before evaluating them. That means you can use the quasiquotation
operator !! in your commands, and make() will
evaluate them according to the tidy evaluation paradigm.drake_example("basic"),
drake_example("gsp"), and
drake_example("packages") to demonstrate how to set up the
files for serious drake projects. More guidance was needed
in light of #193.drake_plan() in the help file
(?drake_plan).drake to rOpenSci GitHub URL.config
argument, which you can get from drake_config() or
make(). Examples:
cache$exists() instead.make() decides to build targets.storr cache in a way
that is not back-compatible with projects from versions 4.4.0 and
earlier. The main change is to make more intelligent use of
storr namespaces, improving efficiency (both time and
storage) and opening up possibilities for new features. If you attempt
to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will
stop you before any damage to the cache is done, and you will be
instructed how to migrate your project to the new drake.formatR::tidy_source() instead of
parse() in tidy_command() (originally
tidy() in R/dependencies.R). Previously,
drake was having problems with an edge case: as a command,
the literal string "A" was interpreted as the symbol
A after tidying. With tidy_source(), literal
quoted strings stay literal quoted strings in commands. This may put
some targets out of date in old projects, yet another loss of back
compatibility in version 5.0.0.rescue_cache(), exposed to the user and used
in clean(). This function removes dangling orphaned files
in the cache so that a broken cache can be cleaned and used in the usual
ways once more.cpu and elapsed
arguments of make() to NULL. This solves an
elusive bug in how drake imposes timeouts.graph argument to functions
make(), outdated(), and
missed().prune_graph() function for igraph
objects.prune() and
status().
- analyses() => plan_analyses()
- as_file() => as_drake_filename()
- backend() => future::plan()
- build_graph() => build_drake_graph()
- check() => check_plan()
- config() => drake_config()
- evaluate() => evaluate_plan()
- example_drake() => drake_example()
- examples_drake() => drake_examples()
- expand() => expand_plan()
- gather() => gather_plan()
- plan(), workflow(), workplan() => drake_plan()
- plot_graph() => vis_drake_graph()
- read_config() => read_drake_config()
- read_graph() => read_drake_graph()
- read_plan() => read_drake_plan()
- render_graph() => render_drake_graph()
- session() => drake_session()
- summaries() => plan_summaries()
output and code as names in the
workflow plan data frame. Use target and
command instead. This naming switch has been formally
deprecated for several months prior.drake_quotes(),
drake_unquote(), and drake_strings() to remove
the silly dependence on the eply package.skip_safety_checks flag to make()
and drake_config(). Increases speed.sanitize_plan(), remove rows with blank targets
““.purge argument to clean() to
optionally remove all target-level information.namespace argument to cached() so
users can inspect individual storr namespaces.verbose to numeric: 0 = print nothing, 1 = print
progress on imports only, 2 = print everything.next_stage() function to report the targets
to be made in the next parallelizable stage.session_info argument to make().
Apparently, sessionInfo() is a bottleneck for small
make()s, so there is now an option to suppress it. This is
mostly for the sake of speeding up unit tests.log_progress argument to make()
to suppress progress logging. This increases storage efficiency and
speeds some projects up a tiny bit.namespace argument to
loadd() and readd(). You can now load and read
from non-default storr namespaces.drake_cache_log(),
drake_cache_log_file(), and
make(..., cache_log_file = TRUE) as options to track
changes to targets/imports in the drake cache.rmarkdown::render(), not just knit().drake properly.plot_graph() to display subcomponents. Check out
arguments from, mode, order, and
subset. The graph visualization vignette has
demonstrations."future_lapply" parallelism: parallel backends
supported by the future and future.batchtools
packages. See ?backend for examples and the parallelism
vignette for an introductory tutorial. More advanced instruction can be
found in the future and future.batchtools
packages themselves.diagnose().hook argument to make() to
wrap around build(). That way, users can more easily
control the side effects of distributed jobs. For example, to redirect
error messages to a file in
make(..., parallelism = "Makefile", jobs = 2, hook = my_hook),
my_hook should be something like
function(code){withr::with_message_sink("messages.txt", code)}.drake was previously using the outfile
argument for PSOCK clusters to generate output that could not be caught
by capture.output(). It was a hack that should have been
removed before.make() and outdated()
print “All targets are already up to date” to the console."future_lapply" backends.plot_graph() and progress(). Also
see the new failed() function, which is similar to
in_progress().parLapply parallelism. The
downside to this fix is that drake has to be properly
installed. It should not be loaded with
devtools::load_all(). The speedup comes from lightening the
first clusterExport() call in run_parLapply().
Previously, we exported every single individual drake
function to all the workers, which created a bottleneck. Now, we just
load drake itself in each of the workers, which works
because build() and do_prework() are
exported.overwrite to FALSE
in load_basic_example().report.Rmd in
load_basic_example().get_cache(..., verbose = TRUE).lightly_parallelize() and
lightly_parallelize_atomic(). Now, processing happens
faster, and only over the unique values of a vector.make_with_config() function to do the work of
make() on an existing internal configuration list from
drake_config().drake_batchtools_tmpl_file() to
write a batchtools template file from one of the examples
(drake_example()), if one exists.
Version 4.3.0 has:
- Reproducible random numbers (#56)
- Automatic detection of knitr dependencies (#9)
- More vignettes
- Bug fixes
Version 4.2.0 will be released today. There are several improvements to code style and performance. In addition, there are new features such as cache/hash externalization and runtime prediction. See the new storage and timing vignettes for details. This release has automated checks for back-compatibility with existing projects, and I also did manual back compatibility checks on serious projects.
Version 3.0.0 is coming out. It manages environments more
intelligently so that the behavior of make() is more
consistent with evaluating your code in an interactive session.
Version 1.0.1 is on CRAN! I’m already working on a massive update, though. 2.0.0 is cleaner and more powerful.