Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0. Chiefly, in CRAN’s Debian R-devel (2018-12-10) check platform, errors of the form “length > 1 in coercion to logical” occurred when either argument to && or || was not of length 1 (e.g. nzchar(letters) && length(letters)). In addition to fixing these errors, version 6.2.1 also removes a problematic link from the vignette.
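As a minimal sketch of the kind of fix involved (the variable names here are illustrative, not taken from drake's source), the vector-valued operand can be collapsed to a single logical with all() before it reaches &&:

```r
# Illustrative input: a character vector of length 26.
x <- letters

# Problematic pattern: nzchar(x) has length 26, so `nzchar(x) && length(x)`
# hands a length-26 logical to `&&` (an error in recent versions of R).

# Length-1 version: collapse the vector with all() and make the
# second operand an explicit length-1 comparison.
ok <- all(nzchar(x)) && length(x) > 0
print(ok)
```

The same idea applies to ||: reduce each operand to length 1 (with all(), any(), or an explicit comparison) before combining.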
sep argument to gather_by(), reduce_by(), reduce_plan(), evaluate_plan(), expand_plan(), plan_analyses(), and plan_summaries(). Allows the user to set the delimiter for generating new target names.
hasty_build argument to make() and drake_config(). Here, the user can set the function that builds targets in “hasty mode” (make(parallelism = "hasty")).
drake_envir() function that returns the environment where drake builds targets. Can only be accessed from inside the commands in the workflow plan data frame. The primary use case is to allow users to remove individual targets from memory at predetermined build steps.
tibble 2.0.0.
0s from predict_runtime(targets_only = TRUE) when some targets are outdated and others are not.
sort(NULL) warnings from create_drake_layout(). (Affects R-3.3.x.)
parse() in code_dependencies().
memory_strategy (previously pruning_strategy) to "speed" (previously "lookahead").
drake_config() (config$layout) just to store the code analysis results. This is an intermediate structure between the workflow plan data frame and the graph. It will help clean up the internals in future development.
label
argument to future() inside make(parallelism = "future"). That way, job names are target names by default if job.name is used correctly in the batchtools template file.
dplyr, evaluate, fs, future, magrittr, parallel, R.utils, stats, stringi, tidyselect, and withr.
rprojroot from “Suggests”.
force argument in all functions except make() and drake_config().
prune_envir() to manage_memory().
pruning_strategy argument to memory_strategy (make() and drake_config()).
console_log_file
in real time (#588).
vis_drake_graph() hover text to display commands in the drake plan more elegantly.
predict_load_balancing() and remove its reliance on internals that will go away in 2019 via #561.
worker column of config$plan in predict_runtime() and predict_load_balancing(). This functionality will go away in 2019 via #561.
predict_load_balancing() to time and workers.
predict_runtime() and predict_load_balancing() up to date.
drake_session() and rename to drake_get_session_info().
timeout argument in the API of make() and drake_config(). A value of timeout can still be passed to these functions without error, but only the elapsed and cpu arguments impose actual timeouts now.
map_plan()
function to easily create a workflow plan data frame to execute a function call over a grid of arguments.
plan_to_code() function to turn drake plans into generic R scripts. New users can use this function to better understand the relationship between plans and code, and unsatisfied customers can use it to disentangle their projects from drake altogether. Similarly, plan_to_notebook() generates an R notebook from a drake plan.
drake_debug() function to run a target’s command in debug mode. Analogous to drake_build().
mode argument to trigger() to control how the condition trigger factors into the decision to build or skip a target. See ?trigger for details.
sleep argument to make() and drake_config() to help the master process consume fewer resources during parallel processing.
caching argument for the "clustermq" and "clustermq_staged" parallel backends. Now, make(parallelism = "clustermq", caching = "master") will do all the caching with the master process, and make(parallelism = "clustermq", caching = "worker") will do all the caching with the workers. The same is true for parallelism = "clustermq_staged".
append argument to gather_plan(), gather_by(), reduce_plan(), and reduce_by(). The append argument controls whether the output includes the original plan in addition to the newly generated rows.
load_main_example(), clean_main_example(), and clean_mtcars_example().
filter
argument to gather_by() and reduce_by() in order to restrict what we gather even when append is TRUE.
make(parallelism = "hasty") skips all of drake’s expensive caching and checking. All targets run every single time and you are responsible for saving results to custom output files, but almost all the by-target overhead is gone.
path.expand() on the file argument to render_drake_graph() and render_sankey_drake_graph(). That way, tildes in file paths no longer interfere with the rendering of static image files. Compensates for https://github.com/wch/webshot.
evaluate_plan(trace = TRUE) followed by expand_plan(), gather_plan(), reduce_plan(), gather_by(), or reduce_by(). The more relaxed behavior also gives users more options about how to construct and maintain their workflow plan data frames.
"future" parallelism to make sure files travel over network file systems before proceeding to downstream targets.
visNetwork package is not installed.
make_targets() if all the targets are already up to date.
seed argument in make() and drake_config().
caching argument of make() and drake_config() to "master" rather than "worker". The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
condition
trigger to evaluate to non-logical values as long as those values can be coerced to logicals.
condition trigger evaluate to a vector of length 1.
drake_plan_source().
make(verbose = 4) now prints to the console when a target is stored.
gather_by() and reduce_by() now gather/reduce everything if no columns are specified.
make(jobs = 4) was equivalent to make(jobs = c(imports = 4, targets = 4)). Now, make(jobs = 4) is equivalent to make(jobs = c(imports = 1, targets = 4)). See issue 553 for details.
verbose is at least 2.
load_mtcars_example().
hook argument of make() and drake_config().
gather_by() and reduce_by(), do not exclude targets with all NA gathering variables.
digest() wherever possible. This puts old drake projects out of date, but it improves speed.
stringi package no longer compiles on 3.2.0.
code_dependencies(), restrict the possible global variables to the ones mentioned in the new globals argument (turned off when NULL). In practical workflows, global dependencies are restricted to items in envir and proper targets in the plan. In deps_code(), the globals slot of the output list is now a list of candidate globals, not necessarily actual globals (some may not be targets or variables in envir).
unlink() in clean(), set recursive and force to FALSE. This should prevent the accidental deletion of whole directories.
clean() deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
loadd(not_a_target)
no longer loads every target in the cache.
igraph vertex attribute (fixes https://github.com/ropensci/drake/issues/503).
knitr_in() file code chunks.
sort(NULL) that caused warnings in R 3.3.3.
analyze_loadd() was sometimes quitting with “Error: attempt to set an attribute on NULL”.
digest::digest(file = TRUE) on directories. Instead, set hashes of directories to NA. Users should still not use directories as file dependencies.
vis_drake_graph(). Previously, these files were missing from the visualization, but actual workflows worked just fine. Ref: https://stackoverflow.com/questions/52121537/trigger-notification-from-report-generation-in-r-drake-package
codetools failures in R 3.3 (add a tryCatch() statement in find_globals()).
clustermq-based parallel backend: make(parallelism = "clustermq").
evaluate_plan(trace = TRUE) now adds a *_from column to show the origins of the evaluated targets. Try evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE).
gather_by() and reduce_by(), which gather on custom columns in the plan (or columns generated by evaluate_plan(trace = TRUE)) and append the new targets to the previous plan.
template argument of clustermq functions (e.g. Q() and workers()) as an argument of make() and drake_config().
code_to_plan() function to turn R scripts and R Markdown reports into workflow plan data frames.
drake_plan_source() function, which generates lines of code for a drake_plan() call. This drake_plan() call produces the plan passed to drake_plan_source(). The main purpose is visual inspection (we even have syntax highlighting via prettycode) but users may also save the output to a script file for the sake of reproducibility or simple reference.
deps_targets()
in favor of a new deps_target() function (singular) that behaves more like deps_code().
vis_drake_graph() and render_drake_graph().
vis_drake_graph() and render_drake_graph().
vis_drake_graph() using the “title” node column.
vis_drake_graph(collapse = TRUE).
dependency_profile() show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last make().
txtq package is installed.
loadd() and readd(), giving specific usage guidance in prose.
build_drake_graph() and print to the console the ones that execute.
txtq is not installed.
drake’s code examples to this repository and make drake_example() and drake_examples() download examples from there.
show_output_files argument to vis_drake_graph() and friends.
"clustermq_staged" and "future_lapply".
igraph attributes of the dependency graph to allow for smarter dependency/memory management during make().
vis_drake_graph() and sankey_drake_graph() to save static image files via webshot.
static_drake_graph() and render_static_drake_graph() in favor of drake_ggraph() and render_drake_ggraph().
columns
argument to evaluate_plan() so users can evaluate wildcards in columns other than the command column of plan.
target() so users do not have to (explicitly).
sankey_drake_graph() and render_sankey_drake_graph().
static_drake_graph() and render_static_drake_graph() for ggplot2/ggraph static graph visualizations.
group and clusters arguments to vis_drake_graph(), static_drake_graph(), and drake_graph_info() to optionally condense nodes into clusters.
trace argument to evaluate_plan() to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
always_rename argument to rename in evaluate_plan().
rename argument to expand_plan().
make(parallelism = "clustermq_staged"), a clustermq-based staged parallelism backend (see https://github.com/ropensci/drake/pull/452).
make(parallelism = "future_lapply_staged"), a future-based staged parallelism backend (see https://github.com/ropensci/drake/pull/450).
codetools rather than CodeDepends for finding global variables.
loadd() and readd() dependencies in knitr reports referenced with knitr_in() inside imported functions. Previously, this feature was only available in explicit knitr_in() calls in commands.
drake_plan()s.
drake_batchtools_tmpl_file() in favor of drake_hpc_template_file() and drake_hpc_template_files().
garbage_collection
argument to make(). If TRUE, gc() is called after every new build of a target.
sanitize_plan() in make().
tracked() to accept only a drake_config() object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
DESCRIPTION file.
knitr reports without warnings.
lapply-like backends, drake uses persistent workers and a master process. In the case of "future_lapply" parallelism, the master process is a separate background process called by Rscript.
make()’s. (Previously, there were “check” messages and a call to staged_parallelism().)
make(parallelism = c(imports = "mclapply_staged", targets = "mclapply")).
make(jobs = 1). Now, they are kept in memory until no downstream target needs them (for make(jobs = 1)).
predict_runtime(). It is a more sensible way to go about predicting runtimes with multiple jobs. Likely to be more accurate.
make() no longer leave targets in the user’s environment.
imports_only argument to make() and drake_config() in favor of skip_targets.
migrate_drake_project().
max_useful_jobs().
upstream_only argument to failed() so users can list failed targets that do not have any failed dependencies. Naturally accompanies make(keep_going = TRUE).
plyr as a dependency.
drake_plan() and bind_plans().
target() to help create drake plans with custom columns.
drake_gc(), clean out disruptive files in storrs with mangled keys (re: https://github.com/ropensci/drake/issues/198).
load_basic_example() in favor of load_mtcars_example().
README.md file on the main example rather than the mtcars example.
README.Rmd file to generate README.md.
deps_targets().
deps() in favor of deps_code()
pruning_strategy argument to make() and drake_config() so the user can decide how drake keeps non-import dependencies in memory when it builds a target.
drake plans to help users customize scheduling.
makefile_path argument to make() and drake_config() to avoid potential conflicts between user-side custom Makefiles and the one written by make(parallelism = "Makefile").
console argument to make() and drake_config() so users can redirect console output to a file.
show_source(), readd(show_source = TRUE), loadd(show_source = TRUE).
!! operator from tidyeval and rlang is parsed differently than in R <= 3.4.4. This change broke one of the tests in tests/testthat/tidy-eval.R. The main purpose of drake’s 5.1.2 release is to fix the broken test.
R CMD check error from building the pdf manual with LaTeX.
drake_plan(), allow users to customize target-level columns using target() inside the commands.
bind_plans() function to concatenate the rows of drake plans and then sanitize the aggregate plan.
session argument to tell make() to build targets in a separate, isolated master R session. For example, make(session = callr::r_vanilla).
reduce_plan() function to do pairwise reductions on collections of targets.
(.) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects . and sometimes does not.
ignore() to optionally ignore pieces of workflow plan commands and/or imported functions. Use ignore(some_code) to tell drake not to track dependencies in some_code, and to ignore changes to some_code when it comes to deciding which targets are out of date.
drake
to only look for imports in environments inheriting from envir in make() (plus explicitly namespaced functions).
loadd() to ignore foreign imports (imports not explicitly found in envir when make() last imported them).
loadd() so that only targets (not imports) are loaded if the ... and list arguments are empty.
.gitignore file containing "*" to the default .drake/ cache folder every time new_cache() is called. This means the cache will not be automatically committed to git. Users need to remove the .gitignore file to allow unforced commits, and then subsequent make()s on the same cache will respect the user’s wishes and not add another .gitignore. This only works for the default cache. Not supported for manual storrs.
"future" backend with a manual scheduler.
dplyr-style tidyselect functionality in loadd(), clean(), and build_times(). For build_times(), there is an API change: for tidyselect to work, we needed to insert a new ... argument as the first argument of build_times().
file_in() for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target).
file_out() for output file targets (ignored if used in imported functions).
knitr_in() for knitr/rmarkdown reports. This tells drake to look inside the source file for target dependencies in code chunks (explicitly referenced with loadd() and readd()). Treated as a file_in() if used in imported functions.
drake_plan() so that it automatically fills in any target names that the user does not supply. Also, any file_out()s become the target names automatically (double-quoted internally).
read_drake_plan() (rather than an empty drake_plan()) the default plan argument in all functions that accept a plan.
loadd(..., lazy = "bind"). That way, when you have a target loaded in one R session and hit make() in another R session, the target in your first session will automatically update.
dataframes_graph().
diagnose()
will take on the role of returning this metadata.
read_drake_meta() function in favor of diagnose().
expose_imports() function to optionally force drake to detect deeply nested functions inside specific packages.
drake_build() to be an exclusively user-side function.
replace argument to loadd() so that objects already in the user’s environment need not be replaced.
seed argument to make(), drake_config(), and load_basic_example(). Also hard-code a default seed of 0. That way, the pseudo-randomness in projects should be reproducible across R sessions.
drake_read_seed() function to read the seed from the cache. Its examples illustrate what drake is doing to try to ensure reproducible random numbers.
!! for the ... argument to drake_plan(). Suppress this behavior using tidy_evaluation = FALSE or by passing in commands through the list argument.
rlang::expr() before evaluating them. That means you can use the quasiquotation operator !! in your commands, and make() will evaluate them according to the tidy evaluation paradigm.
drake_example("basic"), drake_example("gsp"), and drake_example("packages") to demonstrate how to set up the files for serious drake projects. More guidance was needed in light of this issue.
drake_plan() in the help file (?drake_plan).
drake to rOpenSci: https://github.com/ropensci/drake
config argument, which you can get from drake_config() or make(). Examples:
cache$exists() instead.
make()
decides to build targets.
storr cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of storr namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
formatR::tidy_source() instead of parse() in tidy_command() (originally tidy() in R/dependencies.R). Previously, drake was having problems with an edge case: as a command, the literal string "A" was interpreted as the symbol A after tidying. With tidy_source(), literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
rescue_cache(), exposed to the user and used in clean(). This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
cpu and elapsed arguments of make() to NULL. This solves an elusive bug in how drake imposes timeouts.
graph argument to functions make(), outdated(), and missed().
prune_graph() function for igraph objects.
prune() and status()
analyses() => plan_analyses()
as_file() => as_drake_filename()
backend() => future::plan()
build_graph() => build_drake_graph()
check() => check_plan()
config() => drake_config()
evaluate() => evaluate_plan()
example_drake() => drake_example()
examples_drake() => drake_examples()
expand() => expand_plan()
gather() => gather_plan()
plan(), workflow(), workplan() => drake_plan()
plot_graph() => vis_drake_graph()
read_config() => read_drake_config()
read_graph() => read_drake_graph()
read_plan() => read_drake_plan()
render_graph() => render_drake_graph()
session() => drake_session()
summaries() => plan_summaries()
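Renames like the ones listed above are commonly handled in R by keeping the old name as a thin wrapper that warns and forwards to the new function. The following is a minimal base R sketch of that general pattern, using hypothetical names (old_name(), new_name()); it is not drake's actual source.

```r
# Hypothetical new function, standing in for a renamed API function.
new_name <- function(x) {
  x * 2
}

# Hypothetical old alias: signal a deprecation warning with base R's
# .Deprecated(), then forward the call to the new function.
old_name <- function(x) {
  .Deprecated(new = "new_name")
  new_name(x)
}

# The old name still returns the same result as the new one,
# but it emits a warning unless the caller suppresses it.
result <- suppressWarnings(old_name(21))
print(result)
```

Keeping the wrapper for a release or two gives existing scripts time to migrate before the old name is removed.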
output and code as names in the workflow plan data frame. Use target and command instead. This naming switch has been formally deprecated for several months prior.
drake_quotes(), drake_unquote(), and drake_strings() to remove the silly dependence on the eply package.
skip_safety_checks flag to make() and drake_config(). Increases speed.
sanitize_plan(), remove rows with blank targets "".
purge argument to clean() to optionally remove all target-level information.
namespace argument to cached() so users can inspect individual storr namespaces.
verbose to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
next_stage() function to report the targets to be made in the next parallelizable stage.
session_info argument to make(). Apparently, sessionInfo() is a bottleneck for small make()s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
log_progress argument to make() to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
namespace argument to loadd() and readd(). You can now load and read from non-default storr namespaces.
drake_cache_log(), drake_cache_log_file(), and make(..., cache_log_file = TRUE) as options to track changes to targets/imports in the drake cache.
rmarkdown::render(), not just knit().
drake
properly.
plot_graph() to display subcomponents. Check out arguments from, mode, order, and subset. The graph visualization vignette has demonstrations.
"future_lapply" parallelism: parallel backends supported by the future and future.batchtools packages. See ?backend for examples and the parallelism vignette for an introductory tutorial. More advanced instruction can be found in the future and future.batchtools packages themselves.
diagnose().
hook argument to make() to wrap around build(). That way, users can more easily control the side effects of distributed jobs. For example, to redirect error messages to a file in make(..., parallelism = "Makefile", jobs = 2, hook = my_hook), my_hook should be something like function(code){withr::with_message_sink("messages.txt", code)}.
drake was previously using the outfile argument for PSOCK clusters to generate output that could not be caught by capture.output(). It was a hack that should have been removed before.
make()
and outdated() print “All targets are already up to date” to the console.
"future_lapply" backends.
plot_graph() and progress(). Also see the new failed() function, which is similar to in_progress().
parLapply parallelism. The downside to this fix is that drake has to be properly installed. It should not be loaded with devtools::load_all(). The speedup comes from lightening the first clusterExport() call in run_parLapply(). Previously, we exported every single individual drake function to all the workers, which created a bottleneck. Now, we just load drake itself in each of the workers, which works because build() and do_prework() are exported.
overwrite to FALSE in load_basic_example().
report.Rmd in load_basic_example().
get_cache(..., verbose = TRUE).
lightly_parallelize() and lightly_parallelize_atomic(). Now, processing happens faster, and only over the unique values of a vector.
make_with_config() function to do the work of make() on an existing internal configuration list from drake_config().
drake_batchtools_tmpl_file() to write a batchtools template file from one of the examples (drake_example()), if one exists.
Version 4.3.0 has:
- Reproducible random numbers
- Automatic detection of knitr dependencies
- More vignettes
- Bugfixes
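
As a rough illustration of what reproducible random numbers can mean for a build tool, the sketch below derives a per-target seed from a root seed and the target name, so the same target always sees the same random draws. The helper names target_seed() and build_target() and the seed formula are assumptions made for this example; this is not a description of how drake computes seeds.

```r
# Hypothetical helper: combine a root seed with the target name to get
# a deterministic integer seed for that target. (Assumed formula.)
target_seed <- function(root_seed, target) {
  (root_seed + sum(utf8ToInt(target))) %% .Machine$integer.max
}

# Hypothetical build step: set the target-specific seed, then draw.
build_target <- function(root_seed, target) {
  set.seed(target_seed(root_seed, target))
  rnorm(3)
}

a1 <- build_target(0, "a")
a2 <- build_target(0, "a")  # same root seed and target: same draws
b1 <- build_target(0, "b")  # different target: different seed

print(identical(a1, a2))
```

Because the seed depends only on the root seed and the target name, rebuilding a target in a fresh R session gives the same values regardless of which other targets ran first.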
Version 4.2.0 will be released today. There are several improvements to code style and performance. In addition, there are new features such as cache/hash externalization and runtime prediction. See the new storage and timing vignettes for details. This release has automated checks for back-compatibility with existing projects, and I also did manual back compatibility checks on serious projects.
Version 3.0.0 is coming out. It manages environments more intelligently so that the behavior of make() is more consistent with evaluating your code in an interactive session.
Version 1.0.1 is on CRAN! I’m already working on a massive update, though. 2.0.0 is cleaner and more powerful.