Chapter 1 Introduction

The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets learns how your pipeline fits together, skips costly runtime for tasks that are already up to date, runs only the necessary computation, supports implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the results match the underlying code and data.

This manual is a step-by-step written guide to targets. The current chapter elaborates on the role and benefits of targets, and subsequent chapters walk through the major functionality. See the documentation website for most other major resources, including installation instructions, links to example projects, and a reference page with all user-side functions.

1.1 Motivation

Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. Unchecked, this invalidation creates a chronic Sisyphean loop:

  1. Launch the code.
  2. Wait while it runs.
  3. Discover an issue.
  4. Restart from scratch.

1.2 Pipeline toolkits

Pipeline toolkits like GNU Make break the cycle. They watch the dependency graph of the whole workflow and skip steps, or “targets”, whose code, data, and upstream dependencies have not changed since the last run of the pipeline. When all targets are up to date, this is evidence that the results match the underlying code and data, which helps us trust the results and confirm the computation is reproducible.
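
To make the skipping idea concrete, here is a toy sketch in base R. It is not how GNU Make or targets is implemented: the names store, fingerprint(), and run_step() are hypothetical, and a deparsed string stands in for the hashes a real tool would likely use. A step reruns only when its code or its inputs differ from the previous run.

```r
# Toy illustration only; every name below is hypothetical.
store <- new.env()  # remembers the last fingerprint and value of each step

# Describe a step by its code and its inputs (a plain string, no real hashing).
fingerprint <- function(fun, inputs) {
  paste(c(deparse(body(fun)), deparse(inputs)), collapse = "\n")
}

# Run a step only if its fingerprint changed since the last run.
run_step <- function(name, fun, inputs) {
  key <- fingerprint(fun, inputs)
  old <- store[[name]]
  if (!is.null(old) && identical(old$key, key)) {
    message("skip ", name)
    return(old$value)
  }
  message("run ", name)
  value <- do.call(fun, inputs)
  store[[name]] <- list(key = key, value = value)
  value
}

square <- function(x) x^2
total <- function(x) sum(x)

# A two-step workflow: "total" depends on "squared".
pipeline <- function(data) {
  squared <- run_step("squared", square, list(x = data))
  run_step("total", total, list(x = squared))
}

pipeline(1:3)     # first run: both steps run
pipeline(1:3)     # nothing changed: both steps are skipped
pipeline(-(1:3))  # "squared" reruns; "total" is skipped because its input is unchanged
```

The last call shows why tracking dependencies pays off: an upstream change forces downstream work only when it actually alters what the downstream step receives.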

1.3 The targets package

Unlike most pipeline toolkits, which are language agnostic or Python-focused, the targets package allows data scientists and researchers to work entirely within R. targets implicitly nudges users toward a clean, function-oriented programming style that fits the intent of the R language and helps practitioners maintain their data analysis projects.
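
As a rough sketch of that function-oriented style, a target script might look like the following. The functions simulate_data(), fit_model(), and summarize_fit() and the target names are hypothetical. The sketch also assumes a release of targets in which the script may end with a plain list of targets; the function table later in this chapter lists tar_pipeline() for that role.

```r
# _targets.R: a hypothetical target script
library(targets)

# Each step of the analysis is an ordinary R function.
simulate_data <- function(n) {
  data.frame(x = seq_len(n), y = 2 * seq_len(n) + 1)
}

fit_model <- function(data) {
  lm(y ~ x, data = data)
}

summarize_fit <- function(fit) {
  unname(coef(fit)["x"])
}

# Each target is a named step whose command calls one of those functions.
# A command that mentions another target (e.g. dataset) depends on it.
pipeline <- list(
  tar_target(dataset, simulate_data(10)),
  tar_target(fit, fit_model(dataset)),
  tar_target(slope, summarize_fit(fit))
)

# The script ends with the list of targets (an assumption, see above).
pipeline
```

With a script like this saved as _targets.R, tar_make() would run the pipeline and tar_read(slope) would return the stored result. Keeping the logic in functions means the targets themselves stay short and the dependency structure is easy to read.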

1.4 What about drake?

The drake package is an older and more established R-focused pipeline toolkit. It has become a key piece of the R ecosystem, and development will continue. However, years of community feedback have exposed major user-side limitations regarding data management, collaboration, parallel efficiency, and pipeline archetypes. Unfortunately, these limitations are permanent, because solutions in drake itself would make the package incompatible with existing projects that use it. That is why targets was created. The targets package borrows from past learnings and attempts to advance the user experience beyond drake’s potential capabilities. Please see the statement of need for technical details.

If you know drake, then you already almost know targets. The programming style is similar, and most functions in targets have counterparts in drake.

Functions in drake | Counterparts in targets
use_drake(), drake_script() | tar_script()
drake_plan() | tar_pipeline(), tar_manifest()
target() | tar_target(), tar_target_raw()
drake_config() | tar_option_set()
outdated(), r_outdated() | tar_outdated()
vis_drake_graph(), r_vis_drake_graph() | tar_visnetwork(), tar_glimpse()
drake_graph_info(), r_drake_graph_info() | tar_network()
make(), r_make() | tar_make(), tar_make_clustermq(), tar_make_future()
loadd() | tar_load()
readd() | tar_read()
diagnose(), build_times(), cached(), drake_cache_log() | tar_meta()
drake_progress(), drake_running(), drake_done(), drake_failed(), drake_cancelled() | tar_progress()
clean() | tar_deduplicate(), tar_delete(), tar_destroy(), tar_invalidate()
drake_gc() | tar_prune()
id_chr() | tar_name(), tar_path()
knitr_in() | tarchetypes::tar_render()
cancel(), cancel_if() | tar_cancel()
trigger() | tar_cue()
drake_example(), drake_examples(), load_mtcars_example(), clean_mtcars_example() | Unsupported. Example targets pipelines are in individual repositories linked from the documentation website.
drake_build() | Unsupported in targets to ensure coherence with dynamic branching.
drake_debug() | Read here to learn about interactive debugging in targets.
drake_history(), recoverable() | Unsupported in targets. Instead of trying to manage history and data recovery directly, targets maintains a much lighter/friendlier data store to make it easier to use external data versioning tools instead.
missed(), tracked(), deps_code(), deps_target(), deps_knitr(), deps_profile() | Unsupported in targets because dependency detection is far easier to understand than in drake.
drake_hpc_template_file(), drake_hpc_template_files() | Deemed out of scope for targets.
drake_cache(), new_cache(), find_cache() | Unsupported because targets is far more strict and paternalistic about data/file management.
rescue_cache(), which_clean(), cache_planned(), cache_unplanned() | Unsupported due to the simplified data management system and storage cleaning functions.
drake_get_session_info() | Deemed superfluous and a potential bottleneck. Discarded for targets.
read_drake_seed() | Superfluous because targets always uses the same global seed. tar_meta() shows all the target-level seeds.
show_source() | Deemed superfluous. Discarded in targets to conserve storage space in _targets/meta/meta.
drake_tempfile() | Superfluous in targets because there is no special disk.frame storage format. (Dynamic file targets are much better for managing disk.frames.)
file_store() | Superfluous in targets because all files are dynamic files and there is no longer a need to Base32-encode any file names.
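
For readers coming from drake, a short session using a few of these counterparts might look like the sketch below. It assumes a recent release of targets is installed and that the target script may end with a plain list of targets; the target names and the temporary project directory are hypothetical, and details may differ in the release this chapter describes.

```r
library(targets)

# Work in a throwaway directory so no real project is modified.
project <- tempfile("targets-demo-")
dir.create(project)
old_wd <- setwd(project)

# Counterpart of use_drake() / drake_script(): write the target script.
tar_script({
  list(
    tar_target(numbers, c(1, 2, 3)),
    tar_target(total, sum(numbers))
  )
})

# Counterpart of outdated(): lists targets that still need to run.
before <- tar_outdated()

# Counterpart of make(): run the pipeline.
tar_make()

# Counterparts of readd() and loadd().
total_value <- tar_read(total)  # return the stored value
tar_load(numbers)               # load the value into the current environment

# After a successful run, nothing should be outdated.
after <- tar_outdated()

setwd(old_wd)
```

The order of calls mirrors a typical drake session: inspect what is outdated, run the pipeline, then read results from the data store rather than from objects left over in the R session.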

Likewise, many make() arguments have equivalent arguments elsewhere.

Argument of drake::make() | Counterparts in targets
targets | names in tar_make() etc.
envir | envir in tar_option_set()
verbose | reporter in tar_make() etc.
parallelism | Choice of function: tar_make() vs tar_make_clustermq() vs tar_make_future()
jobs | workers in tar_make_clustermq() and tar_make_future()
packages | packages in tar_target() and tar_option_set()
lib_loc | library in tar_target() and tar_option_set()
trigger | cue in tar_target() and tar_option_set()
caching | storage and retrieval in tar_target() and tar_option_set()
keep_going | error in tar_target() and tar_option_set()
memory_strategy | memory in tar_target() and tar_option_set()
garbage_collection | garbage_collection in tar_target() and tar_option_set()
template | resources in tar_target() and tar_option_set()
curl_handles | handle element of resources argument of tar_target() and tar_option_set()
format | format in tar_target() and tar_option_set()
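
The pattern in this table is that settings which drake passed to make() become either global defaults in tar_option_set() or per-target arguments of tar_target(). A minimal sketch, assuming a recent release of targets; the target name and the chosen values are illustrative only:

```r
library(targets)

# Global defaults, roughly what packages, keep_going, and memory_strategy
# controlled in drake::make().
tar_option_set(
  packages = c("stats", "utils"),  # packages to load before targets run
  error = "continue",              # keep going when a target fails
  memory = "transient"             # release target values from memory when possible
)

# A single target can override the defaults.
# The target name report_input is hypothetical.
report_input <- tar_target(
  report_input,
  head(mtcars),
  packages = "datasets",
  error = "stop",                  # this target should halt the pipeline on error
  cue = tar_cue(mode = "always")   # rerun on every tar_make(), like a drake trigger
)

# Inspect the current default.
chosen_error <- tar_option_get("error")
```

In a real project these calls would normally sit in the target script, with tar_option_set() near the top so the defaults apply to every target declared after it.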

In addition, many optional columns of drake plans are expressed differently in targets.

Optional column of drake plans | Feature in targets
format | format argument of tar_target() and tar_option_set()
dynamic | pattern argument of tar_target() and tar_option_set()
transform | static branching functions in tarchetypes such as tar_map() and tar_combine()
trigger | cue argument of tar_target() and tar_option_set()
hpc | deployment argument of tar_target() and tar_option_set()
resources | resources argument of tar_target() and tar_option_set()
caching | storage and retrieval arguments of tar_target() and tar_option_set()
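
As an illustration of how plan columns turn into arguments, the sketch below declares targets that use format, pattern, and cue. It assumes a recent release of targets; the target names are hypothetical, and argument values may be named differently in the release this chapter describes.

```r
library(targets)

branching_pipeline <- list(
  # An ordinary target holding three sample sizes.
  tar_target(sizes, c(10, 20, 30)),

  # The dynamic column of a drake plan becomes the pattern argument:
  # one branch of draws per element of sizes.
  tar_target(draws, rnorm(sizes), pattern = map(sizes), format = "rds"),

  # The trigger column becomes the cue argument.
  tar_target(
    means,
    mean(draws),
    pattern = map(draws),
    cue = tar_cue(mode = "always")  # always rerun these branches
  )
)
```

Static branching, the counterpart of the transform column, lives in the separate tarchetypes package and is not shown here.
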
Copyright Eli Lilly and Company