Skip to contents

crew facilitates computationally demanding workloads that consume a lot of memory and CPU resources, and workloads that exhaust these resources may stall or crash. Oftentimes, a crash is silent and there are no clues for troubleshooting. But if you proactively monitor crew processes in log files, you can keep track of resource usage and generated prospective data which is useful in diagnosing problems. This vignette describes best practices for logging.

Processes

crew runs in 3 different kinds of R processes:

  1. The local R session from which you call crew_controller_local() or similar.
  2. The mirai dispatcher which sends tasks to workers. The dispatcher starts on controller$start() and terminates on controller$terminate() or when the local R session closes. The dispatcher always runs on the same machine as (1).
  3. The workers that accept tasks from (2). These workers may run on different machines, e.g. on a cluster or the cloud, depending on the launcher plugin you are using.

This vignette first covers logging for local the R session and dispatcher (1 and 2), then logging for workers (3).

Logging worker processes

Each launcher plugin has its own way to turn on logging for workers. In the case of the local plugin (crew_controller_local()), the crew_options_local() has arguments to configure how messages are written to log file, and each worker instance gets its own log file in a directory you specify. Other plugins in packages such as crew.cluster and crew.aws.batch have different arguments to configure platform-specific logging mechanisms (e.g. SLURM logs or AWS CloudWatch logs). Example for the local plugin:

library(crew)
log_directory <- tempfile()
controller <- crew_controller_local(
  workers = 2,
  options_local = crew_options_local(log_directory = log_directory)
)
controller$start()
controller$push(print("task 1"))
controller$push(print("task 2"))
controller$wait()
controller$terminate()
log_files <- list.files(log_directory, full.names = TRUE)

writeLines(readLines(log_files[1]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> [1] "task 1"

writeLines(readLines(log_files[2]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> [1] "task 2"

To add resource metrics to existing log files, regardless of your platform, use the options_metrics argument and set path = "/dev/stdout".1

library(crew)
log_directory <- tempfile()
controller <- crew_controller_local(
  name = "a",
  workers = 2,
  options_local = crew_options_local(log_directory = log_directory),
  options_metrics = crew_options_metrics(
    path = "/dev/stdout",
    seconds_interval = 1
  )
)
controller$start()
controller$push({
  Sys.sleep(2.5)
  print("task 1")
  Sys.sleep(2.5)
})
controller$push({
  Sys.sleep(5)
  print("task 2")
})
controller$wait()
controller$terminate()
log_files <- list.files(log_directory, full.names = TRUE)


writeLines(readLines(log_files[1]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410013.006|2.100|0.210|116834304|420885200896|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410014.018|0.000|0.000|116899840|420894638080|__AUTOMETRIC__
#> [1] "task 1"
#> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410015.020|0.000|0.000|116932608|420903026688|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5|43098|crew_worker_a_1_c7839a65097de181ee1b2939|0|1728410016.024|0.000|0.000|116932608|420903026688|__AUTOMETRIC__

writeLines(readLines(log_files[2]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410013.006|2.100|0.210|115671040|420750983168|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410014.014|0.000|0.000|115736576|420760420352|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410015.020|0.000|0.000|115736576|420760420352|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5|43099|crew_worker_a_2_3647db437624830b7b0491af|0|1728410016.025|0.000|0.000|115736576|420760420352|__AUTOMETRIC__
#> [1] "task 2"

The lines with __AUTOMETRIC__ record resource usage metrics over the life cycle of each worker. Use autometric::log_read() to read these metrics and autometric::log_plot() to visualize them. Below, the time column is elapsed time in seconds since logging began, and resident is memory usage in megabytes. More details are at autometric::log_read().

data <- autometric::log_read(log_directory)
data$name <- substr(data$name, 0, 20)
data
#>   version   pid                 name status  time core  cpu resident  virtual
#> 1   0.0.5 43098 crew_worker_a_1_c783      0 0.000  2.1 0.21 116.8343 420885.2
#> 2   0.0.5 43098 crew_worker_a_1_c783      0 1.012  0.0 0.00 116.8998 420894.6
#> 3   0.0.5 43098 crew_worker_a_1_c783      0 2.014  0.0 0.00 116.9326 420903.0
#> 4   0.0.5 43098 crew_worker_a_1_c783      0 3.018  0.0 0.00 116.9326 420903.0
#> 5   0.0.5 43099 crew_worker_a_2_3647      0 0.000  2.1 0.21 115.6710 420751.0
#> 6   0.0.5 43099 crew_worker_a_2_3647      0 1.008  0.0 0.00 115.7366 420760.4
#> 7   0.0.5 43099 crew_worker_a_2_3647      0 2.014  0.0 0.00 115.7366 420760.4
#> 8   0.0.5 43099 crew_worker_a_2_3647      0 3.019  0.0 0.00 115.7366 420760.4
autometric::log_plot(data, pid = 43098, metric = "resident")

Logging local processes

You can use the autometric package directly to monitor local crew processes (including the mirai dispatcher). To begin, you need the process IDs of the local processes. As of crew version 0.9.5.9012, every controller has a pids() method to get the process IDs of all the local processes.

library(crew)
controller <- crew_controller_local()
controller$start()
controller$pids()
#>                               local dispatcher-e07cd32eeac18cb9e1dd59fb 
#>                             2104319                             2104731

You can pass this named vector of process IDs directly into autometric::log_start(), which starts a background thread which logs resource usage metrics to the text file of your choice.2 Remember to call autometric::log_stop() when you are done (or restart R) to stop logging.

path <- tempfile()
autometric::log_start(
  path = path,
  seconds = 1,
  pids = controller$pids()
)
controller$push(Sys.sleep(5))
controller$wait()
autometric::log_stop()
controller$terminate()

Use autometric::log_read() to read these metrics and autometric::log_plot() to visualize them. Below, the time column is elapsed time in seconds since logging began, and resident is memory usage in megabytes. More details are at autometric::log_read().

data <- autometric::log_read(path)
data$name <- substr(data$name, 0, 10)
data
#>    version     pid       name status  time  core   cpu resident   virtual
#> 1    0.0.1 2104319      local      0 0.000 0.000 0.000 195.1007  9708.073
#> 2    0.0.1 2104731 dispatcher      0 0.000 0.000 0.000  31.7399  9514.222
#> 3    0.0.1 2104319      local      0 1.000 4.998 0.078 195.1007  9842.328
#> 4    0.0.1 2104731 dispatcher      0 1.000 0.000 0.000  31.7399  9849.901
#> 5    0.0.1 2104319      local      0 2.000 0.000 0.000 195.1007  9909.436
#> 6    0.0.1 2104731 dispatcher      0 2.000 0.000 0.000  31.7399  9984.119
#> 7    0.0.1 2104319      local      0 3.001 0.000 0.000 195.1007  9909.436
#> 8    0.0.1 2104731 dispatcher      0 3.001 0.000 0.000  31.7399 10118.337
#> 9    0.0.1 2104319      local      0 4.001 0.000 0.000 195.1007  9976.545
#> 10   0.0.1 2104731 dispatcher      0 4.001 0.000 0.000  31.7399 10185.445

In targets

As explained at https://books.ropensci.org/targets/crew.html, crew helps targets orchestrate pipelines. For a targets pipeline, to log all local processes as in the previous section, you will need to pass an already started controller to tar_option_set(). To do this safely:

  1. Only start the controller if you are actually running a pipeline. You can use targets::tar_active() in _targets.R to check if tar_make() is running.
  2. Do not run tar_make() with callr_function = NULL. This ensures the logging thread quits when the tar_make() isolated callr R process quits.

Example _targets.R file:

library(autometric)
library(crew)
library(targets)
library(tarchetypes)

if (tar_active()) {
  controller <- crew_controller_local(workers = 2)
  controller$start()
  log_start(
    path = "log.txt",
    seconds = 1,
    pids = controller$pids()
  )
}

tar_option_set(controller = controller)

list(
  tar_target(name = sleep1, command = Sys.sleep(5)),
  tar_target(name = sleep2, command = Sys.sleep(5))
)