Create an R6 object to launch and maintain workers as SLURM jobs.
Usage
crew_launcher_slurm(
  name = NULL,
  seconds_interval = 0.5,
  seconds_timeout = 60,
  seconds_launch = 86400,
  seconds_idle = Inf,
  seconds_wall = Inf,
  tasks_max = Inf,
  tasks_timers = 0L,
  reset_globals = TRUE,
  reset_packages = FALSE,
  reset_options = FALSE,
  garbage_collection = FALSE,
  crashes_error = 5L,
  tls = crew::crew_tls(mode = "automatic"),
  r_arguments = c("--no-save", "--no-restore"),
  options_metrics = crew::crew_options_metrics(),
  options_cluster = crew.cluster::crew_options_slurm(),
  verbose = NULL,
  command_submit = NULL,
  command_terminate = NULL,
  command_delete = NULL,
  script_directory = NULL,
  script_lines = NULL,
  slurm_log_output = NULL,
  slurm_log_error = NULL,
  slurm_memory_gigabytes_required = NULL,
  slurm_memory_gigabytes_per_cpu = NULL,
  slurm_cpus_per_task = NULL,
  slurm_time_minutes = NULL,
  slurm_partition = NULL
)
Arguments
- name
Name of the launcher.
- seconds_interval
Number of seconds between polling intervals waiting for certain internal synchronous operations to complete, such as checking mirai::status().
- seconds_timeout
Number of seconds until timing out while waiting for certain synchronous operations to complete, such as checking mirai::status().
- seconds_launch
Seconds of startup time to allow. A worker is unconditionally assumed to be alive from the moment of its launch until seconds_launch seconds later. After seconds_launch seconds, the worker is only considered alive if it is actively connected to its assigned websocket.
- seconds_idle
Maximum number of seconds that a worker can idle since the completion of the last task. If exceeded, the worker exits. However, the timer does not start until tasks_timers tasks have completed. See the idletime argument of mirai::daemon(). crew does not excel with perfectly transient workers because it does not micromanage the assignment of tasks to workers, so please allow enough idle time for a new worker to be delegated a new task.
- seconds_wall
Soft wall time in seconds. The timer does not start until tasks_timers tasks have completed. See the walltime argument of mirai::daemon().
- tasks_max
Maximum number of tasks that a worker will do before exiting. See the maxtasks argument of mirai::daemon(). crew does not excel with perfectly transient workers because it does not micromanage the assignment of tasks to workers, so it is recommended to set tasks_max to a value greater than 1.
- tasks_timers
Number of tasks to do before activating the timers for seconds_idle and seconds_wall. See the timerstart argument of mirai::daemon().
- reset_globals
TRUE to reset global environment variables between tasks, FALSE to leave them alone.
- reset_packages
TRUE to unload any packages loaded during a task (runs between each task), FALSE to leave packages alone.
- reset_options
TRUE to reset global options to their original state between each task, FALSE otherwise. It is recommended to only set reset_options = TRUE if reset_packages is also TRUE because packages sometimes rely on options they set at loading time.
- garbage_collection
TRUE to run garbage collection between tasks, FALSE to skip.
- crashes_error
Positive integer scalar. If a worker exits crashes_error times in a row without completing all its assigned tasks, then the launcher throws an informative error. The reason for crashes_error is to avoid an infinite loop where a task crashes a worker (through a segfault, maxing out memory, etc.) but the worker always relaunches. To monitor the resources of crew workers, please see https://wlandau.github.io/crew/articles/logging.html.
- tls
A TLS configuration object from crew_tls().
- r_arguments
Optional character vector of command line arguments to pass to Rscript (non-Windows) or Rscript.exe (Windows) when starting a worker. Example: r_arguments = c("--vanilla", "--max-connections=32").
- options_metrics
Either NULL to opt out of resource metric logging for workers, or an object from crew_options_metrics() to enable and configure resource metric logging for workers.
- options_cluster
An options list from crew_options_slurm() with cluster-specific configuration options.
- verbose
Deprecated. Use options_cluster instead.
- command_submit
Deprecated. Use options_cluster instead.
- command_terminate
Deprecated. Use options_cluster instead.
- command_delete
Deprecated on 2024-01-08 (version 0.1.4.9001). Use command_terminate instead.
- script_directory
Deprecated. Use options_cluster instead.
- script_lines
Deprecated. Use options_cluster instead.
- slurm_log_output
Deprecated. Use options_cluster instead.
- slurm_log_error
Deprecated. Use options_cluster instead.
- slurm_memory_gigabytes_required
Deprecated. Use options_cluster instead.
- slurm_memory_gigabytes_per_cpu
Deprecated. Use options_cluster instead.
- slurm_cpus_per_task
Deprecated. Use options_cluster instead.
- slurm_time_minutes
Deprecated. Use options_cluster instead.
- slurm_partition
Deprecated. Use options_cluster instead.
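The arguments above can be combined roughly as in the following sketch. It sets the worker lifetime arguments listed in Usage and passes cluster settings through options_cluster rather than the deprecated slurm_* arguments. All values are illustrative assumptions, the partition name is hypothetical, and the argument names given to crew_options_slurm() are assumed to be the deprecated names without the "slurm_" prefix; please confirm them against the help page of crew_options_slurm() for your installed version.

```r
# Worker lifetime settings, using argument names from the Usage section.
# The values are illustrative assumptions, not recommendations.
lifetime <- list(
  seconds_idle = 300,  # exit after 5 idle minutes
  tasks_max = 100,     # exit after 100 tasks (greater than 1, as advised)
  tasks_timers = 1L    # start the timers after the first completed task
)

# Cluster settings that stand in for the deprecated slurm_* arguments.
# Assumption: crew_options_slurm() accepts these argument names.
cluster <- list(
  cpus_per_task = 2,
  memory_gigabytes_per_cpu = 4,
  time_minutes = 120,
  partition = "short"  # hypothetical partition name
)

# Only build the launcher if crew.cluster is installed.
if (requireNamespace("crew.cluster", quietly = TRUE)) {
  slurm_options <- do.call(crew.cluster::crew_options_slurm, cluster)
  launcher <- do.call(
    crew.cluster::crew_launcher_slurm,
    c(lifetime, list(options_cluster = slurm_options))
  )
}
```

Keeping the settings in plain lists makes it easy to reuse the same values elsewhere, for example when calling crew_controller_slurm().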
Details
WARNING: the crew.cluster SLURM plugin is experimental and has not actually been tested on a SLURM cluster. Please proceed with caution and report bugs to https://github.com/wlandau/crew.cluster.
To launch a SLURM worker, this launcher creates a temporary job script with a call to crew::crew_worker() and submits it as a SLURM job with sbatch. To see most of the lines of the job script in advance, use the script() method of the launcher. It has all the lines except for the job name and the call to crew::crew_worker(), both of which are inserted at the last minute when it is time to actually launch a worker.
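A minimal sketch of previewing the job script with the script() method described above. The extra script line is a hypothetical, cluster-specific example, and passing it through the script_lines argument of crew_options_slurm() is an assumption based on the deprecation notes in Arguments. The sketch is meant to be run on a machine with crew.cluster installed, ideally a SLURM login node.

```r
# Hypothetical extra line for the job script; adjust for your cluster.
extra_lines <- "module load R"

# Only run if crew.cluster is installed.
if (requireNamespace("crew.cluster", quietly = TRUE)) {
  launcher <- crew.cluster::crew_launcher_slurm(
    options_cluster = crew.cluster::crew_options_slurm(
      script_lines = extra_lines
    )
  )
  # Print the job script lines known in advance. The job name and the
  # call to crew::crew_worker() are added only when a worker launches.
  writeLines(launcher$script())
}
```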
Attribution
The template files at https://github.com/mschubert/clustermq/tree/master/inst informed the development of the crew launcher plugins in crew.cluster, and we would like to thank Michael Schubert for developing clustermq and releasing it under the permissive Apache License 2.0. See the NOTICE and README.md files in the crew.cluster source code for additional attribution.
See also
Other slurm: crew_class_launcher_slurm, crew_class_monitor_slurm, crew_controller_slurm(), crew_monitor_slurm(), crew_options_slurm()