Create a controller with a Sun Grid Engine (SGE) launcher.
Source: R/crew_controller_sge.R (crew_controller_sge.Rd)
Create an R6 object to submit tasks and launch workers on a Sun Grid Engine (SGE) cluster.
Usage
crew_controller_sge(
  name = NULL,
  workers = 1L,
  host = NULL,
  port = NULL,
  tls = crew::crew_tls(mode = "automatic"),
  tls_enable = NULL,
  tls_config = NULL,
  seconds_interval = 0.25,
  seconds_timeout = 60,
  seconds_launch = 86400,
  seconds_idle = Inf,
  seconds_wall = Inf,
  seconds_exit = NULL,
  retry_tasks = TRUE,
  log_resources = NULL,
  tasks_max = Inf,
  tasks_timers = 0L,
  reset_globals = TRUE,
  reset_packages = FALSE,
  reset_options = FALSE,
  garbage_collection = FALSE,
  launch_max = 5L,
  r_arguments = c("--no-save", "--no-restore"),
  verbose = FALSE,
  command_submit = as.character(Sys.which("qsub")),
  command_terminate = as.character(Sys.which("qdel")),
  command_delete = NULL,
  script_directory = tempdir(),
  script_lines = character(0L),
  sge_cwd = TRUE,
  sge_envvars = FALSE,
  sge_log_output = "/dev/null",
  sge_log_error = NULL,
  sge_log_join = TRUE,
  sge_memory_gigabytes_limit = NULL,
  sge_memory_gigabytes_required = NULL,
  sge_cores = NULL,
  sge_gpu = NULL
)
Arguments
- name
Name of the client object. If NULL, a name is automatically generated.
- workers
Integer, maximum number of parallel workers to run.
- host
IP address of the mirai client to send and receive tasks. If NULL, the host defaults to the local IP address.
- port
TCP port to listen for the workers. If NULL, then an available ephemeral port is automatically chosen.
- tls
A TLS configuration object from crew_tls().
- tls_enable
Deprecated on 2023-09-15 in version 0.4.1. Use argument tls instead.
- tls_config
Deprecated on 2023-09-15 in version 0.4.1. Use argument tls instead.
- seconds_interval
Number of seconds between polling intervals waiting for certain internal synchronous operations to complete, such as checking mirai::status().
- seconds_timeout
Number of seconds until timing out while waiting for certain synchronous operations to complete, such as checking mirai::status().
- seconds_launch
Seconds of startup time to allow. A worker is unconditionally assumed to be alive from the moment of its launch until seconds_launch seconds later. After seconds_launch seconds, the worker is only considered alive if it is actively connected to its assigned websocket.
- seconds_idle
Maximum number of seconds that a worker can idle since the completion of the last task. If exceeded, the worker exits. However, the timer does not start until tasks_timers tasks have completed. See the idletime argument of mirai::daemon(). crew does not excel with perfectly transient workers because it does not micromanage the assignment of tasks to workers, so please allow enough idle time for a new worker to be delegated a new task.
- seconds_wall
Soft wall time in seconds. The timer does not start until tasks_timers tasks have completed. See the walltime argument of mirai::daemon().
- seconds_exit
Deprecated on 2023-09-21 in version 0.1.2.9000. No longer necessary.
- retry_tasks
TRUE to automatically retry a task in the event of an unexpected worker exit. FALSE to give up on the first exit and return a mirai error code (code number 19). TRUE (default) is recommended in most situations. Use FALSE for debugging purposes, e.g. to confirm that a task is causing a worker to run out of memory or crash in some other way.
- log_resources
Optional character string with a file path to a text file to log memory consumption. Set log_resources to NULL to avoid writing to a log file. If you supply a path, then the log() method will write memory usage statistics to the file, and most controller methods will do the same with throttling so resource consumption is recorded throughout the whole life cycle of the controller.
The log file is in comma-separated values (CSV) format which can be easily read by readr::read_csv(). The controller automatically deletes the old log file when it starts (when controller$start() is called for the first time, but not subsequent times).
The log file has one row per observation of a process, including the current R process ("client") and the mirai dispatcher. If the dispatcher is not included in the output, it means the dispatcher process is not running. Columns include:
* type: the type of process (client or dispatcher).
* pid: the process ID.
* status: the process status (from ps::ps_status()).
* rss: resident set size (RSS). RSS is the total memory held by a process, including shared libraries which may also be in use by other processes. RSS is obtained from ps::ps_memory_info() and shown in bytes.
* elapsed: number of elapsed seconds since the R process was started (from proc.time()["elapsed"]).
- tasks_max
Maximum number of tasks that a worker will do before exiting. See the maxtasks argument of mirai::daemon(). Because crew does not micromanage the assignment of tasks to workers, it does not excel with perfectly transient workers, so it is recommended to set tasks_max to a value greater than 1.
- tasks_timers
Number of tasks to do before activating the timers for seconds_idle and seconds_wall. See the timerstart argument of mirai::daemon().
- reset_globals
TRUE to reset global environment variables between tasks, FALSE to leave them alone.
- reset_packages
TRUE to unload any packages loaded during a task (runs between each task), FALSE to leave packages alone.
- reset_options
TRUE to reset global options to their original state between each task, FALSE otherwise. It is recommended to only set reset_options = TRUE if reset_packages is also TRUE because packages sometimes rely on options they set at loading time.
- garbage_collection
TRUE to run garbage collection between tasks, FALSE to skip.
- launch_max
Positive integer of length 1, maximum allowed consecutive launch attempts which do not complete any tasks. Enforced on a worker-by-worker basis. The futile launch count resets back to 0 for each worker that completes a task. It is recommended to set launch_max above 0 because sometimes workers are unproductive under perfectly ordinary circumstances. But launch_max should still be small enough to detect errors in the underlying platform.
- r_arguments
Optional character vector of command line arguments to pass to Rscript (non-Windows) or Rscript.exe (Windows) when starting a worker. Example: r_arguments = c("--vanilla", "--max-connections=32").
- verbose
Logical, whether to see console output and error messages when submitting a worker.
- command_submit
Character of length 1, file path to the executable to submit a worker job.
- command_terminate
Character of length 1, file path to the executable to terminate a worker job. Set to "" to skip manually terminating the worker. Unless there is an issue with the platform, the job should still exit thanks to the NNG-powered network programming capabilities of mirai. Still, if you set command_terminate = "", you are assuming extra responsibility for manually monitoring your jobs on the cluster and manually terminating jobs as appropriate.
- command_delete
Deprecated on 2024-01-08 (version 0.1.4.9001). Use command_terminate instead.
- script_directory
Character of length 1, directory path to the job scripts. Just before each job submission, a job script is created in this folder. Script base names are unique to each launcher and worker, and the launcher deletes the script when the worker is manually terminated. tempdir() is the default, but it might not work for some systems. tools::R_user_dir("crew.cluster", which = "cache") is another reasonable choice.
- script_lines
Optional character vector of additional lines to be added to the job script just after the more common flags. An example would be script_lines = "module load R" if your cluster supports R through an environment module.
- sge_cwd
Logical of length 1, whether to launch the worker from the current working directory (as opposed to the user home directory). sge_cwd = TRUE translates to a line of #$ -cwd in the SGE job script. sge_cwd = FALSE omits this line.
- sge_envvars
Logical of length 1, whether to forward the environment variables of the current session to the SGE worker. sge_envvars = TRUE translates to a line of #$ -V in the SGE job script. sge_envvars = FALSE omits this line.
- sge_log_output
Character of length 1, file or directory path to SGE worker log files for standard output. sge_log_output = "VALUE" translates to a line of #$ -o VALUE in the SGE job script. The default is /dev/null to omit the logs. If you do supply a non-/dev/null value, it is recommended to supply a directory path with a trailing slash so that each worker gets its own set of log files.
- sge_log_error
Character of length 1, file or directory path to SGE worker log files for standard error. sge_log_error = "VALUE" translates to a line of #$ -e VALUE in the SGE job script. The default of NULL omits this line. If you do supply a non-/dev/null value, it is recommended to supply a directory path with a trailing slash so that each worker gets its own set of log files.
- sge_log_join
Logical, whether to join the stdout and stderr log files together into one file. sge_log_join = TRUE translates to a line of #$ -j y in the SGE job script, while sge_log_join = FALSE is equivalent to #$ -j n. If sge_log_join = TRUE, then sge_log_error should be NULL.
- sge_memory_gigabytes_limit
Optional numeric of length 1 with the maximum number of gigabytes of memory a worker is allowed to consume. If the worker consumes more than this level of memory, then SGE will terminate it. sge_memory_gigabytes_limit = 5.7 translates to a line of #$ -l h_rss=5.7G in the SGE job script. sge_memory_gigabytes_limit = NULL omits this line.
- sge_memory_gigabytes_required
Optional positive numeric of length 1 with the gigabytes of memory required to run the worker. sge_memory_gigabytes_required = 2.4 translates to a line of #$ -l m_mem_free=2.4G in the SGE job script. sge_memory_gigabytes_required = NULL omits this line.
- sge_cores
Optional positive integer of length 1, number of cores per worker ("slots" in SGE lingo). sge_cores = 4 translates to a line of #$ -pe smp 4 in the SGE job script. sge_cores = NULL omits this line.
- sge_gpu
Optional integer of length 1 with the number of GPUs to request for the worker. sge_gpu = 1 translates to a line of #$ -l gpu=1 in the SGE job script. sge_gpu = NULL omits this line.
Attribution
The template files at https://github.com/mschubert/clustermq/tree/master/inst informed the development of the crew launcher plugins in crew.cluster, and we would like to thank Michael Schubert for developing clustermq and releasing it under the permissive Apache License 2.0. See the NOTICE and README.md files in the crew.cluster source code for additional attribution.
See also
Other sge: crew_class_launcher_sge, crew_class_monitor_sge, crew_launcher_sge(), crew_monitor_sge()
Examples
if (identical(Sys.getenv("CREW_EXAMPLES"), "true")) {
  controller <- crew_controller_sge()
  controller$start()
  controller$push(name = "task", command = sqrt(4))
  controller$wait()
  controller$pop()$result
  controller$terminate()
}
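A variation with explicit resource settings might look like the sketch below. Every value is illustrative and not a recommendation. It assumes that your cluster provides R through an environment module, that a logs/ directory already exists in the working directory, and that the smp parallel environment is available. The argument names come from the Usage section above, and the sge_args list is only a convenience introduced for this example.

```r
# Illustrative settings only: adjust them to your cluster's modules and limits.
sge_args <- list(
  name = "sge_example",                 # hypothetical controller name
  workers = 2L,                         # at most two parallel workers
  seconds_idle = 30,                    # idle workers exit after 30 seconds
  script_lines = "module load R",       # assumes an R environment module
  sge_log_output = "logs/",             # assumes logs/ exists; trailing slash gives per-worker files
  sge_memory_gigabytes_required = 2.4,  # request for 2.4 GB of free memory
  sge_cores = 4L                        # 4 slots per worker
)

# Same guard as the example above, so nothing is submitted by accident.
if (identical(Sys.getenv("CREW_EXAMPLES"), "true")) {
  controller <- do.call(crew.cluster::crew_controller_sge, sge_args)
  controller$start()
  controller$push(name = "task", command = sqrt(4))
  controller$wait()
  print(controller$pop()$result)
  controller$terminate()
}
```

Keeping the settings in a named list makes it easy to inspect or reuse them before any job reaches the scheduler.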