mirai and crew: next-generation async to supercharge Plumber, Shiny, and targets


Charlie Gao and Will Landau

Carrying the baton for R

What is (next generation) async?

Parallelism: the ability to do multiple things at once

Async: not waiting while this happens

Async is a ‘first-class’ experience in many languages

  • Rust has its ‘fearless concurrency’
  • Go has its goroutines
  • JavaScript has its Promises

JavaScript ‘just works’™ in your web browser

Where we are now in R

Parallelism: the ability to do multiple things at once

Async: not waiting while this happens

Missing a ‘first-class’ async experience

  • {parallel} is parallel, not async
  • {callr} is local-only (communicates via files on disk), and not always async
  • {future} relies on {parallelly} and blocks when tasks > workers, so not always async

Bringing first-class async to R

  • Nanomsg Next Generation (NNG) implements async in C
  • Incredibly lightweight, brokerless messaging
  • {nanonext} brings NNG to R
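A minimal sketch of the async primitives {nanonext} exposes: `recv_aio()` returns immediately with an ‘Aio’ object, and the receive completes in the background. The socket names and in-process URL here are illustrative.

```r
library(nanonext)

# Two sockets joined over an in-process transport.
s1 <- socket("pair", listen = "inproc://demo")
s2 <- socket("pair", dial = "inproc://demo")

# Receive asynchronously: recv_aio() does not block.
aio <- recv_aio(s2)

send(s1, "hello from NNG")

# Resolve the Aio once the message arrives.
call_aio(aio)
aio$data

close(s1)
close(s2)
```

The R session stays free between `recv_aio()` and `call_aio()`: that non-blocking gap is the building block the rest of the stack is built on.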

mirai

ミライ (Japanese for ‘future’)

Minimalist Async Evaluation Framework for R

Parallelism: the ability to do multiple things at once

Async: not waiting while this happens

  • {mirai} uses {nanonext} to deliver ‘first-class’ async
  • Connect thousands of parallel processes
  • Launch millions of tasks all at once
  • Responsiveness: milliseconds -> microseconds
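The core workflow, as a short sketch (the daemon count and toy task are illustrative): `mirai()` returns instantly with an unresolved value, and the expression evaluates in a background daemon.

```r
library(mirai)

daemons(4)  # launch 4 persistent background processes

# mirai() returns immediately; evaluation happens elsewhere.
m <- mirai(
  {
    Sys.sleep(1)
    sum(x)
  },
  x = 1:10
)

unresolved(m)  # TRUE while the task is still running

call_mirai(m)  # wait for resolution (or keep checking unresolved())
m$data         # 55

daemons(0)  # reset
```

Because each `mirai()` call is non-blocking, the session can launch far more tasks than there are daemons and carry on working while they resolve.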

The current generation of promises

A ‘promising’ object is used in Shiny ExtendedTask / Plumber:

  • future() blocks the session if tasks > workers
  • future_promise() has never exited ‘experimental’
  • Requires constant polling for resolution of each promise

The next generation of promises

  • mirai() is now a ‘promising’ object
  • Native support for Shiny ExtendedTask / Plumber
  • Event-driven: no polling & zero-latency
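A sketch of how this looks in Shiny, assuming shiny >= 1.8.1 for `ExtendedTask`; the `slow_sqrt()` helper and daemon count are hypothetical stand-ins for a real long-running computation.

```r
library(shiny)
library(mirai)

daemons(2)  # background processes to serve the tasks

# Hypothetical slow computation for illustration.
slow_sqrt <- function(x) {
  Sys.sleep(2)
  sqrt(x)
}

ui <- fluidPage(
  numericInput("x", "x", value = 9),
  actionButton("go", "Compute"),
  textOutput("result")
)

server <- function(input, output, session) {
  # mirai() returns a 'promising' object, so ExtendedTask accepts it
  # directly: resolution is event-driven, with no polling loop.
  task <- ExtendedTask$new(function(x) {
    mirai(slow_sqrt(x), slow_sqrt = slow_sqrt, x = x)
  })
  bindEvent(observe(task$invoke(input$x)), input$go)
  output$result <- renderText(task$result())
}

shinyApp(ui, server)
```

While the task runs on a daemon, the Shiny session stays responsive to other inputs and other users.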


Launching one million promises all at once

Demo: simulating parallel coin flips

First-class async for R / R Shiny


[Figure: slow polling (✗) vs. event-driven mirai promises]

Extending mirai

Science demands heavy computing

  • Bayesian methods help safety/efficacy decisions in clinical trials
  • Each model could take hours to run
  • 1000+ simulations to design a clinical trial

Too much work for one laptop

  • Clinical trial simulations often need hundreds of computers
  • Need distributed computing: e.g. SLURM clusters, AWS Batch
  • Challenges: access, overhead, and cost

crew scales data science

  1. Auto-scaling reduces overhead and cost
  2. Plugins access big high-performance computing systems
  3. mirai provides low-overhead interprocess communication

Low-overhead communication


Auto-scaling reduces overhead & cost


Simple controller interface


# New controller

controller <- crew_controller_local(
  workers = 2,
  seconds_idle = 10,
  tasks_max = Inf
)

# Submit a task.

controller$push(1 + 1)

# Get the result.

controller$pop()
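Putting the three steps together, a sketch of a full round trip (the task names and worker count are illustrative): push several tasks, wait, and collect the results in bulk.

```r
library(crew)

controller <- crew_controller_local(
  workers = 2,
  seconds_idle = 10
)
controller$start()

# Submit tasks; push() returns immediately.
for (i in seq_len(4)) {
  controller$push(
    name = paste0("task", i),
    command = Sys.getpid()
  )
}

# Wait for all tasks, then collect the resolved results.
controller$wait()
results <- controller$collect()
results$result  # worker PIDs, showing tasks ran in separate processes

controller$terminate()
```

With `seconds_idle = 10`, workers that run out of tasks shut themselves down, which is the auto-scaling behavior described above.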

AWS Batch plugin


crew_controller_aws_batch(
  workers = 100,
  seconds_idle = 120,
  aws_batch_job_definition = "DEF",
  aws_batch_job_queue = "QUEUE",
  port = 57000 # +TCP in security group
)


Parallel + async in Shiny

observeEvent(input$button, {
  replicate(1000, 
    controller$push(flip_coin(), ...) %...>%
      collect_flips(controller, ...)
  )
})

crew accelerates targets


# _targets.R file

tar_option_set(
  controller = crew_controller_aws_batch(
    workers = 100,
    seconds_idle = 120,
    aws_batch_job_definition = "DEF",
    aws_batch_job_queue = "QUEUE",
    port = 57000 # +TCP in security group
  )
)

Recap

  • mirai is parallel and first-class async
  • crew plugs mirai into heavy-duty platforms

Thanks


https://wlandau.github.io/posit2024