みらい 未来
Minimalist Async Evaluation Framework
for R
Designed for simplicity, a ‘mirai’ evaluates an R
expression asynchronously in a parallel process, locally or distributed
over the network, with the result automatically available upon
completion.
Modern networking and concurrency built on nanonext and NNG (Nanomsg Next Gen) ensure
reliable and efficient scheduling, over fast inter-process
communications or TCP/IP secured by TLS.
Advantages include
being inherently queued thus handling many more tasks than available
processes, no storage on the file system, support for otherwise
non-exportable reference objects, an event-driven promises
implementation, and built-in asynchronous parallel map.
Use mirai() to evaluate an expression asynchronously in a separate,
clean R process.
The following mimics an expensive calculation that eventually returns a vector of random values.
library(mirai)
m <- mirai({Sys.sleep(n); rnorm(n, mean)}, n = 5L, mean = 7)
The mirai expression is evaluated in another process and hence must
be self-contained, not referring to variables that do not already exist
there. Above, the variables n and mean are passed as part of the
mirai() call.
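As a minimal sketch of what "self-contained" means in practice, the example below passes a session variable into the expression in two ways: as named arguments, and as a named list through the `.args` argument of mirai(). The variable name `scale` and the values used are illustrative assumptions, not part of the original example.

```r
library(mirai)

# 'scale' is a hypothetical variable that exists only in the calling session
scale <- 10

# Supply everything the expression needs as named arguments ...
m1 <- mirai(x * scale, x = 1:3, scale = scale)

# ... or, alternatively, as a named list via '.args'
m2 <- mirai(x * scale, .args = list(x = 1:3, scale = scale))

# Wait for and collect both results
m1[]
m2[]
```

Either form makes the dependency explicit, so the expression does not rely on anything that happens to exist in the calling session.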
A ‘mirai’ object is returned immediately - creating a mirai never blocks the session.
Whilst the async operation is ongoing, attempting to access a mirai’s data yields an ‘unresolved’ logical NA.
m
#> < mirai [] >
m$data
#> 'unresolved' logi NA
To check whether a mirai remains unresolved (yet to complete):
unresolved(m)
#> [1] TRUE
To wait for and collect the return value, use the mirai’s [] method:
m[]
#> [1] 7.418942 6.510695 6.620885 5.488792 8.438980
As a mirai represents an async operation, it is never necessary to
wait for it. Other code can continue to run. Once it completes, the
return value automatically becomes available at $data.
while (unresolved(m)) {
# do work here that does not depend on 'm'
}
m
#> < mirai [$data] >
m$data
#> [1] 7.418942 6.510695 6.620885 5.488792 8.438980
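The two ways of interacting with a pending result, a non-blocking check and a blocking wait, can be sketched side by side. This example assumes call_mirai() as the explicit blocking wait; the sleep duration and return value are arbitrary.

```r
library(mirai)

m <- mirai({Sys.sleep(0.5); "done"})

# Non-blocking check: is the result available yet?
unresolved(m)

# Blocking wait: call_mirai() waits for resolution and returns the mirai itself
call_mirai(m)

# The value is now available without any further waiting
m$data
```

The non-blocking check suits event loops and interactive sessions, while the blocking wait suits scripts that cannot proceed without the value.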
Daemons are persistent background processes for receiving mirai requests, and are created as easily as:
daemons(4)
#> [1] 4
Daemons may also be deployed remotely for distributed computing and launchers can start daemons across the network via (tunnelled) SSH or a cluster resource manager.
Secure TLS connections can be used for remote daemon connections, with zero configuration required.
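To illustrate how tasks queue onto a fixed pool of local daemons, here is a small sketch that sends more tasks than there are daemons and records which process handled each one. The task count and the use of Sys.getpid() are illustrative choices, and the initial reset is only there so the sketch starts from a clean state.

```r
library(mirai)

daemons(0)  # reset any daemons already set, so the sketch starts clean
daemons(2)  # two persistent background processes

# Six tasks are queued across the two daemons
ms <- lapply(1:6, function(i) mirai(Sys.getpid()))

# Collect the process ID that evaluated each task
pids <- vapply(ms, function(m) m[], integer(1))
length(unique(pids))  # no more than 2 distinct worker processes

daemons(0)  # shut the daemons down when finished
```

Because the daemons persist between tasks, the start-up cost of an R process is paid once per daemon rather than once per mirai.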
mirai_map() maps a function over a list or vector, with each element
processed in a separate parallel process. It also performs multiple map
over the rows of a dataframe or matrix.
df <- data.frame(
  fruit = c("melon", "grapes", "coconut"),
  price = c(3L, 5L, 2L)
)
m <- mirai_map(df, \(...) sprintf("%s: $%d", ...))
A ‘mirai_map’ object is returned immediately. Other code can continue
to run at this point. Its value may be retrieved at any time using its
[] method to return a list, just like purrr::map() or base::lapply().
The [] method also provides options for flatmap, early stopping
and/or progress indicators.
m
#> < mirai map [3/3] >
m[.flat]
#> [1] "melon: $3" "grapes: $5" "coconut: $2"
All errors are returned as ‘errorValues’, facilitating recovery from partial failure. There are further advantages over alternative map implementations.
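Recovery from partial failure might look like the following sketch, in which one input is deliberately invalid. It assumes is_error_value() is used to tell failed elements from successful ones; the inputs and the mapped function are made up for illustration.

```r
library(mirai)

daemons(0)  # reset any daemons already set, so the sketch starts clean
daemons(2)

# The second element is deliberately invalid, so that task fails
res <- mirai_map(list(1, "a", 3), function(x) x + 1)[]

# Flag the elements that came back as error values
failed <- vapply(res, is_error_value, logical(1))
failed

# Successful results remain usable despite the failure
res[!failed]

daemons(0)  # shut the daemons down when finished
```

Only the failed elements need to be retried or inspected; the rest of the map does not have to be re-run.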
mirai is designed from the ground up to provide a production-grade
experience.
"I tried out the mirai package and was surprised by how fast it is."
The following core integrations are documented, with usage examples in the linked vignettes:
R parallel: Provides an alternative communications backend for R,
implementing a new parallel cluster type, a feature request by R-Core at
R Project Sprint 2023. ‘miraiCluster’ may also be used with foreach via
doParallel.
promises: Implements the next generation of completely event-driven,
non-polling promises. ‘mirai’ may be used interchangeably with
‘promises’, including with the promise pipe %...>%.
Shiny: Asynchronous parallel / distributed backend, supporting the next level of responsiveness and scalability within Shiny, with native support for ExtendedTask.
Plumber: Asynchronous parallel / distributed backend for scaling Plumber applications in production.
Arrow: Allows queries using the Apache Arrow format to be handled seamlessly over ADBC database connections hosted in background processes.
torch: Allows Torch tensors and complex objects such as models and optimizers to be used seamlessly across parallel processes.
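As one concrete illustration of the integrations listed above, the sketch below uses a ‘miraiCluster’ with a function from the base parallel package. It assumes make_cluster() and stop_cluster() as the constructor and teardown functions; the cluster size and workload are arbitrary.

```r
library(mirai)

# Create a 'miraiCluster' backed by two local daemons
cl <- make_cluster(2)

# Use it like any other cluster object from the parallel package
res <- parallel::parLapply(cl, 1:3, function(x) x * 2)
res

# Shut down the cluster and its daemons
stop_cluster(cl)
```

Existing code written against the parallel package can then switch backend by changing only how the cluster object is created.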
Targets, a Make-like pipeline tool for statistics and data science,
has integrated and adopted crew as its default high-performance
computing backend.
Crew is a distributed worker-launcher extending mirai to different
distributed computing platforms, from traditional clusters to cloud
services.
crew.cluster enables mirai-based workflows on traditional
high-performance computing clusters using LSF, PBS/TORQUE, SGE and
Slurm.
crew.aws.batch extends mirai to cloud computing using AWS Batch.
We would like to thank in particular:
Will Landau for being instrumental in shaping development of the
package, from initiating the original request for persistent daemons,
through to orchestrating robustness testing for the high performance
computing requirements of crew and targets.
Joe Cheng for integrating the promises method to work seamlessly within
Shiny, and prototyping event-driven promises.
Luke Tierney of R Core, for discussion on L’Ecuyer-CMRG streams to
ensure statistical independence in parallel processing, and making it
possible for mirai to be the first ‘alternative communications backend
for R’.
Henrik Bengtsson for valuable insights leading to the interface accepting broader usage patterns.
Daniel Falbel for discussion around an efficient solution to
serialization and transmission of torch tensors.
Kirill Müller for discussion on using ‘daemons’ to host Arrow database connections.
for funding work on the TLS implementation in nanonext, used to provide
secure connections in mirai.
Install the latest release from CRAN:
install.packages("mirai")
The current development version is available from R-universe:
install.packages("mirai", repos = "https://shikokuchuo.r-universe.dev")
◈ mirai R package: https://shikokuchuo.net/mirai/
◈ nanonext R package: https://shikokuchuo.net/nanonext/
mirai is listed in CRAN High Performance Computing Task View:
https://cran.r-project.org/view=HighPerformanceComputing
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.