With the downsize package, you can toggle between the test and production
versions of your workflow with the flip of a TRUE/FALSE global option. This
is helpful when your workflow takes a long time to run, you want to test it
quickly, and unit testing is too reductionist to cover everything.
Say you want to analyze a large dataset.
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
But for the sake of time, you want to test and debug your code on a
smaller dataset. In your code, select your dataset with a call to downsize().
my_data <- downsize(big_data) # downsize(big = big_data)
Above, my_data becomes big_data if getOption("downsize") is FALSE or NULL
(default). If getOption("downsize") is TRUE, my_data becomes head(big_data).
You can toggle the global option downsize with calls to production_mode()
and test_mode(), and you can override the option for a single call with
downsize(..., downsize = L), where L is TRUE or FALSE. Check whether the
workflow is in test or production mode with the my_mode() function.
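The switch described above can be pictured with a few lines of base R. This is only a sketch under stated assumptions: pick_data() is a hypothetical helper written for illustration, not the package's implementation, and it covers only the default head() behavior and the per-call override.

```r
# Hypothetical stand-in for the option-driven switch; NOT downsize's own code.
pick_data <- function(big, downsize = getOption("downsize")) {
  # NULL and FALSE both mean "production": return the full object.
  if (isTRUE(downsize)) head(big) else big
}

big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))

options(downsize = NULL)   # option unset, so the full data is used
n_default <- nrow(pick_data(big_data))

options(downsize = TRUE)   # analogous to test mode
n_test <- nrow(pick_data(big_data))

# A per-call argument takes precedence over the global option.
n_override <- nrow(pick_data(big_data, downsize = FALSE))

options(downsize = NULL)   # restore the default state
```

Because the default value of the downsize argument is looked up when the function is called, changing the global option once affects every later call that does not override it.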
Here is an example script in test mode.
library(downsize)
test_mode() # scales the workflow appropriately
my_mode() # shows if the workflow is in test or production mode
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to test_mode() and production_mode()
# ...more code, time-consuming if my_data is large...
To scale the workflow up to production mode, replace test_mode() with
production_mode() and leave everything else exactly the same.
library(downsize)
production_mode() # scales the workflow appropriately
my_mode() # shows if the workflow is in test or production mode
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to test_mode() and production_mode()
# ...more code, time-consuming if my_data is large...
An ideal workflow has multiple calls to downsize() that are configured all
at once with a single call to test_mode() or production_mode() at the very
beginning. Thus, tedium and human error are avoided, and the test is a
close approximation to the original task at hand.
You can provide a replacement for big_data using the small argument of
downsize().
library(downsize)
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
small_data <- data.frame(x = runif(16), y = runif(16))
test_mode()
my_mode() # getOption("downsize") is TRUE
## [1] "test mode"
my_data <- downsize(big_data, small_data) # downsize(big = big_data, small = small_data)
identical(my_data, small_data)
## [1] TRUE
If you set small yourself, be sure that subsequent code can accept both
small and big. For example, if small is a data frame and big is a matrix,
your code may work fine in test mode and break in production mode. In
addition, downsize() will warn you if small is identical to or bigger in
memory than big (disable with downsize(..., warn = FALSE)). To be safer,
use the subsetting capabilities of the downsize() function.
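To make the data frame versus matrix pitfall concrete, here is a small base R illustration. The helper mean_x() and the class check are assumptions made for this sketch; they are not features of downsize.

```r
# A matrix "big" and a data frame "small" with the same column names.
big <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x", "y")))
small <- data.frame(x = runif(4), y = runif(4))

# Hypothetical downstream code that relies on `$`, which works on data
# frames but raises an error on matrices.
mean_x <- function(d) mean(d$x)

ok_small <- is.numeric(mean_x(small))                               # test mode succeeds
fails_big <- inherits(try(mean_x(big), silent = TRUE), "try-error") # production breaks

# One possible guard (an assumption, not part of the package): compare classes
# before handing both objects to the same code.
same_class <- identical(class(big), class(small))
```

A check like same_class, run once at the top of a script, is a cheap way to catch this mismatch before a long production run.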
The command my_data <- downsize(big = big_data) is equivalent to
my_data <- downsize(big = big_data, nrow = 6). There are multiple ways to
subset argument big in downsize() when it is time to scale down to test
mode. As in the following examples, be sure that small is set to NULL
(default). Otherwise, subsetter arguments such as dim, length, nrow, and
ncol will be ignored.
test_mode()
downsize(1:10, length = 2)
## [1] 1 2
m <- matrix(1:36, ncol = 6)
downsize(m, ncol = 2)
##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12
downsize(m, nrow = 2)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    7   13   19   25   31
## [2,]    2    8   14   20   26   32
downsize(m, dim = c(2, 2))
##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
downsize(data.frame(x = 1:10, y = 1:10), nrow = 5)
##   x y
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 5
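x <- array(0, dim = c(10, 100, 2, 300, 12))
dim(x)
## [1]  10 100   2 300  12
my_array <- downsize(x, dim = rep(3, 5))
dim(my_array)
## [1] 3 3 2 3 3
my_array <- downsize(x, dim = c(1, 4))
dim(my_array)
## [1]   1   4   2 300  12
my_array <- downsize(x, ncol = 1)
dim(my_array)
## [1]  10   1   2 300  12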
Set random to TRUE to take a random subset of your data rather than just
the first few rows or columns.
downsize(m, ncol = 2, random = TRUE)
##      [,1] [,2]
## [1,]   25    7
## [2,]   26    8
## [3,]   27    9
## [4,]   28   10
## [5,]   29   11
## [6,]   30   12
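For intuition, the difference between taking the first columns and taking a random subset can be mimicked in base R. This is only an analogy under stated assumptions: it uses sample() directly and says nothing about how downsize() draws or orders its subset internally.

```r
set.seed(1)                      # seed chosen arbitrarily for this sketch
m <- matrix(1:36, ncol = 6)

# Deterministic subset: always the first two columns.
first_cols <- m[, seq_len(2), drop = FALSE]

# Random subset: two columns drawn without replacement.
random_cols <- m[, sample(ncol(m), 2), drop = FALSE]

dim(first_cols)
dim(random_cols)
```

Both results have the same shape, so downstream code sees the same structure, while the random version avoids always testing on the same leading slice of the data.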
You can interchange entire blocks of code based on the scaling/mode of the workload.
test_mode()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
## [1] 2
production_mode()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
## [1] 11
Variables set in code blocks are available after calls to downsize().
test_mode()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "short code" & y == 3.14
## [1] TRUE
production_mode()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "long code" & y == 1000
## [1] TRUE
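The reason variables set in a code block can survive the call is that R evaluates function arguments lazily, in the caller's environment. The sketch below shows that mechanism with a hypothetical choose_block() helper; it is an illustration of the language feature, and the package itself may evaluate its arguments differently.

```r
# Hypothetical stand-in for the switch; NOT downsize's own code.
choose_block <- function(big, small, downsize = getOption("downsize")) {
  # Only the argument that is actually used gets evaluated.
  if (isTRUE(downsize)) small else big
}

ran_big <- FALSE
options(downsize = TRUE)   # analogous to test mode
result <- choose_block(
  big = {
    ran_big <- TRUE        # would run only if `big` were evaluated
    label <- "long code"
    10
  },
  small = {
    label <- "short code"  # assignment happens in the calling environment
    1
  }
)
options(downsize = NULL)   # restore the default state
```

After this runs, label holds "short code" and result is 1, while ran_big is still FALSE because the big block was never evaluated in this sketch.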
Use the help_downsize() function to obtain a collection of helpful links.
For troubleshooting instructions, refer to TROUBLESHOOTING.md on the GitHub
page.