I want to summarize where we’re at in terms of proposals.
For each of the concrete proposals below, I provide the signature of the map function as it would look from the Stan language and the definition of the signature for the function argument f.
In all cases, the rough structure is a map-reduce that follows the actual map function being defined in the Stan language:
Rectangular, retransmit data
vector[] map(F f, vector[] thetas,
vector[] xs_r, int[,] xs_i)
vector f(vector theta, vector xs_r, int[] xs_i)
PRO
- function is simple and encapsulated
- clean math library implementation
- doesn’t go beyond what already exists in the Stan language
- could use standalone function implementations as is for the workers (non-root)
CON
- retransmits data
- rectangular only, so may require ugly and expensive-to-transmit padding
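To make the rectangular semantics concrete, here is a minimal reference model in Python (not Stan, and not the math library implementation). It assumes that f is applied to the k-th row of each argument and that one result vector comes back per row. The names rect_map and scale_and_shift are hypothetical and used only for illustration, and the loop is a serial stand-in for what would be distributed across workers.

```python
from typing import Callable, List

Vector = List[float]


def rect_map(
    f: Callable[[Vector, Vector, List[int]], Vector],
    thetas: List[Vector],
    xs_r: List[Vector],
    xs_i: List[List[int]],
) -> List[Vector]:
    """Serial stand-in for the proposed map: one call to f per row."""
    if not (len(thetas) == len(xs_r) == len(xs_i)):
        raise ValueError("thetas, xs_r, and xs_i must have the same number of rows")
    return [f(theta, x_r, x_i) for theta, x_r, x_i in zip(thetas, xs_r, xs_i)]


def scale_and_shift(theta: Vector, x_r: Vector, x_i: List[int]) -> Vector:
    # Hypothetical user function: theta[0] * x_r[n] + x_i[n], elementwise.
    return [theta[0] * x + n for x, n in zip(x_r, x_i)]


result = rect_map(
    scale_and_shift,
    [[2.0], [3.0]],            # thetas: one parameter vector per job
    [[1.0, 2.0], [4.0, 0.0]],  # xs_r: real data, same width for every job
    [[1, 0], [0, 5]],          # xs_i: integer data, same width for every job
)
```

In this model every argument is passed on every call, which is exactly the retransmission cost listed under CON.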
Ragged, retransmit data
vector map(F f,
vector thetas, int[] theta_ends,
vector x_r, int[] x_r_ends,
int[] x_i, int[] x_i_ends)
vector f(vector theta, vector xs_r, int[] xs_i)
PRO
- all of the pros of the rectangular version
- function is same, so still encapsulated
- allows arbitrary raggedness (user must know ragged result bounds)
CON
- retransmits data
- awkward raggedness without built-in ragged structures
- return sizes implicit
- could add an int[] result_ends argument for anticipated sizes and check that the results match
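The ragged bookkeeping can be sketched in Python as a reference model (again not Stan). The post does not fix an indexing convention for the *_ends arrays, so this sketch assumes they are cumulative, exclusive end offsets into the flat arrays, 0-based as in Python. The names slices, ragged_map, and scale_and_shift are hypothetical, and the optional result_ends check is the idea floated in the last bullet rather than part of the proposed signature.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def slices(ends: List[int]) -> List[Tuple[int, int]]:
    """Turn end offsets such as [2, 3] into (start, end) pairs [(0, 2), (2, 3)]."""
    out, start = [], 0
    for end in ends:
        if end < start:
            raise ValueError("end offsets must be non-decreasing")
        out.append((start, end))
        start = end
    return out


def ragged_map(
    f: Callable[[Vector, Vector, List[int]], Vector],
    thetas: Vector, theta_ends: List[int],
    x_r: Vector, x_r_ends: List[int],
    x_i: List[int], x_i_ends: List[int],
    result_ends: Optional[List[int]] = None,
) -> Vector:
    """Serial stand-in: slice each flat array per job, call f, concatenate results."""
    if not (len(theta_ends) == len(x_r_ends) == len(x_i_ends)):
        raise ValueError("all *_ends arrays must describe the same number of jobs")
    flat: Vector = []
    actual_ends: List[int] = []
    jobs = zip(slices(theta_ends), slices(x_r_ends), slices(x_i_ends))
    for (t0, t1), (r0, r1), (i0, i1) in jobs:
        flat.extend(f(thetas[t0:t1], x_r[r0:r1], x_i[i0:i1]))
        actual_ends.append(len(flat))
    # Optional check of anticipated result sizes against what f actually returned.
    if result_ends is not None and list(result_ends) != actual_ends:
        raise ValueError(f"expected result ends {result_ends}, got {actual_ends}")
    return flat


def scale_and_shift(theta: Vector, x_r: Vector, x_i: List[int]) -> Vector:
    # Hypothetical user function: theta[0] * x + sum(x_i) for each real datum.
    return [theta[0] * x + sum(x_i) for x in x_r]


out = ragged_map(
    scale_and_shift,
    [2.0, 3.0], [1, 2],        # two jobs, one parameter each
    [1.0, 2.0, 4.0], [2, 3],   # job 0 has two reals, job 1 has one
    [1, 1, 1], [1, 3],         # job 0 has one int, job 1 has two
    result_ends=[2, 3],
)
```

Without result_ends the caller has to know in advance how to split the flat return value, which is the "return sizes implicit" concern.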
Rectangular, Fixed Data
The data variables x_rs and x_is must be the names of data or transformed data variables, so that they can be loaded once and reused.
vector[] map(F f, vector[] thetas, vector[] x_rs, int[,] x_is)
vector f(vector theta, vector x_r, int[] x_i)
This one has a map that just sends the data in once, but x_rs and x_is must be the names of variables in the data/transformed data block. Each child process will need to load the data from the model, but then only needs to hang on to its own slice, x_rs[k] and x_is[k]. Then the function can be taken from the standalone function compilation.
PROS
- reads data once on each child process
- easier to implement than closure
- easy implementation without MPI
- should perform better than data transmission each time
CONS
- has to instantiate all of the model’s data on each process, even though it only needs a slice
- back of envelope, this should be OK for PK/PD apps
- might not be so OK for distributing big regressions
- function f needs to know how to grab its slice of data from the index
Could generalize this to ragged in the same way as before.
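The load-once behavior can be sketched in Python as follows. This is a reference model under stated assumptions, not the proposed implementation: the Worker class stands in for a child process, load_data stands in for instantiating the model's data, and all names (Worker, fixed_data_map, load_data, scale_and_shift) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Data = Tuple[List[Vector], List[List[int]]]


class Worker:
    """Stand-in for child process k: loads all data once, keeps only slice k."""

    def __init__(self, k: int, load_data: Callable[[], Data]) -> None:
        x_rs, x_is = load_data()  # full data is instantiated on this process...
        self.x_r = x_rs[k]        # ...but only this worker's slice is retained
        self.x_i = x_is[k]

    def evaluate(self, f, theta: Vector) -> Vector:
        # Only theta arrives per call; the data slice is already local.
        return f(theta, self.x_r, self.x_i)


def fixed_data_map(f, workers: List[Worker], thetas: List[Vector]) -> List[Vector]:
    """Serial stand-in for the map: send theta k to worker k and collect results."""
    return [w.evaluate(f, theta) for w, theta in zip(workers, thetas)]


def scale_and_shift(theta: Vector, x_r: Vector, x_i: List[int]) -> Vector:
    # Hypothetical user function: theta[0] * x_r[n] + x_i[n], elementwise.
    return [theta[0] * x + n for x, n in zip(x_r, x_i)]


calls: Dict[str, int] = {"n": 0}  # counts how often the data is loaded


def load_data() -> Data:
    calls["n"] += 1
    return [[1.0, 2.0], [4.0, 0.5]], [[1, 0], [0, 5]]


workers = [Worker(k, load_data) for k in range(2)]
first = fixed_data_map(scale_and_shift, workers, [[2.0], [3.0]])
second = fixed_data_map(scale_and_shift, workers, [[1.0], [1.0]])
```

Here the data is loaded once per worker no matter how many times the map is evaluated, while the constructor still touches the full data set, which mirrors the first item under CONS.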
Closure, send data once
vector map(F f,
vector thetas, int[] theta_ends)
vector f(vector theta, int fold);
PRO
- data closure means no data arguments
- could be big win in reducing communications (latency still there in calls and overhead for waits)
CON
- requires functions that are closures over data
- requires major extension to Stan language parser and code generator for support and a lot of doc explaining them
- generate as member functions rather than static
- might like to have in language anyway
- could require too much memory per worker as each worker gets everything in the instantiated model
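What "closure over data" means for f can be illustrated with a small Python sketch. Python supports closures directly, so this only models the calling convention, not the parser or code generator work the proposal would need in Stan. It assumes theta_ends holds cumulative, exclusive end offsets and that fold is a 0-based job index; the names make_f and closure_map are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def make_f(x_rs: List[Vector], x_is: List[List[int]]) -> Callable[[Vector, int], Vector]:
    """Build f(theta, fold), which captures the data instead of taking it as arguments."""

    def f(theta: Vector, fold: int) -> Vector:
        x_r, x_i = x_rs[fold], x_is[fold]  # data comes from the enclosing scope
        return [theta[0] * x + n for x, n in zip(x_r, x_i)]

    return f


def closure_map(f: Callable[[Vector, int], Vector],
                thetas: Vector, theta_ends: List[int]) -> Vector:
    """Serial stand-in for the map: only parameters and the fold index are passed."""
    out: Vector = []
    start = 0
    for fold, end in enumerate(theta_ends):
        out.extend(f(thetas[start:end], fold))
        start = end
    return out


f = make_f([[1.0, 2.0], [4.0]], [[1, 0], [5]])
out = closure_map(f, [2.0, 3.0], [1, 2])
```

Note that f here holds a reference to all of x_rs and x_is, not just one fold, which corresponds to the memory concern in the last bullet.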