I’m not sure when everyone gets to Helsinki, but if you want to meet up and talk about this over coffee on Tuesday afternoon, I’m keen.
And just to triple-spam the thread (sorry!) this ability to store doubles (or vectors of doubles) during a likelihood evaluation and then read those stored values during another likelihood evaluation would be extremely useful for the algebraic solver in general. It would both speed it up and probably stabilize it.
We never ended up talking about this, but I only came up with two kinda rudimentary designs:
- Pass in a “key” value to every optimization in the program. This key value is used internally for iteration-to-iteration storage.
So something like:
model {
  for (n in 1:N) {
    y = optimize(..., n); // The model specifies an identifier for each optimization
  }
  ...
}
So your optimize function might look like:
auto optimize(auto a, ..., int i) {
  Eigen::VectorXd guess;
  load_guess(i, guess); // read the optimum stored under this id last iteration
  ...
  save_guess(i, guess); // store this iteration's optimum for next time
}
Where that map from ids -> initial guesses would live is up in the air, but I figure it wouldn’t be too difficult to nail it down.
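For concreteness, here’s a minimal sketch of what load_guess/save_guess could look like, assuming the map hides behind a function-local static (everything here beyond those two names is hypothetical):

#include <map>
#include <Eigen/Dense>

// Hypothetical home for the id -> guess map; where it actually lives is
// the open design question.
std::map<int, Eigen::VectorXd>& guess_store() {
  static std::map<int, Eigen::VectorXd> store;
  return store;
}

// Fill `guess` with the optimum saved under `id` last iteration, if any.
bool load_guess(int id, Eigen::VectorXd& guess) {
  auto it = guess_store().find(id);
  if (it == guess_store().end())
    return false;
  guess = it->second;
  return true;
}

// Save this iteration's optimum so the next iteration can start from it.
void save_guess(int id, const Eigen::VectorXd& guess) {
  guess_store()[id] = guess;
}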
The downside to this is that it’d be hard to get these indexes right if your optimizations were buried in weird places:
functions {
  vector myfunction(real a, real b, int n) { // Ugh -- have to carry index around manually
    return optimize(..., n);
  }
}
...
model {
  ...
  for (n in 1:N) {
    // What if myfunction sometimes used 2 optimizations and sometimes 1?
    y[n] = myfunction(a, b, n);
  }
  ...
}
So that’s not great. But in any case, if a wrong index got passed in and the stored initial guess didn’t exist or wasn’t the right size, the regular initial conditions could be used, as in the sketch below.
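A sketch of that fallback, assuming load_guess reports whether anything was stored under the id, with default_inits standing in for the model’s regular initial conditions (both hypothetical):

Eigen::VectorXd guess;
// Missing or wrong id, or a stale guess of the wrong size: fall back to the
// regular initial conditions rather than failing.
if (!load_guess(i, guess) || guess.size() != default_inits.size())
  guess = default_inits;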
- The other easy thing is to maintain two stacks in each leapfrog step: one that gets written to (to be used by the next leapfrog step), and one that gets read from (which was created during the last leapfrog step). So inside the C++ you might have something like:
auto optimize(auto a, ...) {
  Eigen::VectorXd guess;
  pop_from_front_guess(guess); // read the guess the previous leapfrog step stored
  ...
  push_to_back_guess(guess);   // store this step's optimum for the next leapfrog step
}
In this way you assume the order and number of your Stan model function evaluations don’t change.
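A minimal sketch of those two buffers, assuming the sampler swaps them between leapfrog steps (all names hypothetical):

#include <deque>
#include <Eigen/Dense>

// Guesses produced by the previous leapfrog step, read in evaluation order.
std::deque<Eigen::VectorXd> read_buffer;
// Guesses produced by this leapfrog step, consumed by the next one.
std::deque<Eigen::VectorXd> write_buffer;

bool pop_from_front_guess(Eigen::VectorXd& guess) {
  if (read_buffer.empty())
    return false; // first step, or the evaluation order changed: fall back
  guess = read_buffer.front();
  read_buffer.pop_front();
  return true;
}

void push_to_back_guess(const Eigen::VectorXd& guess) {
  write_buffer.push_back(guess);
}

// The sampler would call this between leapfrog steps.
void swap_guess_buffers() {
  read_buffer = std::move(write_buffer);
  write_buffer.clear();
}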
Neither option is terribly inspiring, but either would probably cover the bulk of the use cases. I don’t think the second one would work with threading (the first should be doable, just annoying). The first probably wouldn’t interact well with MPI, though.
In either case the initial conditions to the model would be a bit weird. Maybe in terms of nomenclature they’d be more like fallback initial conditions.
In either case we could use the “Parameter packs for (de)serialization” stuff to make the optimization internals look nice.
The model class is designed to be immutable after construction with data.
Breaking this abstraction is going to require a lot of care because we’ve assumed for multi-threading, for example, that the model doesn’t change. Autodiff gets thread local, but the model is still immutable.
Relaxing this assumption about the model will be hard, but I think it’d be even harder at the math library level.
What exactly do you want to store from iteration to iteration and how will it be indexed so it can be found in the next iteration?
The way this got solved for MPI is to statically generate indexes for each call to MPI. But each usage is blocking there until it finishes, so we don’t need to worry about the loops—the ID is still unique and valid for MPI (at least under current usage).
The trick’s going to be managing the lifecycle of these. They can’t just accumulate statically forever or we’ll leak memory.
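One hypothetical way to bound that lifecycle: tie the storage to an RAII guard that the algorithm owns for the duration of a run, so everything is released when sampling ends:

#include <map>
#include <Eigen/Dense>

std::map<int, Eigen::VectorXd>& guess_store(); // e.g. the map sketched earlier

// Hypothetical guard: the algorithm constructs one per run; stored guesses
// are cleared when it goes out of scope instead of accumulating statically
// forever.
struct guess_store_guard {
  guess_store_guard() = default;
  ~guess_store_guard() { guess_store().clear(); }
};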
I’m imagining something like this:

template <typename T, int i>
struct glm_poisson_laplace {
  static Eigen::Matrix<double, Eigen::Dynamic, 1> stored_optimum;
  // Other (non-static) storage
  // Constructor and methods
};
If that makes sense. (I can make a more complete spec if needed.) Each new instance of glm_poisson_laplace would store one static vector that would be used “sensibly” in the optimization step and updated as needed.
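(One wrinkle worth flagging: pre-C++17 the static member also needs an out-of-class definition, and each distinct (T, i) instantiation gets its own copy, which is what makes the integer template parameter act as the index:

// Out-of-class definition of the static member (or declare it
// `inline static` in C++17 and skip this).
template <typename T, int i>
Eigen::Matrix<double, Eigen::Dynamic, 1>
    glm_poisson_laplace<T, i>::stored_optimum;

// Distinct template arguments mean distinct static storage:
// glm_poisson_laplace<var, 0>::stored_optimum and
// glm_poisson_laplace<var, 1>::stored_optimum are separate vectors.

)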
To make this threadsafe, I guess we’d need to make this a distributed vector, but there should be tools for that.
That would then be a design thing: should the static vector be stored in chunks on the distributed nodes, or should it be stored on the master and broadcast each time? MPI is not my thing, so I have no real opinion on the design of the parallelism. But I do think it can be done safely in a fairly straightforward way.
The problem here is that the whole autodiff tree gets built and destroyed every HMC step.
The glm_poisson_laplace objects wouldn’t survive HMC step to HMC step unless there’s some background trickery I’m overlooking here.
Ok, so was your solution to store a static Eigen::Matrix<double, -1, num_instances> in the initialization and then use the index i to index into it?
I’m not sure what @Bob_Carpenter’s problem with the lifecycle was. They’d be cleaned up at the end of the program with everything else wouldn’t they?
The static declaration?
I see two problems.
One, it changes the way our class works fundamentally. This isn’t going to break anything right away, but moving from purely immutable to caching state run to run gets complicated and I’d rather not do it unless absolutely necessary. That is, I’d rather do something external if we need it.
The problem with the simple static on the function is that there may be multiple chains executing, so it’d have to be thread local, which has a performance hit for synchronization. It would also have to keep some kind of map, as there might be two instances of the same kind of diagnostic going on.
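A sketch of the shape that implies (names hypothetical):

#include <map>
#include <Eigen/Dense>

// One copy per thread, so concurrent chains don't race on the stored guesses
// (each chain then warms up its own); keyed by instance id because the same
// kind of diagnostic can be running more than once.
thread_local std::map<int, Eigen::VectorXd> stored_optima;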
I was pretty sure that’s what static does, but I defer to literally everyone else on this.
If you’d prefer some dedicated memory somewhere (I think that’s what you mean by “external”) then I’m fine with that. It’s probably a bit more effort, but if it keeps the rest of the ship safely afloat, it’s worth it. Safety first!
I think this was me misunderstanding how multiple chains work. I had assumed it was multiple instances of the same Stan program running, which shouldn’t lead to problems. But if there’s a push to use MPI-ish things to make multiple chains run, then that’s definitely a thing we need to be careful about!!!
Whoops, sorry about just taking a vacation from responding for 3 days :/. I missed the static in the original post, but there’s still the problem that there’d only be one copy of the static memory for all the different calls to the optimizer.
But I agree with you now: this seems like the best way to do development. If there are multiple calls in your test models, add an integer argument to the function and set it when you call it.
The Eigen::MatrixXd is fine. I was thinking std::map<int, Eigen::VectorXd> might be easier so resizing isn’t an issue, but whatever you feel like working with.
I assume once there’s some validated test models and performance plots, the difference between saving and not saving these outputs will be enough to motivate the changes to the memory model.
Your assumption is correct, but there’s a threaded version of map_rect. Just don’t use that in your development and you’re good to go. Figuring this out can wait.
That is how things are set up now.
But we’re adding more multithreading, and multiple threads would share the same statics and would thus need to synchronize any kind of access.
Yes, I meant something that’s part of the algorithm rather than part of one of our fundamental data types. So rather than having a special matrix with a static decorator of some kind, have an external mapping as part of the algorithm.
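A minimal sketch of that shape, assuming the algorithm owns the cache and passes a reference down to whatever needs warm starts (all names hypothetical):

#include <map>
#include <Eigen/Dense>

// Owned by the algorithm (one per chain) and passed by reference to the
// functor that needs warm starts, so the model class itself stays immutable.
struct warm_start_cache {
  std::map<int, Eigen::VectorXd> optima;

  // Fill `guess` with the optimum saved under `id`, if any.
  bool load(int id, Eigen::VectorXd& guess) const {
    auto it = optima.find(id);
    if (it == optima.end())
      return false;
    guess = it->second;
    return true;
  }

  // Save this iteration's optimum for the next iteration.
  void save(int id, const Eigen::VectorXd& guess) { optima[id] = guess; }
};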