Efficient use of map_rect for parallelizing ODE systems

Hello,

I have a model with large systems of ODEs, not dissimilar to the hierarchical PK/PD models that @wds15 has worked with where subject-level ODEs are solved in parallel using map_rect(). I was hoping someone could comment on whether it would be inefficient to use map_rect to fill out a mean vector, as opposed to returning a vector of log-likelihood contributions that are summed in the model block. In our setting, the posterior distributions of the curves that solve the ODEs are of interest in addition to the posteriors of the parameters that govern the ODEs.

To be a bit more precise, I’m asking if something like the following would be inefficient:

transformed parameters {
// mean_fcn involves integrating ODEs to get the mean vector
// mean_vec is also of interest
vector[n] mean_vec = map_rect(mean_fcn, phi, theta, x_r, x_i); 
}
model {
y ~ normal(mean_vec, sigma);
}

Or, are efficiency gains from using map_rect/MPI only realized if map_rect returns a log-likelihood contribution for each shard? e.g.,

model{
// loglik_fcn integrates ODEs but returns a vector of log-likelihoods not means
target += sum(map_rect(loglik_fcn, phi, theta, x_r, x_i));
}

Thank you!

What are you exactly doing here? Is this a hierarchical model and you solve the ODE for many subjects for which you apply map_rect to do that in parallel? If the answer is yes, then it’s very likely that you will see speedups from using map_rect even if you return the means per time point. Doing so increases the communication cost, but if your ODE is not too small (a few states and/or a few parameters) or stiff, then integrating the ODEs takes forever in comparison to most other things and you can return the mean vector (though only returning the log-lik is always a lot more efficient).

Thanks, that’s exactly right. It’s a hierarchical model where we have conditionally independent units for which the ODEs could be integrated in parallel. Still in development, so the systems are small for now, but they’ll get big and complicated.

Could you explain a bit more about what you mean by the communication cost and why returning only the log-likelihood would be a lot more efficient, as opposed to just increasing the burden through an additional copy-cost of tracking the mean vector? Is it that the copy-cost really adds up b/c the ODEs are re-integrated in every leap-frog step?

Thanks!

Another followup. If the mean vector is of interest, would it be better to have map_rect return the log-likelihood in the model block and then re-integrate the ODEs in the generated quantities block as opposed to returning the mean vector in the transformed parameters block?

if you only return the log-like value, then it’s just a single value (and the gradients); whereas its a lot more values when you return the mean values… if you send this over the network via MPI, then less is better.

Even better is it to first calculate the posterior with map_rect only returning a single value in the model block (the log-lik) and then running with the generated quantities facility another run where you do call map_rect in the gq block and return the mean vector.

Thanks so much, this was really helpful!!! Now off to slay some computational demons!

1 Like

@wds15, if you don’t mind I have a followup question even though the issue is closed. Is there anything extra I need to worry about, from a computation/Stan perspective, if I have different types of measurements, say on different compartments in my system? In particular, is it fine to return a single log-likelihood value for each subject or do I need to explicitly return separate log-likelihood for each measurement? I’m running into an issue where I get the prior back for one of my measurement processes and want to rule out the possibility that this is just an implementation error.

Either way is just fine.

EDIT: BTW… you may want to use reduce_sum for easier coding - that’s part of cmdstan 2.23; but it won’t give you any details though - you can only return a single log-lik value per call.

Thanks!

Does reduce_sum still work if I have parameters that vary from subject to subject? e.g., in a hierarchical PK or transmission model?

Yes, of course.

1 Like