Break up a large dataset and take average

Yes. We are waiting for some changes to Stan that would make it more user friendly, but it’s already possible to truly parallelize it.

I think Swupnil did, but can’t be sure.

You might also be interested in the MPI discussion in another thread.

Does someone know ep-stan in detail? After diving deeper into the code, I am really getting worried that it is not parallelizable. If you look at Worker.cavity() inside method.py, it is updating what looks like a global structure, self.Mat and self.vec. In the reference implementation, the Master calls each of the sites sequentially, which allows the global structure to be updated in an orderly fashion. If multiple workers were solving for self.vec using linalg.cho_solve concurrently, the results would come out different. Requiring sequential site updates makes no sense: you might as well just work on the original Stan model without EP, since there would be no speedup from parallelism.

Tuomas Sivula wrote the code; please make an issue on GitHub. Note that this specific code demonstrates the concept, and it wasn’t used for the real-data example. We are waiting for the refactoring of the Stan code before making something easier to use.

I need to ask for forgiveness. My mistake: self.vec applies to individual Workers, not to the Master. We are good on this issue.
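To make the per-worker state explicit, here is a minimal sketch (not the ep-stan code itself; the SiteWorker class and the toy data are made up for illustration) of how each site can hold its own natural parameters and compute its cavity distribution locally, so the sites can run in parallel without touching shared state:

```python
import numpy as np
from scipy import linalg
from multiprocessing import Pool

class SiteWorker:
    """Toy EP site: holds its own natural parameters (no shared state)."""
    def __init__(self, Q_site, r_site):
        self.Q = Q_site   # this site's precision contribution
        self.r = r_site   # this site's linear-term contribution

    def cavity(self, Q_global, r_global):
        # Cavity distribution: subtract this site's contribution from the
        # global approximation; everything here is worker-local.
        Q_cav = Q_global - self.Q
        r_cav = r_global - self.r
        # Cavity mean via a Cholesky solve on worker-local matrices.
        mu_cav = linalg.cho_solve(linalg.cho_factor(Q_cav), r_cav)
        return Q_cav, mu_cav

def run_site(args):
    worker, Q_global, r_global = args
    return worker.cavity(Q_global, r_global)

if __name__ == "__main__":
    d, k = 3, 4
    Q_global = np.eye(d) * (k + 1.0)
    r_global = np.ones(d) * k
    workers = [SiteWorker(np.eye(d), np.ones(d) / k) for _ in range(k)]
    with Pool(2) as pool:
        cavities = pool.map(run_site, [(w, Q_global, r_global) for w in workers])
    print(cavities[0][1])  # cavity mean of the first site
```

Only the aggregation of site updates back into the global approximation needs any coordination; the cavity and tilted-distribution computations themselves are independent.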


What are you waiting for? The command structure and output callback refactoring has landed.

There’s an arXiv paper that explains what’s going on with the algorithm and how it can be parallelized.

Being able to easily continue sampling with the current adaptation parameters and mass matrix. At some point in the algorithm the tilted distributions are changing only a little, and it would be more efficient to run without the adaptation phase. I think there were also some issues with how easy it would be to compute IS, which could give an additional speedup in the final iterations.
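As a rough sketch of the kind of warm start being described, assuming a CmdStanPy-style interface (the model file, data paths, and argument names below are assumptions for illustration, not part of ep-stan or what was available at the time):

```python
from cmdstanpy import CmdStanModel

# Hypothetical site model and data files.
model = CmdStanModel(stan_file="site_model.stan")

# First EP iteration: run warmup as usual and keep the adapted settings.
fit0 = model.sample(data="site_data_iter0.json", chains=4, seed=1)
step_size = float(fit0.step_size[0])                  # adapted step size (chain 0)
inv_metric = {"inv_metric": fit0.metric[0].tolist()}  # adapted inverse mass matrix

# Later EP iterations: the tilted distribution has changed only a little,
# so skip warmup and adaptation, reusing the previous step size and metric.
fit1 = model.sample(
    data="site_data_iter1.json",
    chains=4,
    seed=2,
    iter_warmup=0,
    adapt_engaged=False,
    step_size=step_size,
    metric=inv_metric,
)
```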

If you’re talking about doing this at the C++ level, this is all under your control already if you’re writing algorithms on top of our existing algorithms.

If you’re talking about doing it at the level of the commands in services, Mitzi’s already plumbed that through.

The only thing that’s waiting now is for CmdStan, RStan, and PyStan to figure out how to pass matrices in and get them out.

What’s “IS”?

We’ll wait for those.

Sorry for using an ambiguous acronym. Here IS is importance sampling. For that we would like to re-evaluate the target (lp__) for the same (or sometimes different) MCMC draws with new data.
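As a rough illustration (a sketch only; log_p_old and log_p_new stand in for lp__ evaluated under the old and new data, and the toy densities below are made up), the re-evaluated target gives importance weights for reusing the existing draws:

```python
import numpy as np

def importance_weights(log_p_new, log_p_old):
    """Self-normalized importance weights for reusing MCMC draws.

    log_p_old: lp__ evaluated at the draws under the old target
    log_p_new: lp__ re-evaluated at the same draws under the new target
    """
    log_w = log_p_new - log_p_old
    log_w -= np.max(log_w)          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / np.sum(w)

# Toy usage: estimate E[theta] under the new target from old draws.
rng = np.random.default_rng(0)
theta = rng.normal(size=1000)                # draws from the old target N(0, 1)
log_p_old = -0.5 * theta**2                  # old log density (up to a constant)
log_p_new = -0.5 * (theta - 0.3)**2          # new log density (up to a constant)
w = importance_weights(log_p_new, log_p_old)
print(np.sum(w * theta))                     # weighted estimate, roughly 0.3
```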

But those shouldn’t be part of the C++ implementation. You want to build an algorithm that’s exposed through one of the services functions. At that level, you don’t need this.

If you just want to build on top, we need to get someone to prioritize this. Too many parallelization opportunities flying around.

I think we’ll talk more about some of these interface issues on the way to Stan 3 and after 2.17 is out.

I think we want to use PyStan and RStan. No hurry, we can wait. I have assumed that eventually PyStan and RStan will have what we want, so it’s enough to wait, and there is no need to change prioritization (at least for now).

I am past that point. The paper is clear. I was concerned about parallelizing class variables and matrix parameters in the code. But I am also past that point. I can see how the current code can be parallelized.


Great! Let us know how your experiments go, and if you have suggestions for the code, please contact Tuomas.

I do have one question on the paper. Why is the variational approximation stated as

$q(\theta) \propto \exp\left(-\tfrac{1}{2}\,\theta^\top Q\,\theta + r^\top \theta\right)$

I see the Gaussian part uses the precision matrix $Q$. Where does the $r^\top \theta$ part come from?

It comes from the location parameter of the Gaussian.
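To spell out the step (this is standard Gaussian algebra, not quoted from the paper): writing the Gaussian in terms of its mean $\mu$ and precision $Q$ and expanding the quadratic gives

$$
\mathrm{N}(\theta \mid \mu, Q^{-1}) \propto \exp\!\left(-\tfrac{1}{2}(\theta-\mu)^\top Q\,(\theta-\mu)\right)
\propto \exp\!\left(-\tfrac{1}{2}\,\theta^\top Q\,\theta + \mu^\top Q\,\theta\right),
$$

so the linear term is $r^\top \theta$ with $r = Q\mu$; in the natural parameterization, $r$ is the part that carries the location information.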

@bhomass

I’m now working on something similar. Is your work available online anywhere for me to see what you’ve done?

Thanks.