Break up a large dataset and take average

Yes. We are waiting for some changes to Stan that would make it more user friendly, but it’s already possible to truly parallelize it.

I think Swupnil did, but can’t be sure.

You might also be interested in the MPI discussion in another thread.

Does someone know ep-stan in detail? After diving deeper into the code, I am really getting worried that it is not parallelizable. If you look at Worker.cavity() inside method.py, it is updating what looks like a global structure, self.Mat and self.vec. In the reference implementation, the Master calls each of the sites sequentially, which allows the global structure to be updated in an orderly fashion. If multiple workers were solving for self.vec using linalg.cho_solve concurrently, the results would come out different. Requiring sequential site updates makes no sense: you might as well just work on the original Stan model without EP, since there would be no speedup from parallelism.

Tuomas Sivula wrote the code; please make an issue on GitHub. Note that this specific code demonstrates the concept, and it wasn’t used for the real-data example. We are waiting for the refactoring of the Stan code before making something easier to use.

I need to ask for forgiveness. My mistake: self.vec applies to individual Workers, not to the Master. We are good on this issue.
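To make the per-worker state explicit, here is a minimal sketch (not the ep-stan code itself; the SiteWorker class and the toy data are made up for illustration) of how each site can hold its own natural parameters and compute its cavity distribution locally, so the sites can run in parallel without touching shared state:

```python
import numpy as np
from scipy import linalg
from multiprocessing import Pool

class SiteWorker:
    """Toy EP site: holds its own natural parameters (no shared state)."""
    def __init__(self, Q_site, r_site):
        self.Q = Q_site   # this site's precision contribution
        self.r = r_site   # this site's linear-term contribution

    def cavity(self, Q_global, r_global):
        # Cavity distribution: subtract this site's contribution from the
        # global approximation; everything here is worker-local.
        Q_cav = Q_global - self.Q
        r_cav = r_global - self.r
        # Cavity mean via a Cholesky solve on worker-local matrices.
        mu_cav = linalg.cho_solve(linalg.cho_factor(Q_cav), r_cav)
        return Q_cav, mu_cav

def run_site(args):
    worker, Q_global, r_global = args
    return worker.cavity(Q_global, r_global)

if __name__ == "__main__":
    d, k = 3, 4
    Q_global = np.eye(d) * (k + 1.0)
    r_global = np.ones(d) * k
    workers = [SiteWorker(np.eye(d), np.ones(d) / k) for _ in range(k)]
    with Pool(2) as pool:
        cavities = pool.map(run_site, [(w, Q_global, r_global) for w in workers])
    print(cavities[0][1])  # cavity mean of the first site
```

Only the aggregation of site updates back into the global approximation needs any coordination; the cavity and tilted-distribution computations themselves are independent.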


What are you waiting for? The command structure and output callback refactoring has landed.

There’s an arXiv paper that explains what’s going on with the algorithm and how it can be parallelized.

Being able to easily continue sampling with the current adaptation parameters and mass matrix. At some point in the algorithm the tilted distributions are changing only a little, and it would be more efficient to run without the adaptation phase. I think there were also some issues with how easy it would be to compute IS, which could give an additional speedup in the final iterations.
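As a rough sketch of the kind of warm start being described, assuming a CmdStanPy-style interface (the model file, data paths, and argument names below are assumptions for illustration, not part of ep-stan or what was available at the time):

```python
from cmdstanpy import CmdStanModel

# Hypothetical site model and data files.
model = CmdStanModel(stan_file="site_model.stan")

# First EP iteration: run warmup as usual and keep the adapted settings.
fit0 = model.sample(data="site_data_iter0.json", chains=4, seed=1)
step_size = float(fit0.step_size[0])                  # adapted step size (chain 0)
inv_metric = {"inv_metric": fit0.metric[0].tolist()}  # adapted inverse mass matrix

# Later EP iterations: the tilted distribution has changed only a little,
# so skip warmup and adaptation, reusing the previous step size and metric.
fit1 = model.sample(
    data="site_data_iter1.json",
    chains=4,
    seed=2,
    iter_warmup=0,
    adapt_engaged=False,
    step_size=step_size,
    metric=inv_metric,
)
```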

If you’re talking about doing this at the C++ level, this is all under your control already if you’re writing algorithms on top of our existing algorithms.

If you’re talking about doing it at the level of the commands in services, Mitzi’s already plumbed that through.

The only thing that’s waiting now is for CmdStan, RStan, and PyStan to figure out how to pass matrices in and get them out.

What’s “IS”?

We’ll wait for those.

Sorry for using an ambiguous acronym. Here IS is importance sampling. For that we would like to re-evaluate the target (lp__) for the same (or sometimes different) MCMC draws with new data.
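As a rough illustration (a sketch only; log_p_old and log_p_new stand in for lp__ evaluated under the old and new data, and the toy densities below are made up), the re-evaluated target gives importance weights for reusing the existing draws:

```python
import numpy as np

def importance_weights(log_p_new, log_p_old):
    """Self-normalized importance weights for reusing MCMC draws.

    log_p_old: lp__ evaluated at the draws under the old target
    log_p_new: lp__ re-evaluated at the same draws under the new target
    """
    log_w = log_p_new - log_p_old
    log_w -= np.max(log_w)          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / np.sum(w)

# Toy usage: estimate E[theta] under the new target from old draws.
rng = np.random.default_rng(0)
theta = rng.normal(size=1000)                # draws from the old target N(0, 1)
log_p_old = -0.5 * theta**2                  # old log density (up to a constant)
log_p_new = -0.5 * (theta - 0.3)**2          # new log density (up to a constant)
w = importance_weights(log_p_new, log_p_old)
print(np.sum(w * theta))                     # weighted estimate, roughly 0.3
```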

But those shouldn’t be part of the C++ implementation. You want to build an algorithm that’s exposed through one of the services functions. At that level, you don’t need this.

If you just want to build on top, we need to get someone to prioritize this. Too many parallelization opportunities flying around.

I think we’ll talk more about some of these interface issues on the way to Stan 3 and after 2.17 is out.

I think we want to use PyStan and RStan. No hurry, we can wait. I have assumed that eventually PyStan and RStan will have what we want, so it’s enough to wait, and there is no need to change prioritization (at least for now).

I am past that point. The paper is clear. I was concerned about parallelizing class variables and matrix parameters in the code. But I am also past that point. I can see how the current code can be parallelized.


Great! Let us know how your experiments go, and if you have suggestions for the code, please contact Tuomas.

I do have one question on the paper. Why is the variational approximation stated as

$q(\theta) \propto \exp\left(-\tfrac{1}{2}\,\theta^\top Q\,\theta + r^\top \theta\right)$

I see the Gaussian part uses the precision matrix $Q$. Where does the $r^\top \theta$ part come from?

It comes from the location parameter of the Gaussian.
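To spell out the step (this is standard Gaussian algebra, not quoted from the paper): writing the Gaussian in terms of its mean $\mu$ and precision $Q$ and expanding the quadratic gives

$$
\mathrm{N}(\theta \mid \mu, Q^{-1}) \propto \exp\!\left(-\tfrac{1}{2}(\theta-\mu)^\top Q\,(\theta-\mu)\right)
\propto \exp\!\left(-\tfrac{1}{2}\,\theta^\top Q\,\theta + \mu^\top Q\,\theta\right),
$$

so the linear term is $r^\top \theta$ with $r = Q\mu$; in the natural parameterization, $r$ is the part that carries the location information.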

@bhomass

I’m now working on something similar. Is your work available online anywhere for me to see what you’ve done?

Thanks.