Parallel computation of chains over nodes of a cluster; parallel computation of the log likelihood within a node

Dear Stan Team,
I am working on a larger dataset of different time series from experiments. With a larger number of observables, the maximum calculation time of 8 days on my cluster runs out. I would like to profile whether my assumption holds that the time spent calculating the likelihood is the bottleneck of my Stan model (since there are many data points, each requiring some matrix inversions, multiplications, etc.). It is mathematically obvious that one could parallelize those for loops over each time trace across different CPUs while calculating each chain on a different node.

Are there any hello-world examples or tutorials for these issues? I haven’t found helpful documentation on manipulating the C++ code of a Stan model. The PyStan stanc function gives me some C++ code for a class, but I don’t really find a starting point for understanding this class.

In particular, where do I find a main function in which I can trace program blocks like the model {} block and the transformed parameters {} block?

Where do I find the C++ code that distributes the chains to different cores, which I would like to change so that they are distributed over the nodes of the cluster?

Are there ways yet to calculate the trajectory in parameter space in parallel for a model with hidden state-space properties, such as a hidden Markov model or a Kalman filter?

Thanks for your comments and hints in advance.

Jan

Before you start with parallel stuff, and before you start running models for long periods of time, you’ll want to do some work with simulated data to make sure your model is doing what you think and doing it efficiently. If you’re curious what I mean by that: https://www.youtube.com/watch?v=ZRpo41l02KQ&list=PLuwyh42iHquU4hUBQs20hkBsKSMrp6H0J&index=6&t=0s and then https://www.youtube.com/watch?v=6cc4N1vT8pk&list=PLuwyh42iHquU4hUBQs20hkBsKSMrp6H0J&index=7&t=0s

If you’ve already done that and the diagnostics pass (http://mc-stan.org/users/documentation/case-studies/rstan_workflow.html), then head to the computational stuff. But it’s really hard to make up computationally for things that can be fixed by reparameterizations or scalings or whatnot.

Anyway, for the parallel stuff, the place to start is map_rect (instead of hacking the C++ directly). I dunno if the docs are up, but this is a thread where folks have been working through an example using it: Linear, parallell regression
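If it helps to see the shape of it, here’s a minimal map_rect sketch under some simplifying assumptions: a toy normal likelihood, data already split into K equal shards, and placeholder names (shard_lp, K, N, y) that are not from the linked thread.

```stan
functions {
  // Log-density contribution of one shard (e.g. one time trace).
  // phi = parameters shared by all shards, theta = shard-specific parameters,
  // x_r / x_i = real and integer data packed for this shard.
  vector shard_lp(vector phi, vector theta, real[] x_r, int[] x_i) {
    vector[1] lp;
    lp[1] = normal_lpdf(x_r | phi[1], phi[2]);
    return lp;
  }
}
data {
  int<lower=1> K;      // number of shards (e.g. time traces)
  int<lower=1> N;      // observations per shard
  real y[K, N];        // data, already split into equal-size shards
}
transformed data {
  int x_i[K, 0];       // no integer data needed in this toy example
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[2] phi;
  vector[0] theta[K];  // no shard-specific parameters in this toy example
  phi[1] = mu;
  phi[2] = sigma;
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  // each shard's log likelihood can be evaluated on a different core
  target += sum(map_rect(shard_lp, phi, theta, y, x_i));
}
```

Note that map_rect only runs in parallel if the backend is built for it: threading within one machine (the STAN_THREADS compile flag plus the STAN_NUM_THREADS environment variable) or MPI across machines (STAN_MPI); exactly how you switch that on depends on the interface you use.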


Regarding map_rect and the docs, check section 22 of the Stan 2.18 User Guide.

Hey bbbales2, thanks for your immediate help. Maybe a few comments on my workflow.
So far I am just working with simulated data from a toy model in different data settings, so that I know what the right model is. With posterior predictive checks I might easily shrink down the space of one parameter, maybe with a little luck more of them. Since the models will get more complex with real data, and the algorithm scales, if I remember right, as N_states^3, I am definitely going to run into problems. They already occur now with just a slight change in the data.

1.a But in general, does a weakly informative or an informative prior decrease the time needed to sample from the posterior? (Probably yes? But I don’t know enough about HMC to answer that.)
1.b I guess the size of a parameter interval does influence the sampling time!? Or just the warmup?

2. Does it decrease computational time if I use beta distributions instead of uniform distributions as priors, so that at the edges of the interval the probability drops to zero in a continuous fashion? I mean really flat betas…

I’m going to proceed with the other videos and see whether I get further ideas from them.

Does map_rect parallelize over nodes or just over all processors of one node?

Thanks for your help, Mr. bbbales2 and ermeel.

N^3 sounds like a linear solve! What type of model is it?

Tighter priors that include the truth will probably allow the sampler to run faster. Making priors tight when they are not in the right place probably isn’t going to do any good though. This is what the prior predictives are for – trying to put the priors vaguely in the right place and on the right scale with the right amount of regularization to hopefully make the sampling fast.
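In case a concrete example helps, here is a minimal prior predictive sketch; the normal model and the names mu, sigma, and y_sim are placeholders, not the model from this thread.

```stan
// Draw parameters from the priors and simulate fake data, with no likelihood,
// then check whether y_sim lands on roughly the same scale as the real data.
data {
  int<lower=1> N;
}
model {
}
generated quantities {
  real mu = normal_rng(0, 5);            // illustrative priors
  real sigma = fabs(normal_rng(0, 2));
  real y_sim[N];
  for (n in 1:N)
    y_sim[n] = normal_rng(mu, sigma);
}
```

Since there are no parameters to sample, run it with the Fixed_param algorithm (in PyStan that should be algorithm="Fixed_param", if I remember right).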

This can be especially true with hard constraints. If you specify a parameter with a hard constraint and the data implies something that violates that constraint, it’ll run up against the edge of the constraint and have difficulty moving around.

Warmup and sampling both use the same model. They both use the NUTS sampler as well, though in warmup various NUTS parameters are being selected to make the model run efficiently, so you don’t get to treat the warmup draws like a regular MCMC chain.

If you’re trying to mimic a uniform with a beta, probably not. The sampler isn’t actually sampling on a bounded interval if you specify a bounded parameter. Check out the Transformation of Constraints section of the manual.
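For reference, this is the interval transform from the manual: a parameter declared with lower bound a and upper bound b is represented internally by an unconstrained value y, mapped back with the inverse logit, and the log Jacobian is added to the target:

$$
x = a + (b - a)\,\mathrm{logit}^{-1}(y), \qquad
\log\left|\frac{dx}{dy}\right| = \log(b - a) + \log\mathrm{logit}^{-1}(y) + \log\bigl(1 - \mathrm{logit}^{-1}(y)\bigr)
$$

So a uniform prior and a very flat beta on x look almost the same to the sampler after this transform; the beta just adds a couple of extra (cheap) terms to the log density.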

map_rect parallelizes over cores, so they could be cores of one computer, or cores of different computers. I don’t think there’s a huge difference either way.

Hi bbbales,

sorry, Mr. Bales, I wasn’t really in the office the last few weeks. It’s a Kalman filter, which is running partially in parallel now :).

Jan

Ah, if they’re Kalman filters, any chance you have a tridiagonal matrix?

You’ll want to do your solves manually if so. Stan doesn’t have sparse linear algebra stuff in place yet, but doing full matrix solves when stuff is actually sparse is rough.
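For what it’s worth, here is a sketch of the standard Thomas algorithm as a Stan function you could drop into a functions block; the name tridiag_solve and the convention that the sub- and super-diagonals are padded to length n are my choices, not anything from this thread.

```stan
functions {
  // Thomas algorithm for A x = d where A is tridiagonal.
  // a = sub-diagonal (a[1] ignored), b = diagonal,
  // c = super-diagonal (c[n] ignored), all padded to length n.
  // Assumes the usual stability conditions (e.g. diagonal dominance).
  vector tridiag_solve(vector a, vector b, vector c, vector d) {
    int n = rows(b);
    vector[n] cp;    // modified super-diagonal
    vector[n] dp;    // modified right-hand side
    vector[n] x;
    cp[1] = c[1] / b[1];
    dp[1] = d[1] / b[1];
    for (i in 2:n) {
      real m = b[i] - a[i] * cp[i - 1];
      cp[i] = c[i] / m;
      dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    x[n] = dp[n];
    for (i in 1:(n - 1))
      x[n - i] = dp[n - i] - cp[n - i] * x[n - i + 1];
    return x;
  }
}
```

Forward elimination plus back substitution is O(n), versus O(n^3) for a dense solve, which is where the savings come from when the system really is tridiagonal.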

What would you do with the tridiagonal matrix? Neither my transition matrix nor my observation matrix is tridiagonal. The model is singular… the observation matrix in general is not invertible and has a couple of zeros in it. The dimensions of the observation space are much smaller than those of the state space.

It’s easy to write simple Stan code to solve tridiagonal matrices.

Have you seen the “Gaussian Dynamic Linear Models” thing in the manual? Is that anything like what you’re doing?

It’s section “61.7. Gaussian Dynamic Linear Models”, page 553 in the 2.17.0 manual.

Unfortunately not quite, if I understand y ~ gaussian_dlm_obs(F, G, V, W, m0, C0) correctly.

You assume a constant W for all the data, where W is the covariance matrix of the noise in the state updates, right? In my case W is time dependent, because it depends on the current state and on the parameters that also construct G; only the observation matrix F and the transition matrix G are constant in time.

Looks like it :(. Bummer! Your problem is hard, haha.

Back to your original question, hacking away at the C++ is possible (https://cran.r-project.org/web/packages/rstan/vignettes/external.html, or search around the forums). The danger with this is that a lot of time is spent in the autodiff backward pass, which isn’t as easy to profile as the forward pass.
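One more thought on the Kalman filter itself: since gaussian_dlm_obs assumes a constant W, an alternative to touching the generated C++ is to write the forward filtering recursion directly in the model block, which also makes it easier to see where the time goes. A minimal sketch; the particular form of W_t, the parameters sigma_obs and tau, and all names and dimensions are placeholder assumptions, not the actual model from this thread.

```stan
data {
  int<lower=1> T;           // number of time points
  int<lower=1> p;           // observation dimension
  int<lower=1> q;           // state dimension
  matrix[p, q] F;           // observation matrix, y_t = F * x_t + v_t
  matrix[q, q] G;           // transition matrix, x_t = G * x_{t-1} + w_t
  vector[p] y[T];           // observations
  vector[q] m0;             // prior mean of the initial state
  cov_matrix[q] C0;         // prior covariance of the initial state
}
parameters {
  real<lower=0> sigma_obs;  // observation noise scale (illustrative)
  real<lower=0> tau;        // drives the state noise (illustrative)
}
model {
  matrix[p, p] V = diag_matrix(rep_vector(square(sigma_obs), p));
  vector[q] m = m0;         // filtered mean
  matrix[q, q] P = C0;      // filtered covariance

  sigma_obs ~ normal(0, 1);
  tau ~ normal(0, 1);

  for (t in 1:T) {
    // predict step; W_t may depend on the current state estimate and parameters
    vector[q] m_pred = G * m;
    matrix[q, q] W_t = diag_matrix(square(tau) * (rep_vector(1, q) + m_pred .* m_pred));  // placeholder form
    matrix[q, q] P_pred = G * P * G' + W_t;

    // update step; the log likelihood comes from the one-step-ahead predictive
    vector[p] y_hat = F * m_pred;
    matrix[p, p] S = F * P_pred * F' + V;
    matrix[q, p] K = P_pred * F' / S;   // right division: P_pred * F' * inverse(S)
    y[t] ~ multi_normal(y_hat, S);
    m = m_pred + K * (y[t] - y_hat);
    P = (diag_matrix(rep_vector(1, q)) - K * F) * P_pred;
  }
}
```

In practice you would probably want to symmetrize S (e.g. 0.5 * (S + S')) before multi_normal for numerical stability, and if the different time traces are independent, this per-trace filtering loop is exactly the kind of thing the map_rect discussion above applies to.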