Large model runs very slowly

My relatively large model is taking days to run. I am looking for ways this can be sped up.

I read there is this idea of using chains. What exactly is a chain? is it equivalent to threads? Do I specify the number chains equal to the number of cores in the system? Where do I specify the number of chains.

What else can I do for performance. I read there is a section on reparametrization, but I don’t see anything applicable to my model.

Chains are independent samplers. You run multiple chains with any inference to make sure all the solutions you get are consistent with each other. Chains are totally parallel, but you still need to run each of them for a long time so if running your model takes a long time, running multiple chains will also take a long time.

If your model is running slow, the first place to look is the parameterization (if you’ve got all the obvious vectorization in place). Can you run it with a small amount of data? Or a smaller version of the model? It’s pretty easy to accidentally code in a non-identified parameter or something (which will give the sampler a lot of trouble), and the easiest way to diagnose that is by looking at pairs/Shinystan output.

Or since (from the other thread) it sounds like you’re using cmdstan, just look at the CSV files. What is your stepsize doing (low number is worse)? Did your model ever get out of warmup? What is the treedepth getting to (high numbers are worse)?

1 Like

Still running
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
Iteration: 1100 / 2000 [ 55%] (Sampling)
Iteration: 1200 / 2000 [ 60%] (Sampling)

stansummary from output.csv gives

                Mean     MCSE   StdDev        5%       50%       95%  N_Eff  N_Eff/s    R_hat

lp__ -2.2e+02 1.4e-01 3.6e-01 -2.2e+02 -2.2e+02 -2.2e+02 6.5 inf 1.5e+00
accept_stat__ 8.4e-01 9.2e-03 1.5e-01 5.2e-01 8.9e-01 1.0e+00 253 inf 1.0e+00
stepsize__ 2.7e-05 5.7e-20 4.1e-20 2.7e-05 2.7e-05 2.7e-05 0.50 inf 1.0e+00
treedepth__ 1.0e+01 2.1e-16 3.6e-15 1.0e+01 1.0e+01 1.0e+01 294 inf 1.0e+00
n_leapfrog__ 1.0e+03 5.3e-14 9.1e-13 1.0e+03 1.0e+03 1.0e+03 294 inf 1.0e+00
divergent__ 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 294 inf -nan
energy__ 4.8e+03 3.9e+00 6.5e+01 4.6e+03 4.8e+03 4.9e+03 277 inf 1.0e+00

Are these numbers reasonable?

what exactly is an non-identified parameter?

treedepth__ 1.0e+01 ...
n_leapfrog__ 1.0e+03 ...

This means that NUTS is hitting its default internal limit (n_leapfrog of 1024 or treedepth of 10) of how many HMC steps to take to search for a new sample. It’s a bad sign.

what exactly is an non-identified parameter?

If you have the model:

parameters {
  real a;
  real b;

model {
  y ~ normal(a * b, 1.0);

If y was generated as normal(0, 1.0), then there’s a lot of different values of a and b that will produce a suitable fit. If you look at the posterior density on a plot of a vs. b, anything near one of the axes would probably be an okay fit (if either a or b are near zero, then then likelihood is high). a + b would do the same thing, but then posterior samples would be on the line a = -b.

This sorta stuff makes it really difficult for the sampler to explore space. Here’s a Gelman blog on it:

Long story short it’s really easy to accidentally add stuff like this to complicated models, and usually the easiest way to find and remove them is by looking at posterior pairplots and looking for correlations. To get to this point though your model will need to finish running. How much data are you using? Is there any way to do a small scale problem? You might be able to find out what’s wrong without having to use everything. You can also use simulated data for something like this (might be a better idea, if you aren’t sure how well your model actually fits your data).

If they hit internal limits, why didn’t the run stop? It kept on going.

Because it can just take a sample from the path developed thus far instead. Otherwise the algorithm might require a variable and unbounded amount of memory per iteration. This is a relatively common way of making algorithms take up a finite worst case amount of memory. You might want to read the NUTS paper for details.

Oh different limit – the 1024 is the limit in the number of leapfrog ODE steps the sampler will take in the course of getting one MCMC sample. A model that is working really well will often only required 20-30 such steps. It isn’t fixed. The number of steps changes at each MCMC sample, but it looks like in your model it’s stuck at the max.

Most of these concepts are general stats rather than Stan specific. We’re talking about the usual notion of identification in models. Google it or look in the problematic posteriors section of the manual for some simple examples.

I’m not sure how you expect us to help you with something like reparameterization when all we know is that you have a “relatively large model”.

Theres’a chapter in the manual on efficiency that covers both statistical efficiency and computational efficiency.

I’d also suggest first running for fewer than the default number of iterations, then doubling until you get the n_eff you need for whatever downstream inference you’re doing.

Yes, going thru NUTS paper.

ok, the key is setting limits to the parameters. That really sped things up.

excuse me. I searched thru the manual and can not find where you set the number of iterations if you want to differ from the default of 2000.

This will be in the interface manuals, not the main Stan manual. If you use Rstan, it’ll be in the Rstan manual (passed as an argument to the stan function). If it’s cmdstan, it’ll be in the cmdstan manual (passed as a command line argument – I think it’s called num_samples), etc.