Speed issues since upgrading to RStan v2.21.2

Since upgrading to RStudio (v1.3.1056), RTools (v4.0) and RStan (v2.21.2), the time it takes my model to run seems to have increased roughly 2x-3x. I appreciate it could be so many things, but I was wondering if anything jumps out to people here?

I followed the installation instructions here (https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started) and unfortunately can’t recall what versions I had of everything before but I expect everything was last updated about a year ago.

Sorry this is such a vague question, but as my model previously took about 20 hours to run, and I'm typically running it 1-2 times per week, any speed increases are of great use. I thought it was worth a shot asking here!

Can you share anything about your model?

Does it use ODE solvers, algebra solvers, anything special? Do you know what the bottleneck is in your model? It's hard to say much without any details.

We did not observe any regression between 2.19 and 2.21 of this magnitude, at least that we know of.


Sure, I’m actually modelling underdispersed count data, so I use a generalised Poisson pdf. This was built for me by someone much smarter than me.

One of the quirks, though, is that I have to set init = 0, as otherwise the initialisation of the sampling fails. From what I can tell, it returns negative infinity. Perhaps that’s off topic, but I mention it just in case it would affect anything on the speed front.

Here is the pdf:

functions {
  // Generalized Poisson with its mean and variance as parameters.
  real generalized_poisson_ms_log(int n, real mu, real sigma2) {

    // Reparametrization.
    real theta  = pow(mu, 1.5) / sqrt(sigma2);
    real lambda = 1 - sqrt(mu) / sqrt(sigma2);
    if ((theta + lambda * n) < 0) {
      return negative_infinity();
    } else {
      return log(theta)
             + (n - 1) * log(theta + n * lambda)
             - lgamma(n + 1)
             - n * lambda
             - theta;
    }
  }
}
This might be related to the sampler changes from 2.19 to 2.21, or you may have uncovered some unknown regression.

You could debug this by trying out the cmdstanr interface, which uses Stan version 2.24.

Does anything else happen in your model or does the model block just include that function call? There might be something else that was changed since 2.19.

Also, you mentioned that it used to take 20 hours to run; is this because you have a very large dataset or a very complex model?

Is the additional time spent in warmup or sampling?

There was an lgamma regression that I remember, but I can’t remember which version had it. Though that was not a 2x-3x slowdown.

That’s a good point; it was this PR that changed lgamma, so that might be the culprit.

The issue there was that the thread-safe version of lgamma (lgamma_r) wasn’t available in mingw32, but I wonder if this is still the case now that it’s on RTools 4. What do you think @wds15?

Unfortunately both! I typically have about 35k data points with about 30 parameters. I also get quite variable chain speeds, particularly as the dataset gets bigger, but the model always converges to sensible values without divergent transitions.

I’ve tried to iron out the variable chain speeds but still struggle a bit. One approach that’s helped is setting increasingly thoughtful and sensible priors (sometimes hard to reason about on the log scale), but this has only achieved so much. I feel the restriction of having to start the warmup at init = 0 is part of the problem, as some of the parameters’ posteriors would never have a value in that range. I would like to place sensible upper and lower bounds on these parameters away from 0, but can’t, because the warmup then fails (without init = 0).

It seems to be the sampling.

I do now get this warning:

In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) : 'C:/Rtools/mingw_/bin/g++' not found

But from what I understand from another post I can ignore that.


Is the number of leapfrog steps the same? Try: get_num_leapfrog_per_iteration(fit)

Do you have a saved model from the old run that you can look at by any chance? (I’m not even sure how possible it is to serialize these things, so I won’t be surprised if not.)

(Edit: what we’re doing here is trying to figure out whether this is a sampling issue, i.e. the sampler is doing something different, or just a computational issue, i.e. each gradient evaluation is slower.)

Are you able to post the full model? That will help us narrow things down.

I seem to get a bunch of leapfrog numbers; the mean values are:

Old model: 176
New model: 448
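Since each iteration’s cost is roughly proportional to the number of leapfrog steps, those means already account for the observed slowdown. A trivial check (the leapfrog means are from this thread):

```python
# Mean leapfrog steps per iteration, as reported above.
old_leapfrogs = 176
new_leapfrogs = 448

# Cost per iteration scales with leapfrog count, so this ratio is the
# expected slowdown from the sampler alone.
slowdown = new_leapfrogs / old_leapfrogs
print(f"expected slowdown: {slowdown:.2f}x")
```

This lands at about 2.5x, consistent with the reported 2x-3x wall-clock increase.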


Well that’ll do it. I’ll have to get back to you on this, but can you share your model/data or is that private?

I would need to get permission from a bunch of other people first. Can this be avoided?

I really appreciate the responses I’ve been getting on this, wasn’t expecting so much feedback so quickly!

And what’s the effective sample size of some parameters between the two? Is it the same (in which case the new model is working much harder to do the same thing)?

I’d look at the parameters with low ESS to do this comparison.
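One way to make that comparison concrete is effective samples per gradient evaluation (ESS divided by iterations times mean leapfrog steps). A sketch of the arithmetic; the ESS and iteration counts below are hypothetical placeholders, and only the leapfrog means come from earlier in the thread:

```python
def ess_per_gradient(ess, n_iterations, mean_leapfrogs):
    """Effective samples per gradient evaluation: higher is better."""
    return ess / (n_iterations * mean_leapfrogs)

# Hypothetical ESS for a slow-mixing parameter; leapfrog means from the thread.
old_eff = ess_per_gradient(ess=800, n_iterations=1000, mean_leapfrogs=176)
new_eff = ess_per_gradient(ess=800, n_iterations=1000, mean_leapfrogs=448)

# If ESS is unchanged between runs, the new run is simply paying ~2.5x more
# gradient evaluations for the same effective sample size.
print(old_eff / new_eff)
```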

Yeah. I’ll send you more instructions later but what we’ll need to do is a bit of a binary search to figure out where in time this problem occurred so we can identify the commits that might have caused it.

Are you using rstan? There are ways to set init to the values you want, not just 0 or random. I don’t have the code on hand, but you can use a previous run to get the dimensions of all the parameters and then set them to sensible values, such as RNG draws from your priors.
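For illustration, the shape of such an init object (one named set of values per chain, drawn from prior-like distributions) can be sketched in Python. The parameter names and priors here are invented placeholders; in rstan you would build the analogous named lists in R and pass them via the init argument of stan() or sampling():

```python
import random

def make_inits(n_chains, seed=1):
    """Build one init dict per chain by drawing from prior-like
    distributions. Parameter names and priors are hypothetical."""
    rng = random.Random(seed)
    inits = []
    for _ in range(n_chains):
        inits.append({
            "mu_intercept": rng.gauss(0.0, 1.0),               # e.g. normal(0, 1) prior
            "beta": [rng.gauss(0.0, 0.5) for _ in range(30)],  # one per covariate
            "sigma2": abs(rng.gauss(1.0, 0.25)),               # keep the variance positive
        })
    return inits

chain_inits = make_inits(n_chains=4)
```

The key point is that each chain gets its own complete set of named initial values with the right dimensions, so you are no longer forced to choose between init = 0 and fully random starts.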