Startup takes "forever" after a slight rewrite of the model

I have a situation where I'm using a LOT of data (tens of thousands of survey responses) and I want to keep track of an intermediate quantity, so I modified my model.

Instead of having something like this in the model block:

```stan
for (i in 1:N)
  data[i] / function(parameters[i]) ~ some_distribution();
```

I now have a transformed parameter:

```stan
intermediateval[i] = data[i] / function(parameters[i]);
```

and then, in the model block:

```stan
intermediateval ~ some_distribution();
```

The point is that I now have a big vector storing the intermediate values, and the sampling statement is vectorized.
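Putting the pieces together, the refactor described above might look something like this as a full model skeleton. This is only a sketch: `N`, `y`, `theta`, and the `lognormal` prior/likelihood are placeholders I've invented, since the actual model isn't shown.

```stan
data {
  int<lower=1> N;
  vector[N] y;                 // the survey responses (placeholder name)
}
parameters {
  vector<lower=0>[N] theta;    // hypothetical per-observation parameters
}
transformed parameters {
  // The intermediate quantity is now stored explicitly, so it is
  // computed once per leapfrog step and saved with every draw.
  vector[N] intermediateval = y ./ theta;  // elementwise division
}
model {
  theta ~ lognormal(0, 1);             // placeholder prior
  intermediateval ~ lognormal(0, 1);   // vectorized sampling statement
}
```

Two caveats worth noting: everything declared in `transformed parameters` is written to the output for every draw, so a length-`N` vector with tens of thousands of elements adds substantial storage and I/O overhead; and since the left-hand side of the sampling statement is a nonlinear function of parameters, Stan will generally warn that a Jacobian adjustment may be needed for the target density to be what you intend.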

After making this change and running Stan, it takes "quite a while" (minutes?) to get the first message about the gradient evaluation time, and once that message appears, sampling proceeds at roughly the same speed as before.

Is there some massive one-time computation before sampling starts that would get substantially longer when I have this large intermediate vector?

To quantify, "quite a while" is between 10 and 30 minutes before that first gradient-timing message, and then sampling takes something like 4 or 5 hours for 70 iterations, whereas before it took only a few seconds to get that initial gradient-timing message, and then several hours for the iterations.

Hey, at least at the end of the sampling I’m getting a good fit after all my playing with the model specification and parameterization!

Yes, it takes RStan an insane amount of time to allocate storage for a big vector because it creates one list element per element of the vector.

Thanks. I guess I really need to get CmdStan working. Can I use CmdStan with my existing RStan install, or do I have to install it separately?
