How can I speed up this model fit?

I’m trying to run a model as follows:

library(rstanarm)
options(mc.cores = 32)
fm <- stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 + pulse | Path),
                data = dat, chains = 32)

with a large dataset:

str(dat)
'data.frame':   459360 obs. of  4 variables:
 $ Sbj  : Factor w/ 44 levels "14","15","20",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Path : Factor w/ 10440 levels "Seed_10-100",..: 2925 2945 2956 2967 2977 2986 2851 2862 2873 2884 ...
 $ Value: num  0.679 0.5819 0.2531 0.0469 1.2375 ...
 $ pulse: num  0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.62 ...

I see messages like the following:

1000 transitions using 10 leapfrog steps per transition would take 7400 seconds.

This is the first time I’ve used this package, so here are my questions:

  1. How can I estimate the runtime from the above message? (My naive attempt is sketched after this list.)

  2. Is there any room to speed up the process?

  3. Any suggestions on the priors in this case?

  4. Do I have to standardize (remove the mean and scale by the standard deviation) all the variables? I’m asking because I read somewhere that standardization may help simplify the interpretation of the results. Is this the case with rstanarm?
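
For question 1, here is my naive attempt at reading that message (a sketch, assuming the rstanarm defaults of 2000 iterations per chain and a maximum treedepth of 10, i.e. up to 2^10 - 1 = 1023 leapfrog steps per iteration in the worst case):

sec_per_leapfrog <- 7400 / (1000 * 10)  # ~0.74 s per leapfrog step
iters_per_chain  <- 2000                # default warmup + sampling iterations
max_steps        <- 2^10 - 1            # worst case at the default max treedepth
sec_per_leapfrog * iters_per_chain * max_steps / 86400  # ~17.5 days per chain, worst case

Is that the right way to think about it?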

Your help is highly appreciated!

  1. About a month
  2. I would start with stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 | Path), data = dat). You don’t need 32 chains. That will run reasonably quickly.
  3. No one besides you knows enough about this data-generating process to say, but the only one that matters is on the standard deviation of the intercept shifts across levels of Path.
  4. I would not divide variables by their empirical standard deviation. Predictors are centered internally and shifted back in the output, so you don’t have to worry about that. The prior on pulse is by default a function of its standard deviation. Only divide by constants so that the parameters have reasonable units (see the sketch after this list).
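
For example (a sketch; the constant and the new variable name are hypothetical, just to illustrate dividing by a fixed constant rather than by the sample standard deviation):

# Rescale by a fixed, interpretable constant, not the empirical SD;
# the factor of 10 and the name pulse10 are hypothetical
dat$pulse10 <- dat$pulse / 10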

Thanks a lot for the quick help, Dr. Goodrich!

  1. About a month

That’s quite depressing…

I did a test run with 1% of the data and with 4 cores. It took about 7 minutes.
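
Naively scaling that up (a sketch; this assumes runtime scales linearly with the number of rows, which is optimistic since the full data has far more Path levels):

pilot_minutes <- 7
scale_factor  <- 100                 # 1% of the data -> full data
pilot_minutes * scale_factor / 60    # ~11.7 hours, likely a lower bound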

  1. I would start with stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 | Path), data = dat). You don’t need 32 chains. That will run reasonably quickly.

I did that for a test run. However, I would like to say something about the ‘pulse’ effect for each path, which was the reason I wanted the random effects for ‘pulse’ in my original model:

fm <- stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 + pulse | Path),
                data = dat, chains = 32)

In other words, I’d like to have something like the following in the output:

Estimates:
                        mean   sd   2.5%   25%   50%   75%   97.5%
b[pulse Path:Seed_10] ...

Please correct me if I’m wrong.
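
I assume something like the following would pull those out after fitting (a sketch; regex_pars selects parameters by regular expression, and fm is the fitted model object):

# Posterior summaries for the path-specific pulse effects
summary(fm, regex_pars = "b\\[pulse Path", digits = 2)

# Or the raw posterior draws, for custom summaries
draws <- as.matrix(fm, regex_pars = "b\\[pulse Path")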

So, with my original model, is there no hope of getting it done within a realistic time frame?

  1. No one besides you knows enough about this data-generating process to say, but the only one that matters is on the standard deviation of the intercept shifts across levels of Path.

Is that specified through "prior_covariance"? What would be a good choice other than the default (normal?)?

  1. I would not divide variables by their empirical standard deviation. Predictors are centered internally and shifted back in the output, so you don’t have to worry about that. The prior on pulse is by default a function of its standard deviation. Only divide by constants so that the parameters have reasonable units.

Thanks for the clarification!

Could Path possibly be the combination of multiple other variables? If so, you might be able to speed things up by separating them out and treating them as separate effects.
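
For example, something along these lines (a sketch, assuming each Path level like "Seed_10-100" encodes two components separated by "-"; the names from and to are hypothetical):

# Split each Path label into two hypothetical components
path_chr <- sub("^Seed_", "", as.character(dat$Path))
parts    <- strsplit(path_chr, "-", fixed = TRUE)
dat$from <- factor(vapply(parts, `[[`, character(1), 1))
dat$to   <- factor(vapply(parts, `[[`, character(1), 2))

# Two smaller grouping factors instead of one with 10440 levels
fm2 <- stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 + pulse | from) + (1 + pulse | to),
                 data = dat, chains = 4, cores = 4)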


There is a chance that it gets done in a week or so; it all depends on the geometry. The prior_covariance argument takes the output of the decov() function, but the default of an exponential prior on the across-groups standard deviation is as good as any.
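
Written out explicitly, the default would look like this (a sketch; these are decov()’s default arguments, and shape = 1 with scale = 1 is what gives the exponential prior on the standard deviations):

# The default covariance prior for the group-level terms, made explicit
fm <- stan_lmer(Value ~ 1 + pulse + (1 | Sbj) + (1 + pulse | Path),
                data = dat, chains = 4, cores = 4,
                prior_covariance = decov(regularization = 1, concentration = 1,
                                         shape = 1, scale = 1))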