STAN model is very slow when using large data (only uses one CPU!)

dippatel1994 · September 6, 2023, 4:47pm

Hi Community,

I am currently using cmdstanpy to train a model, but with more than 3k records it gets very slow. I have noticed that each chain only uses one core no matter what I do. So I read about reduce_sum and want to implement it in my code to use all available cores (I have 96). My model block is shown below. Can you help me with a correct reduce_sum implementation for it?

for (b in 1:B) {
    s_level[b] ~ std_normal();

    for (r in 1:R) {
        intercept_raw[b,r] ~ std_normal();
        sigma_raw[b,r] ~ std_normal();

        beta_seasonality_raw[b,r] ~ std_normal();

        value[I[b,r,1]:I[b,r,2]] ~ normal(mu[I[b,r,1]:I[b,r,2]], sigma[b,r]);
    }
}

Christopher-Peterson · September 6, 2023, 5:08pm

It’s hard to say much without more information, but the first thing I’d recommend is vectorizing your likelihoods and priors, instead of doing them one-by-one in the for() loop. This may reduce the need for reduce_sum().

// declare an in-block variable
s_level ~ std_normal();
to_vector(intercept_raw) ~ std_normal();
to_vector(sigma_raw) ~ std_normal();
to_vector(beta_seasonality_raw) ~ std_normal();

// Do something to align value, mu, and sigma to the same set of indices; may require a for loop

value ~ normal(mu_aligned, sigma_aligned);
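
As a rough sketch of the alignment step, assuming (none of this is confirmed by the posted model) that value and mu are vectors of a hypothetical length N, that sigma can be indexed as sigma[b, r], that I is an array[B, R, 2] of inclusive start/end indices, and that those ranges cover 1:N with no gaps or overlaps, it could look something like this:

```stan
model {
  // Hypothetical: N is the length of value and mu (not shown in the post).
  vector[N] sigma_aligned;

  // Copy each group's scale onto the observations that belong to it.
  // Any index not covered by some I[b, r, 1]:I[b, r, 2] range stays NaN,
  // so this only works if the ranges cover every observation.
  for (b in 1:B) {
    for (r in 1:R) {
      int n_br = I[b, r, 2] - I[b, r, 1] + 1;
      sigma_aligned[I[b, r, 1]:I[b, r, 2]] = rep_vector(sigma[b, r], n_br);
    }
  }

  // mu is already indexed like value in the original loop,
  // so under these assumptions it needs no realignment.
  value ~ normal(mu, sigma_aligned);
}
```

The loop is still there, but it now only does cheap assignments, and the likelihood is evaluated in a single vectorized call.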

dippatel1994 · September 6, 2023, 5:19pm

Thank you so much @Christopher-Peterson for your reply. I am new to Stan programming and don’t know how to implement this. I would appreciate it if you could help me vectorize this block. I am really grateful for your response.

Christopher-Peterson · September 6, 2023, 10:01pm

I can help, but I’ll probably need to see the whole model code.

Incidentally, it may be worth seeing if brms can fit your model (although that would be based in R, not Python); it can automatically generate Stan code that uses reduce_sum(), and generally uses efficient parameterizations.
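
For reference, a hand-written reduce_sum() version generally has the shape below. This is only a sketch under assumptions, not your actual model: partial_sum is a made-up name, value is assumed to be declared as an array of reals (a vector would need to_array_1d(value)), and mu_aligned and sigma_aligned are assumed to be vectors with the same length and ordering as value.

```stan
functions {
  // Log-likelihood contribution of one slice of the data.
  // value_slice holds value[start:end]; the shared arguments are passed
  // whole and indexed with the same start/end.
  real partial_sum(array[] real value_slice, int start, int end,
                   vector mu_aligned, vector sigma_aligned) {
    return normal_lpdf(value_slice | mu_aligned[start:end],
                                     sigma_aligned[start:end]);
  }
}
model {
  // grainsize = 1 leaves the slice size to the scheduler; tune if needed.
  int grainsize = 1;

  // Assumes mu_aligned and sigma_aligned are defined elsewhere
  // (for example in transformed parameters).
  target += reduce_sum(partial_sum, value, grainsize,
                       mu_aligned, sigma_aligned);
}
```

Note that reduce_sum() only runs in parallel if the model is compiled with threading enabled (STAN_THREADS) and each chain is given more than one thread; in cmdstanpy that is done through cpp_options at compile time and threads_per_chain when sampling. Otherwise it still works but stays on a single core.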
