Within-chain parallelization not working with cmdstanr on linux server

Hello,

I am running a model using cmdstanr with threading enabled, but the threads don’t seem to be used. On my server I just get:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1954950 sdaza     20   0   46024  15128   5764 R  98.1   0.0   4:10.33 model_threads

%CPU never goes over 100%. If I run the same model using brms with cmdstanr as the backend, the threads work.

My code is:

library(cmdstanr)
This is cmdstanr version 0.4.0
mm = cmdstan_model("src/model.stan", cpp_options = list(stan_threads = TRUE))
fit = mm$sample(data = dat, chains = 1, threads_per_chain = 10)

model.stan (1.2 KB)

Any ideas on what might be happening?

Thanks!
Sebastian

Are you able to share your model (or at least the part that uses parallelization features)?

I added the model code to the question. So I have to add the parallelization to the Stan code itself? Thanks!

Your model doesn’t look like it is using map_rect or reduce_sum which are the core ways parallelism is used in Stan programs. I think you’re observing the correct behavior in that case.

See Stan User’s Guide - Parallelization

That said, I’m not the expert on parallelism, so I’ll ping @jonah to make sure this isn’t a CmdStanR issue.
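For reference, here is a minimal sketch of a model that does use reduce_sum, so that threads_per_chain has something to parallelize. It is not the model from this thread: the names partial_sum, x, y, alpha, beta and grainsize are made up for illustration, and the priors are placeholders. It uses the newer array[] syntax; older Stan versions write real[] y_slice instead.

```stan
functions {
  // Log-likelihood contribution of one slice of y.
  // reduce_sum supplies y_slice, start and end; the other arguments are shared.
  real partial_sum(array[] real y_slice, int start, int end,
                   vector x, real alpha, real beta, real sigma) {
    return normal_lpdf(y_slice | alpha + beta * x[start:end], sigma);
  }
}
data {
  int<lower=1> N;
  array[N] real y;
  vector[N] x;
  int<lower=1> grainsize;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  // The likelihood is split into slices that can be evaluated on separate threads.
  target += reduce_sum(partial_sum, y, grainsize, x, alpha, beta, sigma);
}
```

Without a call like this (or map_rect) in the model block, a single chain is expected to stay at roughly 100% CPU regardless of threads_per_chain.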

Got it, thanks. Now I am using a version with parallelization, but I still get an error:

model.stan (2.3 KB)

> mm = cmdstan_model("src/model.stan", cpp_options = list(stan_threads = TRUE))
Compiling Stan program...
Syntax error in '/tmp/Rtmpmi0sj4/model-1ddb5441e5e494.stan', line 21, column 12 to column 23, parsing error:
   -------------------------------------------------
    19:              real sigma,
    20:              vector sigma_county,
    21:              corr_matrix[] Rho ) { 
                     ^
    22:          vector[size(mx)] mu;
    23:          for ( i in 1:size(mx) ) {
   -------------------------------------------------

An argument declaration (unsized type followed by identifier) is expected.
make: *** [make/program:50: /tmp/Rtmpmi0sj4/model-1ddb5441e5e494.hpp] Error 1
Error: An error occured during compilation! See the message above for more information.

Function arguments can’t be constrained types like correlation matrices, so that should just be matrix Rho, I believe (see the reference manual page).

Doesn’t look like anything specific to CmdStanR. You’re right that the within chain parallelization would require using functions like the ones you mentioned.

I think this is correct (the parameter should be declared as corr_matrix in the parameters block but a function should just take a matrix, which can be a corr_matrix or any other matrix). Unfortunately the error message only says “unsized type” and not “unconstrained type” so it’s a bit misleading. @WardBrian should we open a stanc3 issue related to the error message?
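A minimal sketch of that split, assuming a hypothetical helper scaled_cov and placeholder data and priors (none of these come from the posted model): the constraint is declared once in the parameters block, and the function signature only sees an unsized, unconstrained matrix.

```stan
functions {
  // Function arguments use unsized, unconstrained types:
  // `matrix Rho` here, not `corr_matrix[K] Rho`.
  matrix scaled_cov(matrix Rho, vector sigma) {
    return quad_form_diag(Rho, sigma);
  }
}
data {
  int<lower=1> K;
  vector[K] y;
}
parameters {
  corr_matrix[K] Rho;        // the constraint lives here
  vector<lower=0>[K] sigma;
}
model {
  Rho ~ lkj_corr(2);
  sigma ~ exponential(1);
  // A corr_matrix can be passed wherever a function expects a matrix.
  y ~ multi_normal(rep_vector(0, K), scaled_cov(Rho, sigma));
}
```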

It certainly should be better.
I can just open a PR to change it to “An argument declaration (unsized and unconstrained type followed by identifier) is expected.”

Thanks, that would be great!


@sdaza I would also recommend seeing if you can get the simpler example of reduce_sum from this case study working:

https://mc-stan.org/users/documentation/case-studies/reduce_sum_tutorial.html

If that doesn’t work, then it will be easier for us to figure out why within-chain parallelization isn’t working for you than with the more complicated example in the model you shared.

Great, thanks so much @jonah and @WardBrian !

I ran a simple model with cmdstanr, and threads are working on my Linux server. So my problem is with the Stan code and the use of reduce_sum. I will try to solve it after reading the documentation.

Thanks!
Sebastian


I tried to run my model with parallelization, but without success. Clarification: the Stan code I am using was generated by the rethinking package.

When I change the functions block slightly so that corr_matrix[] becomes just matrix[], I still don’t know what to do with the rest of the code to avoid semantic errors (sorry, I am new at this):

functions{
    real reducer( 
            vector mx,
            int start , int end , 
            int N,
            vector mx_sd,
            int[] time_period,
            int[] period,
            int[] time,
            int[] county,
            vector b_time_period_county,
            vector b_period_county,
            vector b_time_county,
            vector a_county,
            real b_time_period,
            real b_period,
            real b_time,
            real a,
            real sigma,
            vector sigma_county, 
            // corr_matrix[] Rho) { 
            matrix[] Rho) { 
        vector[size(mx)] mu;
        for ( i in 1:size(mx) ) {
            mu[i] = a_county[county[start+i-1]] + b_time_county[county[start+i-1]] * time[start+i-1] + b_period_county[county[start+i-1]] * period[start+i-1] + b_time_period_county[county[start+i-1]] * time_period[start+i-1];
        }
        return normal_lpdf( mx | mu , sigma );
    } 
}
Semantic error in '/tmp/RtmphvQFFv/model-387cb4564bdf94.stan', line 66, column 14 to line 83, column 17:
   -------------------------------------------------
    64:      YY ~ multi_normal( MU , quad_form_diag(Rho , sigma_county) );
    65:      }
    66:      target += reduce_sum( reducer , mx , 1 , 
                       ^
    67:              N,
    68:              mx_sd,
   -------------------------------------------------

Ill-typed arguments supplied to function 'reduce_sum':
(<F1>, vector, int, int, vector, array[] int, array[] int, array[] int,
 array[] int, vector, vector, vector, vector, real, real, real, real, real,
 vector, matrix)
where F1 = (vector, int, int, int, vector, array[] int, array[] int,
            array[] int, array[] int, vector, vector, vector, vector, real,
            real, real, real, real, vector, array[] matrix) => real
Available signatures:
(<F2>, array[] real, int) => real
where F2 = (array[] real, data int, data int) => real
  The first argument must be
   (array[] real, data int, data int) => real
  but got
   (vector, int, int, int, vector, array[] int, array[] int, array[] int,
    array[] int, vector, vector, vector, vector, real, real, real, real,
    real, vector, array[] matrix) => real
  These are not compatible because:
    The types for the first argument are incompatible: one is
     vector
    but the other is
     array[] real
make: *** [make/program:50: /tmp/RtmphvQFFv/model-387cb4564bdf94.hpp] Error 1
Error: An error occurred during compilation! See the message above for more information.

I have been looking for examples using correlation matrices without success. Any suggestion or guidance will be welcome. Thank you again!

Here is the data: example.csv (773.9 KB)

Ok great!

Are you going to be passing in a single matrix or an array of matrices? With this code the argument expects an array of matrices. For a single matrix you would just use matrix Rho and not matrix[] Rho.
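The error message above also points at the first argument: reduce_sum expects the sliced argument to be an array (array[] real), while mx is passed as a vector. Its listing of the supplied arguments ends in matrix where the function expects array[] matrix, which is the single-matrix mismatch. The following is only a sketch of one possible fix for the first-argument issue, on a heavily reduced and hypothetical version of reducer (one county-level intercept, placeholder priors, Rho left out); it uses the newer array[] syntax, where older versions write real[] and int[].

```stan
functions {
  // Reduced, hypothetical version of `reducer`: the sliced first argument
  // is an array of reals instead of a vector.
  real reducer(array[] real mx_slice, int start, int end,
               array[] int county, vector a_county, real sigma) {
    int n = end - start + 1;
    vector[n] mu;
    for (i in 1:n) {
      // start + i - 1 maps the position in the slice back to the full data.
      mu[i] = a_county[county[start + i - 1]];
    }
    return normal_lpdf(mx_slice | mu, sigma);
  }
}
data {
  int<lower=1> N;
  int<lower=1> J;
  vector[N] mx;
  array[N] int<lower=1, upper=J> county;
}
parameters {
  vector[J] a_county;
  real<lower=0> sigma;
}
model {
  a_county ~ normal(0, 1);
  sigma ~ exponential(1);
  // to_array_1d converts the vector into the array type that reduce_sum slices.
  target += reduce_sum(reducer, to_array_1d(mx), 1, county, a_county, sigma);
}
```

Declaring mx as an array in the data block would avoid the conversion; which is preferable depends on how mx is used elsewhere in the model.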

There may be other issues with this reduce_sum code, but unfortunately I’m a bit swamped at the moment and don’t have time to debug the rest of it right now. Given that this is now a slightly different issue than in the original post, you could try starting a new topic here on the forum to get more eyes on this and hopefully more people can help debug.

Thanks, no problem. I also tried matrix Rho, but it didn’t work. I will create a new question, thanks.
