Reduce_sum_static fails to compile with SIGSEGV error - PyStan3

Hello all. I am trying to use reduce_sum_static to parallelize over N (participants; this is why grainsize=1), but when I compile my model in PyStan3 I get the error Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).

functions {

    real partial_sum(real[,] RTu_rdm, int start, int end, real[,] RTl_rdm, real[,] Cohu_rdm, real[,] Cohl_rdm,
                    real[] delta_rdm, real[] alpha_rdm, real[] tau_rdm, real[] beta_rdm, int w, int[] idx_rdm_obs,
                    int[] Nu_rdm, int[] Nl_rdm) {
        real lt = 0;

        // loop over the participants in this slice; the sliced argument
        // (RTu_rdm) arrives re-indexed from 1, while start/end index the
        // shared arguments by their original positions
        for (n in start:end) {
            if (idx_rdm_obs[n] != 0) {
                vector[Nu_rdm[n]] delta_cohu = delta_rdm[n] * to_vector(Cohu_rdm[n, :Nu_rdm[n]]);
                vector[Nl_rdm[n]] delta_cohl = delta_rdm[n] * to_vector(Cohl_rdm[n, :Nl_rdm[n]]);

                lt += wiener_lpdf(RTu_rdm[n - start + 1, :Nu_rdm[n]] | alpha_rdm[n], tau_rdm[n], beta_rdm[n], delta_cohu);
                lt += wiener_lpdf(RTl_rdm[n, :Nl_rdm[n]] | alpha_rdm[n], tau_rdm[n], 1 - beta_rdm[n], -delta_cohl);
            }
        }
        return lt;
    }
}

data {

    int W;                                                          
    int N;                                                          
    int Xdim;                                                       

    int exo_q_num;                                                  
    vector[exo_q_num] U[N,W];                                       

    int<lower=1> W_rdm_obs[N];                                         
    int<lower=0> W_rdm_mis[N];                                         
    int<lower=0> idx_rdm_obs[N,W];                            
    int<lower=1> P_rdm;                                             
    int<lower=0> Nu_max_rdm;                            
    int<lower=0> Nl_max_rdm;                        
    int<lower=0> Nu_rdm[N,W];                     
    int<lower=0> Nl_rdm[N,W];                         
    real RTu_rdm[N, W, Nu_max_rdm];                
    real RTl_rdm[N, W, Nl_max_rdm];                    
    real Cohu_rdm[N, W, Nu_max_rdm];                          
    real Cohl_rdm[N, W, Nl_max_rdm];                  
    matrix[N,W] minRT_rdm;                          
    real RTbound_rdm;
}

transformed data {
    real Q = 1; 
    int cauchy_alpha = 5;  
    int<lower=1> num_par = P_rdm;
    int grainsize = 1;
}

parameters {
    real<lower=0, upper=pi()/2> sigma_unif;  // for half-Cauchy reparametrization; see Stan manual, chapter 22.7
    vector[Xdim] mu_prior_x;                    
    vector[Xdim] X[N,W];
    matrix[Xdim, Xdim] A;                   
    matrix[Xdim, exo_q_num] B;             
    matrix[num_par, Xdim] C;                
    real alpha_rdm_pr[N,W];             
    real beta_rdm_pr[N,W];              
    real delta_rdm_pr[N,W];              
    real tau_rdm_pr[N,W];
}

transformed parameters {
    real<lower=0> sigma_x;          
    real<lower=0> sigma_v;          
    real<lower=0> sigma_r;                  
    real<lower=0> sigma_a;
    real<lower=0> sigma_b;
    real<lower=0> sigma_c;
    real<lower=0> alpha_rdm[N,W];                 
    real<lower=0, upper=1> beta_rdm[N,W];                  
    real<lower=0> delta_rdm[N,W];                 
    real<lower=RTbound_rdm, upper=max(minRT_rdm)> tau_rdm[N,W];

    sigma_x = cauchy_alpha * tan(sigma_unif);      
    sigma_v = cauchy_alpha * tan(sigma_unif);      
    sigma_r = cauchy_alpha * tan(sigma_unif);     
    sigma_a = cauchy_alpha * tan(sigma_unif); 
    sigma_b = cauchy_alpha * tan(sigma_unif);         
    sigma_c = cauchy_alpha * tan(sigma_unif);     

    for (n in 1:N) {
        alpha_rdm[n] = exp(alpha_rdm_pr[n]);
        delta_rdm[n] = exp(delta_rdm_pr[n]);
    }

    for (n in 1:N) {
        for (w in 1:W) {
            beta_rdm[n,w] = Phi_approx(beta_rdm_pr[n,w]);
            tau_rdm[n,w]  = Phi_approx(tau_rdm_pr[n,w]) * (minRT_rdm[n,w] - RTbound_rdm) + RTbound_rdm;
        }
    }
}

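As a quick aside on the tan() reparametrization used above: sigma = alpha * tan(u) with u ~ Uniform on (0, pi()/2) yields a half-Cauchy(0, alpha) scale, because alpha * tan(pi * (p - 0.5)) is exactly the Cauchy quantile function. A small numerical check in Python (the helper name is mine, for illustration only):

```python
import math

# Cauchy(0, scale) quantile function: scale * tan(pi * (p - 0.5)).
# With u ~ Uniform(-pi/2, pi/2), sigma = scale * tan(u) is Cauchy(0, scale);
# restricting u to (0, pi/2) gives the positive half-Cauchy used for scales.
def cauchy_quantile(p, scale):
    return scale * math.tan(math.pi * (p - 0.5))

# the 75th percentile of Cauchy(0, 5) is 5 analytically
print(cauchy_quantile(0.75, 5.0))
```

Note that if the uniform variable is allowed to range down to -pi()/2, tan() goes negative and the positivity constraint on the scales is violated, so the bounds on the uniform matter here.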
model {

    mu_prior_x ~ normal(0,sigma_x); // prior on X mean   
    // put priors on A, B, C
    to_vector(A) ~ normal(0,sigma_a);
    to_vector(B) ~ normal(0,sigma_b);
    to_vector(C) ~ normal(0,sigma_c);          

    for (w in 1:W) {
        for (n in 1:N) {
            if (w == 1) {
                X[n,w] ~ normal(mu_prior_x, sigma_v);
            } else {
                X[n,w] ~ normal(A * X[n,w-1] + B * U[n,w-1], Q);
            }
            alpha_rdm_pr[n,w] ~ normal(C[1,] * X[n,w], sigma_r);
            beta_rdm_pr[n,w] ~ normal(C[2,] * X[n,w], sigma_r);
            delta_rdm_pr[n,w] ~ normal(C[3,] * X[n,w], sigma_r);
            tau_rdm_pr[n,w] ~ normal(C[4,] * X[n,w], sigma_r);
        }

        target += reduce_sum_static(partial_sum, RTu_rdm[,w,], grainsize, RTl_rdm[,w,], Cohu_rdm[,w,],
                                    Cohl_rdm[,w,], delta_rdm[,w], alpha_rdm[,w], tau_rdm[,w],
                                    beta_rdm[,w], w, idx_rdm_obs[,w], Nu_rdm[,w], Nl_rdm[,w]);
    }
}

Python code from PyCharm

posterior = stan.build(program_code, data=data)

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Any help would be much appreciated. I feel like I am missing something fundamental here.

Thank you!

Can you try to run it outside PyCharm?

Working on it now – in the end it is supposed to run on a CentOS7 cluster, so I am setting it up there right now.

A more general question – is setting grainsize=1 the right approach to parallelize over N?
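For intuition on what grainsize does: reduce_sum partitions the first (sliced) argument into consecutive chunks and sums partial_sum over them, so with the static variant and grainsize=1 each chunk is a single participant. A rough pure-Python sketch of that bookkeeping (not Stan's actual implementation; note that start/end refer to positions in the original array, while the chunk itself is passed re-indexed from its start):

```python
# Rough sketch of the bookkeeping behind reduce_sum_static (illustration
# only): the sliced argument is split into consecutive chunks of
# `grainsize` elements, partial_sum is evaluated on each chunk, and the
# partial results are summed.
def reduce_sum_static(partial_sum, sliced, grainsize, *shared):
    total = 0.0
    for i in range(0, len(sliced), grainsize):
        start, end = i + 1, min(i + grainsize, len(sliced))  # 1-based, inclusive
        total += partial_sum(sliced[start - 1:end], start, end, *shared)
    return total

# Toy partial_sum: the chunk is local (indexed from its own start), while
# shared arguments must be indexed with the original start..end positions.
def partial_sum(chunk, start, end, weights):
    return sum(x * weights[n] for n, x in zip(range(start - 1, end), chunk))

data = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 10.0, 100.0, 1000.0]
# the result is independent of grainsize; only the work partitioning changes
print(reduce_sum_static(partial_sum, data, 1, weights))  # 4321.0
print(reduce_sum_static(partial_sum, data, 2, weights))  # 4321.0
```

The same invariance is what lets Stan distribute the chunks across threads without changing the target density.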

on the cluster I get the following error while building the model:

compileError(DistutilsExecError("command 'gcc' failed with exit status 1")), traceback: [' File "/pystan3/lib/python3.7/site-packages/httpstan/", line 93, in handle_models\n await httpstan.models.build_services_extension_module(program_code)\n', ' File "/pystan3/lib/python3.7/site-packages/httpstan/", line 207, in build_services_extension_module\n await asyncio.get_event_loop().run_in_executor(None, httpstan.build_ext.run_build_ext, extensions, build_lib)\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/concurrent/futures/", line 57, in run\n result = self.fn(*self.args, **self.kwargs)\n', ' File "/pystan3/lib/python3.7/site-packages/httpstan/", line 97, in run_build_ext\n\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/command/", line 339, in run\n self.build_extensions()\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/command/", line 448, in build_extensions\n self._build_extensions_serial()\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/command/", line 473, in _build_extensions_serial\n self.build_extension(ext)\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/command/", line 533, in build_extension\n depends=ext.depends)\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/", line 574, in compile\n self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)\n', ' File "/apps/centos7/Core/Anaconda3/5.1.0/lib/python3.7/distutils/", line 120, in _compile\n raise CompileError(msg)\n']

That is a compilation error.

I’m not sure what the current situation is with the verbosity settings.

Can you try to compile the model with CmdStan (via CmdStanPy) and see if there is an error message?

I have an error while downloading CmdStan to the cluster using install_cmdstan():

INFO:cmdstanpy:stan/lib/stan_math/stan/math/prim/prob/neg_binomial_2_log_glm_lpmf.hpp:139:17: error: expected ';' before 'theta_tmp'
WARNING:cmdstanpy:CmdStan installation failed

These are the last lines.

Is there a way to download not the latest version (2.24.0) but 2.23.0 with install_cmdstan()? Maybe this causes the problem.

I also tried uploading cmdstan-2.23.0.tar.gz directly to the cluster, but cmdstanpy didn’t recognize the binaries.

But when I compile it on my local machine using CmdStanPy, it compiles without errors.

I don’t know if that helps, but PyStan3 fails to compile models that compiled fine with PyStan2.19.

Check the version of the C++ compiler on the cluster in that case.

Checked that on the cluster:

> g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)

On my local machine it’s

g++ --version
Configured with: --prefix=/Applications/ --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/

I know CmdStan documentation suggests using 4.9.3 or later, but it works fine on my local machine.

Your local compiler is clang 9.0, which is why it works locally.

I can confirm it will definitely not work with 4.8.5.

ok, that’s good to know about clang! Thanks!
I asked my system admins if C++ compiler can be updated, so let’s hope for the best.
Is this also why PyStan3 models don’t compile?

On the cluster you will not be able to compile with g++ 4.8.5 using any Stan interface, so I would imagine so. All interfaces share the same backend, which requires g++ 4.9.3+ or clang 5.0+ (we officially say clang 6 because that is what we test).

Hmm are you sure about that? I run models just fine with pystan2.19 on this cluster…
Also, is there a way to easily change g++ to clang? Maybe cluster’s clang version is ok.

On the cluster run: clang++ --version to see if it exists.

pystan2.19 uses an older version of the Math backend that could still be built with compilers older than 4.9.3.

clang++: command not found… so now it’s all in the hands of the admins to update g++. So many setbacks!
And it makes sense about pystan2.19, thanks!

conda also ships its own compilers (a conda install is possible), but I recommend following the official route on clusters.

Thanks! Btw, I compiled my PyStan3 model in the command line of my local machine and it still fails with Segmentation fault: 11. CmdStanPy still compiles fine, but multithreading doesn’t work there…

I can reproduce the segfault on Ubuntu with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.

I will create an issue for this. Even if something were wrong with the model, it should never segfault.