Well, after switching to gcc 5.3.0 and a gcc-built OpenMPI, and cloning cmdstan from git, it started working.
After installing cmdstan.2.8.11 from tar,
adding to make/local
and building the Stan model and running it with mpirun -n 2 (Intel MPI), the code runs, but you can clearly see that separate chains run on different cores and that the output of those chains is combined. The same code with cmdstan.2.18.0 on a different cluster ran fine, and the output indicated there was a single chain.
uname -a gives
Linux br006.pvt.bridges.psc.edu 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
g++ -v gives
gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
The Stan model contains the following in the transformed parameters block (I just needed to output the log likelihood):
lp = sum(map_rect(bnn, phi, theta, x_r, x_i));
and in the model block:
target += lp;
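
For context, here is a minimal sketch of how those two lines might sit in a complete program. Only the lp and target lines come from my model. The body of bnn, the data layout, and the names S and N are placeholders made up for illustration, and the array syntax assumes a 2.18-era Stan.

```stan
functions {
  // Placeholder for bnn: the real function body is not shown in this post.
  // map_rect requires exactly this signature and a vector return value.
  vector bnn(vector phi, vector theta, real[] x_r, int[] x_i) {
    real ll = normal_lpdf(to_vector(x_r) | theta[1], phi[1]);
    return [ll]';  // one log-likelihood term per shard
  }
}
data {
  int<lower=1> S;          // number of shards (assumed name)
  int<lower=1> N;          // observations per shard (assumed name)
  real x_r[S, N];          // real-valued data, one row per shard
}
transformed data {
  int x_i[S, 0];           // no integer data in this sketch
}
parameters {
  vector<lower=0>[1] phi;  // parameters shared by all shards
  vector[1] theta[S];      // shard-specific parameters
}
transformed parameters {
  // Declared here rather than in the model block so that lp is written to the output.
  real lp = sum(map_rect(bnn, phi, theta, x_r, x_i));
}
model {
  target += lp;
}
```

With a build that has MPI enabled, each shard's call to bnn can be evaluated by a different process, but the sampler should still report a single chain.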