Hierarchical model for categorical data - sampling takes too long

Hello Stan experts,

I’m running a hierarchical model for categorically distributed data: ratings on a 0-10 scale made at different time points, before and after receiving feedback. The data come from 51 participants, each of whom completed 40 trials and received feedback on every trial. I’m trying to estimate the effect of the feedback on changes in participants’ ratings at the single-trial level. In the model I compute the probability of each possible outcome on the scale (11 possible outcomes, hence 10 thresholds) using the beta_cdf function. This is implemented in the transformed parameters block; for some reason, the model did not work when this computation was placed in the model block. Because the model is hierarchical, it has a fairly large set of 39 parameters.

Sampling currently takes far too long: after 2.5 weeks it is only about 35-40% complete. I’m running 6 chains of 2000 iterations each, with 500 warm-up iterations and 1500 stored iterations per chain.
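The beta_cdf step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the actual model. It assumes that a latent rating rescaled to (0, 1) follows a Beta(a, b) distribution and that 10 increasing thresholds in (0, 1) cut it into the 11 rating categories. The names `beta_cdf`, `category_probs` and `_beta_continued_fraction` are hypothetical stand-ins; the first mirrors Stan's `beta_cdf`. The CDF is computed with the standard continued-fraction evaluation of the regularized incomplete beta function, so no third-party packages are needed.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b) at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def category_probs(thresholds, a, b):
    """Probabilities of the K + 1 ordered categories defined by K cut points in (0, 1)."""
    if any(not 0.0 < t < 1.0 for t in thresholds):
        raise ValueError("thresholds must lie strictly inside (0, 1)")
    if any(t1 >= t2 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cdf = [0.0] + [beta_cdf(t, a, b) for t in thresholds] + [1.0]
    # P(category k) = F(c_k) - F(c_{k-1}), with F(c_0) = 0 and F(c_{K+1}) = 1.
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


if __name__ == "__main__":
    cuts = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # 10 hypothetical thresholds
    probs = category_probs(cuts, 2.5, 4.0)  # hypothetical shape parameters
    print(len(probs), round(sum(probs), 12))
```

Each probability is a difference of the same CDF at consecutive cut points, so the 11 values telescope to 1 by construction. If a trial only needs the probability of its observed rating, at most two CDF evaluations are required (at the thresholds bracketing that rating) instead of ten. That difference may matter across 51 × 40 = 2040 trials.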

I’m running the model in MatlabStan on a computer that I believe is fairly powerful (Intel Xeon CPU E5-2683 v4 @ 2.10 GHz (2098 MHz), 16 cores, 32 logical processors and 128 GB RAM). The run uses 2 CPUs, with the chains running in parallel. I also tried running it in R, but sampling was no faster, even after following the RStan suggestions for speeding it up (see the notes on the toolchain and parallel sampling at https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started).

Any suggestions on how to speed things up? The model was developed together with an expert in Bayesian statistics, so I assume it is correct. I’ve considered the following options, but I’m not sure how to implement them:

  • Previous posts about slow sampling suggest that I can save draws only for the parameters I’m interested in, and that this may save some time. How can this be done?
  • Allocating more CPUs: does anybody know how to do this in MatlabStan?
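Regarding the CPU question: assuming the trials are conditionally independent given the parameters, the log-likelihood is a sum over trials. It can therefore be split into slices whose partial sums are evaluated separately and then added. Stan's `reduce_sum` function does this within a single chain when the model is compiled with threading enabled (`STAN_THREADS`). Whether MatlabStan exposes that option is not assumed here. The Python sketch below only illustrates the decomposition, and the (rating, probs) data layout and function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def slice_log_lik(trials):
    """Partial log-likelihood of one slice of trials.

    Each trial is a (rating, probs) pair: the observed 0-10 rating and that
    trial's 11-element category-probability vector (hypothetical layout).
    """
    return sum(math.log(probs[rating]) for rating, probs in trials)


def partitioned_log_lik(trials, n_slices, max_workers=4):
    """Sum of per-slice partial log-likelihoods over contiguous slices.

    Because the log-likelihood is a sum over conditionally independent
    trials, the slices can be evaluated separately and added. In CPython the
    threads only illustrate this structure; the GIL prevents a real speedup
    for pure-Python arithmetic like this. n_slices must be at least 1.
    """
    size = max(1, math.ceil(len(trials) / n_slices))
    slices = [trials[i:i + size] for i in range(0, len(trials), size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_sums = list(pool.map(slice_log_lik, slices))
    return math.fsum(partial_sums)


if __name__ == "__main__":
    # Synthetic stand-in for 51 participants x 40 trials with uniform probabilities.
    uniform = [1.0 / 11] * 11
    trials = [(i % 11, uniform) for i in range(51 * 40)]
    print(partitioned_log_lik(trials, n_slices=8), slice_log_lik(trials))
```

Contiguous slices keep each participant's trials together when the data are ordered by participant. Any partition gives the same total up to floating-point rounding.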

Any advice will be greatly appreciated.

Thanks,

Ofir