Hi all,

This follows from an earlier post here: SBC for arma model

I’ve been looking to use the 12 benchmark models as a means to validate new components in Stan. As a starting point, I have been doing SBC for the models using the current version of Stan.

The idea is to test things like SMC-Stan (which is currently a fork of CmdStan). I’ve therefore written some Python code which works with CmdStan to generate SBC histograms. The code, along with SBC-versions of the benchmark models, can be found here: https://github.com/PhilClemson/pySBC

The code works by running a number of chains determined by the number of bins and the expected bin count. The effective sample size (ESS) is calculated for each chain separately (I think this makes sense, but I'm happy to discuss), and the chains are then re-run with the same random seeds but with appropriate thinning based on the ESS. There's also some error handling to deal with random seeds for which the chain doesn't successfully initialise, as well as timeouts for chains that take too long.
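To make the thinning step concrete, here's a minimal sketch of the per-chain rank computation. The helper names (`thin_by_ess`, `sbc_rank`) are mine for illustration, not the actual functions in the pySBC repo:

```python
import numpy as np

def thin_by_ess(draws, ess):
    """Keep every k-th draw so roughly `ess` near-independent draws remain."""
    thin = max(1, int(len(draws) // ess))
    return draws[::thin]

def sbc_rank(theta_true, posterior_draws, ess):
    """Rank of the prior draw among the thinned posterior draws (0..L)."""
    thinned = thin_by_ess(np.asarray(posterior_draws), ess)
    rank = int(np.sum(thinned < theta_true))
    return rank, len(thinned)

# Self-calibrating toy example: if the "posterior" draws come from the
# same distribution as theta_true, the ranks are uniform on {0, ..., L}.
rng = np.random.default_rng(1)
theta = rng.standard_normal()
draws = rng.standard_normal(1000)
rank, n_kept = sbc_rank(theta, draws, ess=100)
```

In the real pipeline the ESS would of course come from the first run of each chain rather than being passed in by hand.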

**Results**

The following models gave the expected uniform histograms.

eight_schools

garch

low_dim_corr_gauss

pkpd

The gp models also gave mostly uniform histograms, apart from the rho parameter.

gp_regr

gp_pois_regr

However, some of the models had non-uniform histograms, seemingly indicating bias in the model or algorithm (this includes the arma model from the previous post).

irt_2pl

low_dim_gauss_mix

low_dim_gauss_mix_collapse
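One way to put a number on "non-uniform" rather than eyeballing the histograms is a Pearson chi-square statistic on the bin counts. This is a generic sketch, not code from the pySBC repo; the 95% critical value quoted in the comment is for 19 degrees of freedom (20 bins):

```python
import numpy as np

def chi2_statistic(bin_counts):
    """Pearson chi-square statistic of the counts against a uniform expectation."""
    counts = np.asarray(bin_counts, dtype=float)
    expected = counts.sum() / len(counts)
    return float(np.sum((counts - expected) ** 2 / expected))

# Example: 20 bins with 2000 SBC replications (expected 100 per bin).
rng = np.random.default_rng(0)
ranks = rng.integers(0, 20, size=2000)      # uniform ranks -> calibrated
counts = np.bincount(ranks, minlength=20)
stat = chi2_statistic(counts)
# Under uniformity, stat ~ chi^2 with 19 df; the 95% critical value is ~30.1,
# so values far above that flag a problematic parameter.
```

This doesn't replace looking at the shape of the histogram (which tells you *how* the posterior is biased), but it's handy for scanning many parameters automatically.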

For the latter two models I tried running for 10,000 sampling iterations (vs. 1000), but I didn't see much improvement.

low_dim_gauss_mix (10000 samples)

low_dim_gauss_mix_collapse (10000 samples)

For the remaining two models (arK and sir) I wasn't able to perform SBC in a reasonable time frame, due to the tight constraints in the models and the difficulty of finding good random seeds.

**Questions**

I guess the main request would be for someone like @avehtari or @hyunji.moon to confirm that I'm doing SBC correctly. For example, I've had some questions around the ESS criteria previously (see Effective Sample Size in SBC), so it would be good to check that I'm not doing anything silly there.

If that looks good, then it might also be worth checking the simulation code for the models that seem to fail SBC. @martinmodrak already spotted a mistake in the original arma_sbc code, so there may be others lurking in those files (I apologise in advance if there are any as obvious as that one!).

Also happy to discuss any other ideas people might have!