I’ve been looking into simulation-based calibration (SBC) recently, hoping to write a Python script that automates the process using CmdStan. However, I’ve been having some trouble understanding the advice provided in the Stan User’s Guide, specifically section 25.3.4:
Here it says that the samples should be thinned down to the effective sample size (ESS) to ensure that the retained draws are approximately independent. The ESS estimate in Stan is based on the draws from multiple chains (16.4 Effective sample size | Stan Reference Manual). This seems to be supported by the stansummary output, which gives an ESS equal to the total number of samples if the input samples are from only one chain.
However, this seems to contradict the example given in 25.3.2:
Here the transformed data block is used to sample random values of the parameters from which the simulated data is generated.
My question is this: how can we draw random parameters inside Stan from which to simulate the data, yet still run multiple chains on the same data from which to estimate the ESS? The only way to ensure that the data for the different chains are the same is to set the random seed, but then the samples would be the same too.
I did consider setting different random starting values for each chain in the init files, but I wasn’t sure how this would influence the random seed for the data generation (although, thinking about it now, this would be easy to test). In any case, this isn’t mentioned in the guide, so I think I’m probably misunderstanding something about either the SBC procedure or the ESS calculation.
Hi! @hyunji.moon, @Dashadower, and @bnicenboim have all worked on automating SBC. Perhaps they can offer some insight. I for one am very much looking forward to simplifying the usage of SBC so that it can easily become a standard procedure in model development.
I’m not sure whether parallelizing the model works in this case. As far as I’m aware, parallelization defined in the Stan file is only used for faster evaluation of the likelihood, so I don’t think it can be used to run parallel independent chains.
The ESS estimates work for single chains; they’ll just be less reliable.
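To make the single-chain case concrete, here is a rough sketch of an ESS estimate built only from one chain's autocorrelations. It assumes a simplified version of Geyer's initial positive sequence truncation and is not Stan's exact estimator; the function names and the AR(1) series standing in for an MCMC chain are made up for the example.

```python
import random

def autocovariance(x, lag):
    """Biased sample autocovariance of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n

def ess_single_chain(x):
    """Crude single-chain ESS: n / tau, where tau = -1 + 2 * (sum of paired
    autocorrelations), stopping at the first non-positive pair."""
    n = len(x)
    var = autocovariance(x, 0)
    if var == 0.0:
        return float(n)  # degenerate constant chain; avoid dividing by zero
    tau = -1.0
    lag = 0
    while lag + 1 < n:
        pair = (autocovariance(x, lag) + autocovariance(x, lag + 1)) / var
        if pair <= 0.0:
            break
        tau += 2.0 * pair
        lag += 2
    # Simplification for this sketch: never report more than n draws.
    return min(float(n), n / tau) if tau > 0.0 else float(n)

def ar1_chain(n, phi, rng):
    """Hypothetical stand-in for one MCMC chain: a stationary AR(1) series."""
    x = [rng.gauss(0.0, 1.0)]
    scale = (1.0 - phi ** 2) ** 0.5
    for _ in range(n - 1):
        x.append(phi * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x

rng = random.Random(1234)
ess_correlated = ess_single_chain(ar1_chain(2000, 0.9, rng))   # strongly autocorrelated
ess_independent = ess_single_chain(ar1_chain(2000, 0.0, rng))  # independent draws
```

The autocorrelated chain should come out with an ESS far below its 2000 draws, while the independent one should stay close to 2000. With only one chain there is no between-chain information, which is one reason the estimate is less reliable.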
I suspect it’s written that way to avoid writing any R or Python in the User’s Guide. We try not to put any interface stuff in the User’s Guide: strictly Stan code.
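If the simulation is moved out of the transformed data block and into the interface, every chain can be pointed at the same simulated data set while still using its own sampler seed. A minimal sketch of that idea, assuming a toy model with prior theta ~ normal(0, 1) and likelihood y ~ normal(theta, 1); the data names `N` and `y`, the helper `simulate_dataset`, and the file name are all hypothetical:

```python
import json
import os
import random
import tempfile

def simulate_dataset(n_obs, rng):
    """Draw theta from the assumed prior, then simulate y given theta."""
    theta_true = rng.gauss(0.0, 1.0)
    y = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]
    return theta_true, {"N": n_obs, "y": y}

rng = random.Random(2021)  # this seed controls the simulation only
theta_true, data = simulate_dataset(10, rng)

# Write the data once; the same JSON file would then be passed to every
# chain, each started with a different sampler seed.
data_path = os.path.join(tempfile.mkdtemp(), "sbc_sim_001.json")
with open(data_path, "w") as f:
    json.dump(data, f)

# Read it back to confirm the file holds exactly what was simulated.
with open(data_path) as f:
    reloaded = json.load(f)
```

`theta_true` stays on the Python side so that it can later be ranked against the posterior draws.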
I think we’re assuming that the ESS is going to be basically the same regardless of the data drawn.
Maybe not a good assumption, but I think you’d estimate the ESS/thinning separately and then do SBC. Like, generate some data, fit the model, and use that to pick the amount of thinning for an actual SBC calculation.
Then separately do your SBC calculations (and don’t worry about calculating ESS for each chain of SBC).
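The two-step workflow above can be sketched roughly as follows. Everything here is a toy under stated assumptions: the model is theta ~ normal(0, 1), y ~ normal(theta, 1), and `toy_fit` is a hypothetical stand-in for a real fit that returns autocorrelated draws from the exact posterior. With a real model, the pilot ESS would come from the sampler's diagnostics rather than from a formula.

```python
import math
import random

PHI = 0.9        # assumed autocorrelation of the toy "sampler"
N_DRAWS = 1000   # draws per fit before thinning
N_OBS = 10       # observations per simulated data set

def simulate(rng):
    """Draw theta from the prior, then data given theta."""
    theta = rng.gauss(0.0, 1.0)
    return theta, [rng.gauss(theta, 1.0) for _ in range(N_OBS)]

def toy_fit(y, rng):
    """Stand-in for an MCMC fit: AR(1) draws whose stationary distribution
    is the exact posterior of theta for this conjugate toy model."""
    post_mean = sum(y) / (len(y) + 1)
    post_sd = (1.0 / (len(y) + 1)) ** 0.5
    scale = (1.0 - PHI ** 2) ** 0.5
    z = rng.gauss(0.0, 1.0)
    draws = []
    for _ in range(N_DRAWS):
        draws.append(post_mean + post_sd * z)
        z = PHI * z + scale * rng.gauss(0.0, 1.0)
    return draws

rng = random.Random(42)

# Step 1: choose the thinning interval once, ahead of the SBC loop.
# A pilot fit's ESS would go here; for this toy sampler the AR(1) value
# is known in closed form.
pilot_ess = N_DRAWS * (1.0 - PHI) / (1.0 + PHI)
thin = math.ceil(N_DRAWS / pilot_ess)

# Step 2: SBC proper, reusing that thinning for every simulated data set.
ranks = []
for _ in range(500):
    theta_true, y = simulate(rng)
    thinned = toy_fit(y, rng)[::thin]
    # Rank statistic: number of thinned draws below the true parameter.
    ranks.append(sum(d < theta_true for d in thinned))

n_thinned = len(range(0, N_DRAWS, thin))  # ranks take values in 0..n_thinned
```

If the model and sampler are calibrated, a histogram of `ranks` should look roughly uniform over 0..`n_thinned`. No ESS is computed inside the loop; the thinning chosen up front is simply reused.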
Thanks for clarifying that. I’ve seen different definitions of ESS and wasn’t sure whether it’s valid to use the autocorrelation from just one chain.
I think that’s good practice for SBC in general. It’s probably best to keep things standardised, especially since in the long term I’m looking to run SBC on a large set of models (most likely those in posteriordb).
That sounds like a decent assumption to make. I suppose running a few extra chains to estimate the ESS doesn’t make much difference in the context of SBC.