I’ve been looking into simulation-based calibration (SBC) recently, with the hope of writing a Python script to automate the process using CmdStan. However, I’ve been having some trouble understanding the advice provided in the Stan User’s Guide. Specifically 25.3.4:
Here it says that the samples should be thinned down to the effective sample size (ESS) to ensure the samples are independent. The ESS estimate in Stan is based on the sample draws from multiple chains (see 16.4 Effective sample size in the Stan Reference Manual). This is supported by the stansummary output, which gives an ESS equal to the total number of samples if the input samples are from only one chain.
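For concreteness, this is roughly how I understand the thinning and rank step. It is only a sketch: `thin_to_ess` and `sbc_rank` are names I made up, the draws are random numbers standing in for posterior draws of one parameter, and the ESS value would in practice come from stansummary.

```python
import numpy as np


def thin_to_ess(draws, ess):
    """Keep roughly `ess` evenly spaced draws from a 1-D array of draws."""
    draws = np.asarray(draws)
    # Never keep fewer than 1 draw or more than we actually have.
    n_keep = max(1, min(len(draws), int(ess)))
    step = len(draws) // n_keep
    return draws[::step][:n_keep]


def sbc_rank(theta_sim, thinned_draws):
    """Count the thinned draws strictly below the simulated parameter value."""
    return int(np.sum(np.asarray(thinned_draws) < theta_sim))


# Stand-in for 1000 posterior draws (hypothetical, not real Stan output).
rng = np.random.default_rng(1234)
draws = rng.normal(size=1000)

# Hypothetical ESS of 250, so every 4th draw is kept.
thinned = thin_to_ess(draws, ess=250.0)
rank = sbc_rank(0.0, thinned)
```

Repeating this over many simulated datasets should give ranks that are uniform on 0 to the number of thinned draws if the model is calibrated, as far as I understand the procedure.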
However, this seems to contradict the example given in 25.3.2:
Here the transformed data block is used to sample random values of the parameters, from which the simulated data are generated.
My question is this: how can we draw random parameters inside Stan from which to simulate the data, yet still run multiple chains on the same data from which to estimate the ESS? The only way we could ensure the data for the different chains are the same is to set the random seed, but then the samples would be the same too.
I did consider setting different random starting values for each chain in the init files, but I wasn’t sure how this would influence the random seed for the data generation (although, thinking about this now, it would be easy to test). In any case, this isn’t mentioned in the guide, so I think I’m probably misunderstanding something about either the SBC procedure or the ESS calculation.
Thanks for reading!