Hi all, I am so glad to find this helpful community online!

I am using a Cox model in RStan (leuk.stan) to test simulated survival data.

I use only one chain with 5,000 warmup iterations, because I need to fit thousands of models at once. In some small-scale sample tests this setup works well.

My model for Y ~ Trt + X_1 + … + X_30 (all Xs are i.i.d.) then works very well. I ran 500 simulations; within each simulation, 10,000 MCMC samples were drawn and later translated into the causal estimates I need. I then checked the bias and the coverage probability of the credible intervals: almost 95% of the intervals, running from the 0.025 quantile to the 0.975 quantile of the 10,000 estimates in each simulation, cover the true value.
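To make sure I describe the check clearly, here is a minimal sketch of the coverage computation I mean, with a toy normal model standing in for the Stan fit (the true value, the data-generating step, and the closed-form posterior are all stand-in assumptions, not my actual Cox model):

```python
import numpy as np

rng = np.random.default_rng(1)
true_theta = 1.0
n_sims, n_obs, n_draws = 500, 100, 10_000

covered = 0
for _ in range(n_sims):
    y = rng.normal(true_theta, 1.0, size=n_obs)
    # Stand-in for MCMC: with a flat prior and known unit variance, the
    # posterior for theta is N(mean(y), 1/n), so we can draw from it directly.
    draws = rng.normal(y.mean(), 1.0 / np.sqrt(n_obs), size=n_draws)
    lo, hi = np.quantile(draws, [0.025, 0.975])
    covered += (lo <= true_theta <= hi)

coverage = covered / n_sims
print(coverage)  # close to 0.95 for this well-specified normal model
```

In my real pipeline the 10,000 draws per simulation come from the Stan fit rather than a closed-form posterior, but the interval construction and the coverage count are exactly as above.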

However, when I turn to the model Y ~ Trt + X_1 + … + X_15 + Trt:X_1 + … + Trt:X_15, the bias of the estimates is still fine, but on average only 91%, instead of 95%, of the intervals described above cover the true value.

Because my treatment is binary, I treated each Trt:X_i as a first-order term in the model, just like X_i. I haven't yet figured out how to write a Stan model with explicit interaction terms, so I wonder whether this is the problem and, if so, what I should do.
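Concretely, what I did was build the interaction columns as element-wise products before passing the design matrix to Stan. A minimal sketch of that construction (the dimensions and data generation here are illustrative assumptions, not my actual simulation settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 15
trt = rng.integers(0, 2, size=n)   # binary treatment indicator
X = rng.normal(size=(n, p))        # covariates X_1 ... X_15

# For a binary treatment, Trt:X_i is just the product Trt * X_i, so each
# interaction is one more column in the design matrix and the Stan model's
# linear predictor needs no structural change.
Z = np.column_stack([trt, X, trt[:, None] * X])
print(Z.shape)  # (200, 31): 1 treatment + 15 main effects + 15 interactions
```

So my understanding is that, algebraically, the interaction model is still a plain linear predictor over 31 columns; I'd like to know if handing Stan these product columns is a valid way to fit the interaction model.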

I am new to Stan and Bayesian methods. Is there anything I am missing? Thank you in advance for any comments.

There are no coverage guarantees for Bayesian intervals; in particular, a Bayesian posterior interval will not in general attain any nominal frequentist coverage level. For more discussion you might find https://arxiv.org/abs/1803.08393 helpful.

Thank you for your quick reply; I am reading it now. Could you give me a brief hint as to why, following the process above with non-informative priors on the constant parameters, the intervals cover the true value about 95% of the time? Are all these tests coincidences, or am I missing something needed to interpret my simulation correctly?

I have checked these links, and I am still a little confused. Do some of the answers imply there is a way to get the coverage probability right? Thank you for the help!

This is a result of *asymptotics*. As your data become more informative relative to the complexity of your model, the posterior becomes more and more Gaussian, and posterior intervals converge to maximum likelihood intervals. The fact that you see matching frequentist calibration for some posterior intervals but not others indicates that your model isn't fully in the asymptotic regime.
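To see the convergence concretely, here is a toy sketch, assuming a normal mean with known unit variance and a flat prior (not your Cox model): with a large sample, the 95% posterior quantile interval nearly coincides with the Wald interval built from the MLE.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
y = rng.normal(1.0, 1.0, size=n)

# Flat-prior posterior for a normal mean with known unit variance is
# N(mean(y), 1/n); its 95% quantile interval should match the Wald
# interval centered at the MLE mean(y).
draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=100_000)
post_lo, post_hi = np.quantile(draws, [0.025, 0.975])
mle_lo = y.mean() - 1.96 / np.sqrt(n)
mle_hi = y.mean() + 1.96 / np.sqrt(n)
print(post_lo - mle_lo, post_hi - mle_hi)  # both differences near zero
```

In your interaction model, the per-subgroup information is effectively smaller (each Trt:X_i coefficient is informed mainly by the treated observations), so those coefficients sit further from this asymptotic regime and their intervals calibrate less well.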