Consider the following generative model to explain how the data arise.
In the following figure, \theta_1 and \theta_2 denote the model parameters.
In this model, \theta_2 does not affect the data.
To validate a model, one approach is to generate replicated data from a known parameter value and show that the mean of the estimates over the replications does not differ from the true parameter.
In this model, the second parameter \theta_2 does not directly affect the generative process of the data, so we do not need to assume a true value of \theta_2 to generate the replicated data.
Does the lack of a true value for \theta_2 then make it impossible to compare its estimates with the true parameter?
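To make the replication idea concrete, here is a minimal sketch on a toy model, not my actual model. It assumes a single parameter (a Gaussian mean with known unit variance) that does affect the data, and the names `simulate_data`, `estimate_mean` and `replication_check` are made up for illustration. It compares the average estimate over replications with the true value, using the Monte Carlo standard error as the yardstick.

```python
import random
import statistics

def simulate_data(true_mean, n_obs, rng):
    """Draw one replicated data set from N(true_mean, 1)."""
    return [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]

def estimate_mean(data):
    """Toy estimator: the sample mean."""
    return statistics.fmean(data)

def replication_check(true_mean=2.0, n_obs=50, n_reps=2000, seed=1):
    """Return the average estimate over replications and its Monte Carlo SE."""
    rng = random.Random(seed)
    estimates = [
        estimate_mean(simulate_data(true_mean, n_obs, rng))
        for _ in range(n_reps)
    ]
    avg = statistics.fmean(estimates)
    mc_se = statistics.stdev(estimates) / n_reps ** 0.5
    return avg, mc_se

true_mean = 2.0
avg_estimate, mc_se = replication_check(true_mean=true_mean)
bias = avg_estimate - true_mean
print(f"average estimate = {avg_estimate:.4f}, bias = {bias:.4f}, MC SE = {mc_se:.4f}")
```

A bias that is small relative to the Monte Carlo standard error is consistent with an unbiased estimator. The check needs a true value to compare against, which is exactly what is missing for \theta_2.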
Thanks for the idea. However, I think it is difficult to obtain a function \varphi() such that \theta_2 = \varphi(\theta_1) (= \varphi_f(\theta_1)).
I built a hierarchical Bayesian model with rstan, and the reviewers required me to validate it. I showed that my proposed method agrees with existing methods, but a reviewer does not trust those existing methods. So I have to show agreement with the truth!
Thank you for pointing me to the paper.
I will try to validate my model following the paper.
I did not know of such validation methods, so it may help me.
Roughly speaking, the paper constructs a test statistic that is uniformly distributed under the null hypothesis that the MCMC sampling is correct. If the MCMC sampling is not correct, the histogram of the test statistic becomes skewed, and this deviation from uniformity tells us that the MCMC is biased. I want to implement it, but it requires calculating the above quantities.
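As a rough illustration of the procedure, here is a minimal sketch on a toy conjugate model rather than my hierarchical model. It assumes a prior \theta ~ N(0, 1) and data y_i ~ N(\theta, 1), so the exact posterior N(\sum y_i / (n+1), 1/(n+1)) can be sampled directly and stands in for MCMC draws. The function name `sbc_ranks` and all settings are made up for illustration.

```python
import random

def sbc_ranks(n_sims=2000, n_obs=10, n_draws=19, seed=0):
    """Rank of the prior draw among posterior draws, repeated n_sims times."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        # 1. Draw a "true" parameter from the (proper) prior.
        theta = rng.gauss(0.0, 1.0)
        # 2. Simulate data from the likelihood given that parameter.
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        # 3. Sample the posterior (exact here; MCMC in a real model).
        post_var = 1.0 / (n_obs + 1)
        post_mean = sum(y) * post_var
        draws = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]
        # 4. Rank statistic: number of posterior draws below the prior draw.
        ranks.append(sum(d < theta for d in draws))
    return ranks

n_draws = 19
ranks = sbc_ranks(n_draws=n_draws)
# Ranks take values 0..n_draws; with a correct sampler each is equally likely.
counts = [ranks.count(r) for r in range(n_draws + 1)]
print(counts)
```

Because the posterior draws here are exact, the counts should be roughly flat. Replacing step 3 with a sampler that is too narrow, too wide, or shifted would be one way to see the histogram lose its uniform shape.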
I have a question:
Is this method applicable to improper priors or improper posteriors?
How should I choose the function f for the rank statistic?
I am not sure, but:
To focus on a single parameter, I think f = f_i(\theta_1,\theta_2,\cdots,\theta_n) = \theta_i?
To pool all parameters, the Euclidean norm? f(\theta_1,\theta_2,\cdots,\theta_n) = \sqrt{ \sum \theta_i^2 }
Currently, my model uses improper priors for the standard deviations and means of Gaussians. So, to apply SBC (Simulation-Based Calibration), I must use stronger priors.
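As far as I understand, the reason is the identity that SBC relies on: averaging the posterior over data simulated from the prior predictive distribution gives back the prior. Writing \tilde{\theta} for the prior draw and \tilde{y} for the simulated data (my notation here, not necessarily the paper's):

$$
\pi(\theta) = \int \pi(\theta \mid \tilde{y}) \, \pi(\tilde{y} \mid \tilde{\theta}) \, \pi(\tilde{\theta}) \, d\tilde{y} \, d\tilde{\theta}
$$

The first step of the procedure is to draw \tilde{\theta} from \pi(\tilde{\theta}), which is only possible if the prior is a proper probability distribution.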
In the above, (\theta_1,\theta_2,...,\theta_n) denotes the parameters of the model (not MCMC samples); that is, (\theta_1,\theta_2,...,\theta_n) \in \Theta. I am not sure how the rank statistic is affected by the definition of f.
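To experiment with the choice of f, here is a minimal sketch of the rank statistic for an arbitrary f:\Theta \to \mathbb{R}. The helper names (`coordinate`, `euclid_norm`, `sum_of_squares`, `rank_statistic`) are made up for illustration, and the "posterior draws" are just placeholder random vectors, not output from a real sampler. One thing that is certain: the rank only depends on the ordering of the values of f, so the Euclidean norm and the sum of squares give the same rank, because the square root is strictly increasing on non-negative numbers.

```python
import math
import random

def coordinate(i):
    """f_i(theta) = theta_i: focus on a single parameter."""
    return lambda theta: theta[i]

def euclid_norm(theta):
    """f(theta) = sqrt(sum theta_i^2): pool all parameters."""
    return math.sqrt(sum(t * t for t in theta))

def sum_of_squares(theta):
    """f(theta) = sum theta_i^2: a monotone transform of the norm."""
    return sum(t * t for t in theta)

def rank_statistic(f, theta_true, posterior_draws):
    """Number of posterior draws whose f-value is below f(theta_true)."""
    f_true = f(theta_true)
    return sum(f(draw) < f_true for draw in posterior_draws)

# Placeholder inputs (assumption): 3 parameters, 99 stand-in posterior draws.
rng = random.Random(42)
theta_true = [rng.gauss(0.0, 1.0) for _ in range(3)]
posterior_draws = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(99)]

rank_first = rank_statistic(coordinate(0), theta_true, posterior_draws)
rank_norm = rank_statistic(euclid_norm, theta_true, posterior_draws)
rank_sq = rank_statistic(sum_of_squares, theta_true, posterior_draws)
print(rank_first, rank_norm, rank_sq)
```

Different non-monotone choices of f (for example a single coordinate versus the norm) look at different aspects of the posterior, so they can in principle reveal different kinds of miscalibration.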
I implemented Simulation-Based Calibration (SBC) using f:\Theta \to \mathbb{R}; (\theta_1,...,\theta_d) \mapsto \sum\theta_i^2 for the rank statistic described in the following paper.
The histogram of the rank statistic is as follows:
It shows that the histogram is far from uniform.
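One way to put a number on "far from uniform" is a chi-square statistic on the rank counts. The sketch below is only an illustration: the ranks are synthetic (one uniform set and one deliberately skewed set), not the ranks from my model, and `chi_square_uniformity` is a made-up helper name. Under uniformity the statistic is approximately chi-square distributed with (number of bins - 1) degrees of freedom, so its expected value is about the number of bins minus one.

```python
import random

def chi_square_uniformity(ranks, n_bins):
    """Pearson chi-square statistic of rank counts against a uniform law."""
    expected = len(ranks) / n_bins
    counts = [0] * n_bins
    for r in ranks:
        counts[r] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(7)
n_bins = 20

# Synthetic ranks (assumption): uniform on 0..19, as a correct sampler would give.
uniform_ranks = [rng.randrange(n_bins) for _ in range(2000)]
# Synthetic ranks (assumption): skewed toward 0, mimicking a biased sampler.
skewed_ranks = [min(rng.randrange(n_bins), rng.randrange(n_bins)) for _ in range(2000)]

stat_uniform = chi_square_uniformity(uniform_ranks, n_bins)
stat_skewed = chi_square_uniformity(skewed_ranks, n_bins)
print(f"uniform: {stat_uniform:.1f}, skewed: {stat_skewed:.1f}, reference mean: {n_bins - 1}")
```

A statistic far above the reference mean points to a non-uniform histogram, although it does not say whether the cause is the sampler, the priors, or the simulation code.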
I think the histogram is far from uniform because of the misspecified priors described in Section 6.1 of the paper. If the data are plausible, then according to rstan::check_hmc_diagnostics() the MCMC procedure is correct. However, if the data generated from the likelihood (model) with parameters drawn from the priors are not plausible, the MCMC sampling produces divergent transitions in my model.
Generally speaking, frequentist methods do not use priors (except for lasso, Firth, ridge, ...), and my model also does not need priors when I fit it to plausible data. But to implement Simulation-Based Calibration, I have to choose appropriate priors so that the simulated data are plausible.
In my case, SBC validates the priors more than it validates the MCMC procedure.