How to validate a model containing a parameter that is not related to the data?


Consider the following generative model, which explains how the data arise.
In the following figure, theta 1 and theta 2 denote the model parameters.
In the model, theta 2 does not affect the data.

To validate the model, one method is to replicate the data from a known parameter and show that the mean of the estimates over the replications does not differ from the true parameter.
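As a minimal sketch of this replication check, here is a toy example in Python. It is not my actual model: the Gaussian likelihood, the sample-mean estimator, and the names `simulate_data`, `estimate_theta1`, and `recovery_check` are assumptions made only for illustration.

```python
import random
import statistics

def simulate_data(theta1, n, rng):
    # Toy assumption: only theta1 enters the data-generating process.
    return [rng.gauss(theta1, 1.0) for _ in range(n)]

def estimate_theta1(y):
    # Toy estimator: the sample mean stands in for a full model fit.
    return statistics.fmean(y)

def recovery_check(theta1_true=2.0, n=50, replications=500, seed=1):
    rng = random.Random(seed)
    estimates = [
        estimate_theta1(simulate_data(theta1_true, n, rng))
        for _ in range(replications)
    ]
    mean_est = statistics.fmean(estimates)
    # Monte Carlo standard error of the mean of the estimates.
    se = statistics.stdev(estimates) / replications ** 0.5
    return mean_est, se

mean_est, se = recovery_check()
print(f"mean of estimates = {mean_est:.3f}, Monte Carlo SE = {se:.3f}")
```

If the mean of the estimates lies within a few Monte Carlo standard errors of the known value, the estimator shows no detectable bias for that parameter.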

In this model, the second parameter theta 2 does not directly affect the generative process of the data, so we do not need to assume a true value of theta 2 to generate the replicated data.
So, does the lack of a true value for theta 2 make it impossible to compare the estimates with the true parameter?



Is it possible to consider \theta_2 and f as unknown parameters and solve for \theta_1 as a transformed parameter?



Thanks for letting me know the idea. But I think it is difficult to obtain a function \varphi() such that \theta_2 = \varphi( \theta_1)( = \varphi_f( \theta_1) ).

I built a hierarchical Bayesian model with rstan, and the reviewers required a validation of my model. I showed that my proposed method is compatible with existing methods; however, the reviewer does not believe those existing methods. So I have to show compatibility with the truth!!



I did not understand your model, but see Talts et al. (2018), "Validating Bayesian inference algorithms with simulation-based calibration", for a discussion of this topic.



Thank you for letting me know about the paper.
I will try to validate my model following the paper.
I really did not know of such validation methods, so it may help me.

Thank you !!



I read it.

Roughly speaking, this paper constructs a test statistic that is uniformly distributed under the null hypothesis that the MCMC sampling is correct. If the MCMC sampling is not correct, the histogram of the test statistic becomes skewed, and this deviation from uniformity tells us that the MCMC is biased. I want to implement it, but that requires calculating the above quantities.
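To make the procedure concrete, here is a minimal sketch in Python on a toy conjugate model, assuming a N(0, 1) prior on a mean mu and N(mu, 1) observations, so that exact posterior draws replace MCMC. The function name `sbc_ranks` and all settings are assumptions for illustration, not taken from the paper.

```python
import random

def sbc_ranks(n_sims=1000, n_obs=10, n_draws=99, seed=0):
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, 1.0)                        # draw from the prior
        y = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]  # simulate data
        # Exact conjugate posterior for this toy model: N(post_mean, post_var).
        post_var = 1.0 / (n_obs + 1)
        post_mean = sum(y) * post_var
        draws = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]
        # Rank statistic: number of posterior draws below the prior draw.
        ranks.append(sum(d < mu_true for d in draws))
    return ranks

ranks = sbc_ranks()
# Ranks take values 0..99, so ten bins of width 10 cover them exactly.
counts = [0] * 10
for r in ranks:
    counts[r // 10] += 1
print(counts)
```

With a correct sampler the bin counts should be roughly equal; a sloped or U-shaped histogram points to a problem somewhere in the pipeline.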

I have two questions:

  1. Is this method available for improper priors or improper posteriors?
  2. How should I choose the function f for the rank statistic?

I am not sure, but:

To focus on only one parameter, I think f=f_i=f_i(\theta_1,\theta_2,\cdots,\theta_n)=\theta_i ?
To pool all parameters, the Euclidean norm f=f(\theta_1,\theta_2,\cdots,\theta_n)= \sqrt{ \sum \theta_i ^2 } ?




What is your indexing here? You can focus on one parameter or any scalar quantity computed from all parameters.
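A minimal sketch of what that means in code, assuming parameter vectors are stored as tuples. The helper names `rank_statistic`, `coordinate`, and `euclidean_norm` are hypothetical, and the "posterior draws" below are made-up stand-ins rather than output from a real fit.

```python
import math
import random

def rank_statistic(theta_true, posterior_draws, f):
    # Count the posterior draws whose scalar summary f falls below f(theta_true).
    target = f(theta_true)
    return sum(f(draw) < target for draw in posterior_draws)

def coordinate(i):
    # Scalar summary that picks out a single parameter theta_i.
    return lambda theta: theta[i]

def euclidean_norm(theta):
    # Scalar summary that pools all parameters.
    return math.sqrt(sum(t * t for t in theta))

rng = random.Random(0)
theta_true = (0.5, -1.0)
# Stand-in draws for illustration only; in practice these come from the sampler.
posterior_draws = [(rng.gauss(0.5, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(99)]

print(rank_statistic(theta_true, posterior_draws, coordinate(0)))
print(rank_statistic(theta_true, posterior_draws, euclidean_norm))
```

Each choice of f gives its own rank histogram, so one parameter can look calibrated while a pooled quantity does not, or the other way around.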



Thank you for the reply!!

Now, my model uses improper priors for the standard deviations or means of the Gaussians. So, to apply SBC (Simulation-Based Calibration), I must use stronger priors.

In the above, I use (\theta_1,\theta_2,...\theta_n) for the parameters of the model (not MCMC samples). That is, (\theta_1,\theta_2,...\theta_n) \in \Theta. I am not sure how the rank statistic is affected by the definition of f.



We highly recommend using proper priors in all cases.

I get this

But I don’t understand this



I implemented Simulation-Based Calibration (SBC) using f:\Theta \to \mathbb{R};(\theta_1,..\theta_d) \mapsto \sum\theta_i^2 for the rank statistic described in the following paper.

Then the histogram of the rank statistic is as follows:

It shows that the histogram is far from uniform.
I think the reason is the misspecified priors described in Section 6.1 of the paper. If the data are plausible, then according to rstan::check_hmc_diagnostics() the MCMC procedure is correct. However, in my model, if the data drawn from the likelihood (model) with parameters drawn from the priors are not plausible, then the MCMC sampling contains divergent transitions.
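To illustrate how a prior mismatch alone can distort the ranks, here is a toy sketch in Python. It is not my model: it assumes a conjugate Gaussian mean model with exact posterior draws, and the names `sbc_ranks` and `edge_fraction` and the prior scales are assumptions chosen only to make the effect visible.

```python
import random

def sbc_ranks(sim_prior_sd, fit_prior_sd, n_sims=1000, n_obs=10, n_draws=99, seed=0):
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, sim_prior_sd)               # prior used to simulate
        y = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]
        # Exact posterior under the prior N(0, fit_prior_sd^2) used for fitting.
        post_prec = n_obs + 1.0 / fit_prior_sd ** 2
        post_mean = sum(y) / post_prec
        post_sd = post_prec ** -0.5
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < mu_true for d in draws))
    return ranks

def edge_fraction(ranks, n_draws=99, width=10):
    # Share of ranks in the lowest and highest `width` values (about 0.2 if uniform).
    return sum(r < width or r > n_draws - width for r in ranks) / len(ranks)

matched = edge_fraction(sbc_ranks(sim_prior_sd=1.0, fit_prior_sd=1.0))
mismatched = edge_fraction(sbc_ranks(sim_prior_sd=1.0, fit_prior_sd=0.1))
print(f"edge fraction, matched priors:    {matched:.2f}")
print(f"edge fraction, mismatched priors: {mismatched:.2f}")
```

In this toy case the sampler is exact in both runs, so any excess mass at the edges in the mismatched run comes from the priors and not from the sampling.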

Generally speaking, frequentist methods do not use priors (except lasso, Firth, ridge, …), and my model also does not need priors when I fit it to plausible data. But to implement Simulation-Based Calibration, I have to choose appropriate priors for drawing plausible data.

In my case, SBC gives me a validation of the priors more than a validation of the MCMC procedure.