Validation of model

Is the following validation procedure correct, or does it have a problem, and if so, why?

To validate my model, I use the following:
\theta^{\text{true}} : a given true model parameter
f(y|\theta) : a likelihood
\pi(\theta|y) : a posterior for data y

  • Draw datasets y_1, y_2, \dots, y_N from the likelihood f(y|\theta^{\text{true}}).
  • Draw n posterior MCMC samples \theta_1(y_i), \theta_2(y_i), \dots, \theta_n(y_i) from the posterior \pi(\cdot|y_i) for each dataset y_i.
  • Calculate the posterior mean \overline{\theta(y_i)} := \sum_a \theta_a(y_i)/n for each dataset y_i.
  • Calculate the mean error \sum_i\{\overline{\theta(y_i)} - \theta^{\text{true}}\}/N or the variance \sum_i\{\overline{\theta(y_i)} - \sum_j\overline{\theta(y_j)}/N\}^2/N.
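For concreteness, here is a minimal sketch of what I mean in Python. The Gamma-Poisson toy model, the conjugate posterior draws standing in for the MCMC samples, and all names and numbers are just for illustration, not my actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustration only): y ~ Poisson(theta) with a conjugate Gamma prior,
# so posterior draws can be generated directly as a stand-in for MCMC.
theta_true = 3.0      # \theta^{true}: the fixed true parameter
N = 200               # number of simulated datasets
n = 1000              # posterior draws per dataset
obs_per_dataset = 50  # size of each dataset y_i
a0, b0 = 1.0, 1.0     # Gamma(shape, rate) prior hyperparameters

post_means = np.empty(N)
for i in range(N):
    # 1. draw a dataset y_i from the likelihood f(y | theta_true)
    y = rng.poisson(theta_true, size=obs_per_dataset)
    # 2. draw n posterior samples theta_1(y_i), ..., theta_n(y_i)
    #    (conjugate Gamma posterior here; in practice these would be MCMC draws)
    theta_draws = rng.gamma(a0 + y.sum(), 1.0 / (b0 + obs_per_dataset), size=n)
    # 3. posterior mean \overline{\theta(y_i)}
    post_means[i] = theta_draws.mean()

# 4. mean error and variance of the posterior means across datasets
mean_error = (post_means - theta_true).mean()
variance = post_means.var()
print(mean_error, variance)
```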

If this error is small, can we conclude that our model is bias-free?

I observed that the error decreases as the number of samples n increases.

Note that I also tried to implement SBC, but it requires a prior, and if I choose a more non-informative prior, SBC sampling fails because the non-informative prior generates odd datasets. If I choose a prior for which the SBC histogram looks good, then such a model does not fit various datasets.
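For reference, here is a minimal sketch of the SBC loop I have in mind, again with a toy Gamma-Poisson model and conjugate posterior draws standing in for Stan's MCMC (the prior and all names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

n_sims, n_draws, n_obs = 500, 100, 50
a0, b0 = 2.0, 1.0  # Gamma(shape, rate) prior; a very wide prior can generate rates near 0

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.gamma(a0, 1.0 / b0)                    # 1. draw theta from the prior
    y = rng.poisson(theta, size=n_obs)                 # 2. simulate a dataset from the likelihood
    draws = rng.gamma(a0 + y.sum(), 1.0 / (b0 + n_obs), size=n_draws)  # 3. posterior draws
    ranks[s] = int((draws < theta).sum())              # 4. rank of theta among the draws

# If prior, likelihood, and sampler are consistent, the ranks should be uniform on 0..n_draws
hist, _ = np.histogram(ranks, bins=10, range=(0, n_draws + 1))
print(hist)
```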


Have you considered graphical posterior predictive checks? https://mc-stan.org/bayesplot/articles/graphical-ppcs.html


It is possible that this means you could improve your priors (possibly by introducing some dependencies between parameters or something). On the other hand, if figuring out a better prior is too costly, I think it is reasonably safe to run SBC on a model with informative priors and, if it works, conclude that it will also work with wider priors, especially if the “informative” prior is something like N(0,1) while the prior you actually want to use is more like N(0,5) than N(0,100).

I am a bit out of my depth with this, but my best, slightly speculative, response follows:

I think that in some sense you could say that the model is bias-free for the specific \theta^{true} value you’ve used. In a working model, however, you would always expect a bias towards the prior mode (bigger if \theta^{true} has low prior probability and if your dataset is small), so I am not sure this tells you much about correctness. The statistic you propose is IMHO actually telling you how big an influence your prior has over your inferences (given the size of the simulated dataset).
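To make the pull towards the prior concrete, take the textbook conjugate normal-normal case (just an illustration, not your model): with prior \theta \sim N(\mu_0, \tau^2) and data y_1, \dots, y_m \sim N(\theta, \sigma^2), the posterior mean is

\mathbb{E}[\theta \mid y] = \frac{\sigma^2/m}{\sigma^2/m + \tau^2}\,\mu_0 + \frac{\tau^2}{\sigma^2/m + \tau^2}\,\bar{y},

so the estimate is shrunk from \bar{y} towards \mu_0, and the shrinkage is larger when m is small or \tau^2 is small.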


To remove the bias caused by fixing only one model parameter \theta^{\text{true}}, I need to integrate the above error over all possible \theta^{\text{true}}, and then I need a prior to calculate such an integral.
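Explicitly, what I have in mind is something like

\text{mean bias over the prior} = \int \Big[ \frac{1}{N}\sum_{i=1}^{N} \big\{ \overline{\theta(y_i)} - \theta \big\} \Big]\, \pi(\theta)\, d\theta, \qquad y_i \sim f(\,\cdot\,|\,\theta),

so a weighting distribution \pi(\theta), i.e. a prior, is unavoidable.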

In my model, if the variance of the prior is large, then odd model parameters are generated (e.g., a Poisson rate of 0), and such parameters do not allow Stan to execute the SBC algorithm.

I am not sure: if a strong prior gives a uniform SBC histogram, does that validate the model?

Is it better to run the SBC algorithm with weaker priors?

I am not sure how to select a prior.
A non-informative, improper prior is best for me, since with such a prior I get good fits to various datasets. One fixed informative prior cannot fit a model to various datasets, since such a prior is biased for many datasets.

I have a transcendental idea.
The SBC algorithm is a validation with respect to a prior \pi(\cdot).
We can calculate a p-value to test the uniformity of the SBC histogram with respect to a prior \pi(\cdot), and we can regard the p-value as a function over the set of all priors \{\pi(\cdot)\}. If there is a measure \mu over the set of all priors \{\pi : \int \pi(\theta)\,d\theta = 1\} (e.g., one based on the Wasserstein metric), then we can calculate the mean of the p-value over all priors, namely

\text{mean of p-value} = \int_{\pi} [\text{p-value w.r.t. } \pi]\, \mu(d\pi).

But it is difficult to calculate such a measure on an infinite-dimensional set.
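For a single fixed prior, the p-value I mean could be computed, for example, with a chi-square test on the SBC rank histogram (scipy's chisquare is just one possible choice here, and ranks are SBC ranks like those in the sketch above):

```python
import numpy as np
from scipy.stats import chisquare

def sbc_uniformity_pvalue(ranks, n_draws, n_bins=10):
    # Bin the SBC ranks (each rank lies in 0..n_draws) and test the histogram
    # against the uniform expectation of equal counts per bin.
    counts, _ = np.histogram(ranks, bins=n_bins, range=(0, n_draws + 1))
    return chisquare(counts).pvalue
```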

I think this strongly suggests your priors are problematic. Since you can say a priori that a Poisson rate of 0 is nonsense, you need a prior that puts negligible probability on parameters that would make that happen. Similarly, if your model involves something like Y_i \sim \text{Poisson}(\exp(\mu_i)), then you know a priori that \mu_i < 710 just because you assume you can use a computer to fit your model (\exp(710) is bigger than the largest double). You can probably put that bound way lower using very little background knowledge, e.g. if we are modelling the number of insurance claims, \mu_i \simeq 23 already means that you have more claims than there are people on Earth, so you can safely put basically zero probability on that.
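A quick numerical check of those bounds (standard double-precision behaviour):

```python
import numpy as np

print(np.exp(709.0))  # about 8.2e307, still representable as a double
print(np.exp(710.0))  # overflows to inf
print(np.exp(23.0))   # about 9.7e9, already more than the number of people on Earth
```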

Does that make sense?

I don’t fully understand the idea at the end of your post, but to me, it looks like you are just making your life more difficult :-) But maybe there is a cool theory paper somewhere in there - I am really not competent to judge this well.

Now, in order to generate appropriate model parameters, I will try to construct priors so that the Poisson rate is greater than or equal to some small positive number.
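As a sketch (the gamma shape and the lower bound are my own placeholder choices, not my actual model), shifting the prior by a small constant guarantees this:

```python
import numpy as np

rng = np.random.default_rng(0)

eps = 0.01  # hypothetical lower bound for the Poisson rate
# Shifted gamma prior: every simulated rate is >= eps by construction,
# so SBC never has to simulate data from a rate of exactly 0.
rates = eps + rng.gamma(shape=2.0, scale=1.0, size=100_000)
print(rates.min())
```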

Hahaha, you got me! ;)
I want to live more easily! :'‑D