What if posteriors are not sensitive to priors in Bayesian inference?

I have a question related to the priorsense package. I have read, and understand, that testing prior sensitivity is important. But what if the posterior is not sensitive to the priors? What is the point of using Bayesian inference in that case?

Can someone share ideas? Bayesian inference is often criticized for being sensitive to priors. But what if that sensitivity is absent?

Hi @nguyenthiphuong, welcome to the Stan Forums!

First of all, I think you need to remove the template text from your post so that only your own text remains.

If the posterior is not sensitive to the prior, this simply means that the information from the data overwhelms the information from the prior. But perhaps @n-kall has more to contribute on this (he has developed a great R package for prior diagnostics and sensitivity analysis, n-kall/priorsense on GitHub; see also the corresponding preprint, arXiv:2107.14054, "Detecting and diagnosing prior and likelihood sensitivity with power-scaling").
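To make "the data overwhelms the prior" concrete, here is a minimal sketch using a conjugate normal model with known observation standard deviation. The function name `normal_posterior` and all numbers are assumptions chosen for illustration; they do not come from the thread or from priorsense.

```python
import math

def normal_posterior(prior_mean, prior_sd, ybar, n, sigma):
    """Posterior mean and sd of a normal mean with known sigma (conjugate update)."""
    prior_prec = 1.0 / prior_sd**2      # precision contributed by the prior
    data_prec = n / sigma**2            # precision contributed by n observations
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# Two quite different priors (hypothetical), same observed sample mean.
priors = [(0.0, 1.0), (5.0, 1.0)]
for n in (5, 5000):
    means = [normal_posterior(m, s, ybar=2.0, n=n, sigma=2.0)[0] for m, s in priors]
    print(n, [round(x, 3) for x in means])
```

With few observations the two priors pull the posterior mean far apart; with many observations both posteriors sit close to the sample mean, which is the "not sensitive to the prior" situation.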

Hi @fweber144, thanks for your response, your comment, and the link. I hope @n-kall can respond to this as well.

Welcome to the Stan Discourse @nguyenthiphuong,

I took the liberty of editing that out, but as @fweber144 mentioned, make sure you remove the template so readers will not be distracted and will get to your point quickly (for me that's one of the most important things when trying to reply: when I don't have a lot of time I cannot spend time deciphering posts that are overly long or cryptic).

As @fweber144 also pointed out, that is the short (and correct) answer, but it's part of a much longer discussion, much of which consists of subjective opinions about priors being subjective. Another short but important point is: there is really no such thing as an uninformative prior (a term often used as a substitute for flat/uniform priors); for any given model a certain prior can be uninformative in one case and informative in another (e.g. two labs, one of which has measured some quantity previously and one that never has). More importantly, there's no such thing as a default choice of priors: doing MLE carries the implicit assumption of flat priors, but that is still a choice. So there's some confusion in the "Frequentist-Bayesian Stat Wars" that is not backed by actual probability theory (though that's also my personal opinion, and I'm sure we could have a long discussion with diverging points of view...)

Thanks @caesoma for your reply,

First, on the post editing: yes, I am not yet familiar with this platform, so I was quite confused at first, but it is good to know your view on that. It will be better from now on, I think.

Regarding the sensitivity test, the reason I asked is that the advantage of Bayesian inference is that we can update information, so if the posteriors are not sensitive to the priors, what is the point of updating? I read in some literature that if the posteriors are not sensitive to the priors, the (original) results obtained with the default priors are robust to altering the priors. Suppose I then have a new dataset (new measurements of the same population) and would like to make predictions. Should I use the default priors again (no updating), or use the posterior estimates from the previous dataset as priors, since I am now updating the information?
If the information from the data overwhelms the information from the prior, meaning that the model fits the data well regardless of how the priors are altered (and given that any non-informative prior is somehow informative depending on the context), should we conclude that we have selected a good model for the data, or that we have collected good data for estimating the posteriors (i.e., the relationships between the dependent variable and the predictors) in that context?

Well, first things first: not being sensitive to priors likely only applies to a specific choice of priors; changing them to make them extremely informative (e.g. replacing them with an approximation of the posterior of a previous experiment) will likely change that. Strictly speaking, the posteriors cannot be completely insensitive to priors, but the likelihood can dominate the posterior, rendering it robust to small changes in the priors. Still, whether that happens or not is not a feature of the model likelihood structure alone, but also of each choice of priors and the associated data.

If you get to the point where you collect new data, update the priors with previous estimates, and continue to repeatedly obtain similar posteriors, it means that you have exhausted your ability to improve your estimate by obtaining more data and doing inference with new data sets. But that is only relevant in the context of that specific model choice; it's possible that other ("better") choices will allow you to extract more information (e.g. instead of sequentially fitting data and updating the priors, you could fit a hierarchical model with all data sets and a hyperprior that accounts for between-experiment variation).
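As a small illustration of "updating the priors with previous estimates", the sketch below uses a conjugate normal model with known observation standard deviation, where using the old posterior as the new prior gives the same result as fitting both data sets at once. The helper `update` and every number are hypothetical; in a real Stan model the old posterior would only be approximated, so the equality would hold only approximately.

```python
import math

def update(prior_mean, prior_sd, ybar, n, sigma):
    """Conjugate normal update for a mean with known sigma; returns (mean, sd)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

sigma = 2.0
old = (1.8, 40)   # (sample mean, sample size) of the earlier data set, made up
new = (2.3, 60)   # (sample mean, sample size) of the newer data set, made up

# Sequential: the posterior from the old data becomes the prior for the new data.
step1 = update(0.0, 10.0, old[0], old[1], sigma)
sequential = update(step1[0], step1[1], new[0], new[1], sigma)

# Pooled: one fit of both data sets under the original prior.
n_all = old[1] + new[1]
ybar_all = (old[0] * old[1] + new[0] * new[1]) / n_all
pooled = update(0.0, 10.0, ybar_all, n_all, sigma)

print(sequential, pooled)
```

This only covers the case where both data sets come from the same process; if the system changes between collections, a hierarchical model as mentioned above is the more natural choice.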

Long story short (and again incomplete): parameter values are a result of the interaction of model choice (the general structure of the likelihood), the data set(s), and last and often least the choice of priors. The conclusions are usually system specific and depend on a host of factors related to those components, so there's no theoretical replacement for domain understanding and careful consideration and justification of experimental design, data collection, and modeling choices.

Hi,

The answers given so far are great, so I will focus specifically on the diagnostics given by priorsense. The prior sensitivity check in priorsense is done via power-scaling, which you can think about as strengthening or weakening the original prior a small amount, and seeing how the posterior changes. If the diagnostic shows that there is little or no prior sensitivity, it means that small changes to the chosen prior don't affect the posterior that much. But this does not necessarily mean that the choice of prior has no impact on the posterior, as choosing a completely different prior, especially if it is very informative, could still change the posterior.

Thank you @caesoma and @n-kall for responding to my concern. It is now clearer to me what is really happening and what the consequences/interpretation are when posteriors are not sensitive to altered priors. Personally speaking, what I found in the priorsense package is the change in the distribution of the posteriors (isn't it?); correct me if I am wrong, @n-kall. And it does not change the mean value of the prior.
Thank you @caesoma for discussing the second point regarding the (assumed) new dataset. I am working with a dynamic system, and validating the model predictions is important because they might give good or bad projected scenarios (of crop diversity) if the conditions change, and might inform a decision on whether action is needed to protect the natural plant diversity. That is why I might re-collect data, especially since we expect the system might change due to the pandemic.
Looking at different views would be interesting indeed, but it also depends a lot on the purpose of the modelling and its validation.

priorsense re-weights the posterior draws to estimate what would happen with a different prior. Depending on the distribution of the prior, the power-scaling can change the mean or not. For example power-scaling an exponential or beta prior can change both the mean and variance, but power-scaling a normal distribution will only change the variance. You can check Table 1 and Figure 3 from the paper for examples.
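The re-weighting idea can be sketched in a few lines. This is a simplified illustration, not the priorsense implementation: it uses plain self-normalized importance weights proportional to p(theta)^(alpha - 1), whereas the paper describes Pareto-smoothed importance sampling. The toy model (normal prior N(0, 1), unit observation sd, made-up sample mean) and the function names are assumptions.

```python
import math
import random

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def log_prior(theta):
    """Log density of the original (unscaled) prior, here N(0, 1)."""
    return log_normal_pdf(theta, 0.0, 1.0)

def power_scaled_mean(draws, log_prior_fn, alpha):
    """Posterior mean under prior**alpha, estimated by re-weighting existing draws."""
    log_w = [(alpha - 1.0) * log_prior_fn(t) for t in draws]
    shift = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * t for wi, t in zip(w, draws)) / sum(w)

random.seed(1)
n, ybar = 10, 1.0                           # hypothetical data summary, sigma = 1
post_prec = 1.0 + n                         # prior precision 1 plus data precision n
draws = [random.gauss(n * ybar / post_prec, math.sqrt(1.0 / post_prec))
         for _ in range(20000)]             # draws from the original posterior

for alpha in (0.5, 1.0, 2.0):
    estimate = power_scaled_mean(draws, log_prior, alpha)
    exact = n * ybar / (alpha + n)          # analytic mean with prior N(0, 1/alpha)
    print(alpha, round(estimate, 3), round(exact, 3))
```

No refitting is needed: the same draws are reused for each alpha, and in this conjugate toy case the re-weighted estimates can be compared against the analytic answer.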

Thanks @n-kall. Another (very silly) question: why can power-scaling change only the variance, and not the mean, of a normal distribution? I have gone through the paper but missed this point.

Thanks!

It's because it's a symmetrical distribution, so the power-scaling acts the same on both 'sides' of the distribution and therefore just makes the distribution wider / narrower. If the distribution is asymmetrical then it will shift the mean as well as changing the width.
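The same point can be written out directly. This is a short derivation sketch, with the power-scaling factor written as \alpha > 0 and the prior parameters \mu, \sigma, \lambda introduced here only as notation:

```latex
% Normal prior: raising the density to the power alpha only rescales the variance.
p(\theta) \propto \exp\!\left(-\frac{(\theta-\mu)^2}{2\sigma^2}\right)
\;\Longrightarrow\;
p(\theta)^{\alpha} \propto \exp\!\left(-\frac{\alpha(\theta-\mu)^2}{2\sigma^2}\right)
= \exp\!\left(-\frac{(\theta-\mu)^2}{2(\sigma^2/\alpha)}\right),
\quad\text{i.e. } \mathrm{N}\!\left(\mu,\ \sigma^2/\alpha\right).

% Exponential prior (asymmetric): the rate is rescaled, so mean and variance both change.
p(\theta) \propto \exp(-\lambda\theta),\ \theta \ge 0
\;\Longrightarrow\;
p(\theta)^{\alpha} \propto \exp(-\alpha\lambda\theta),
\quad\text{i.e. } \mathrm{Exponential}(\alpha\lambda)\ \text{with mean } \frac{1}{\alpha\lambda}.
```

In the normal case the location \mu is untouched for every \alpha, while in the exponential case the mean moves from 1/\lambda to 1/(\alpha\lambda).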

I am still not fully convinced about the normal distribution, @n-kall. With the default priors the mean is normally 0, but if we choose a different mean (mean != 0), then I am setting a different belief (trust) in the effect of the factor (predictor, x). Let's say, if mean > 0, then we believe in a positive effect of the predictor, and vice versa. What do you think about this?

My explanation was specifically to do with the method used by priorsense, which we call power-scaling but is also known as tempering. This method is not able to shift the mean of the normal distribution, even if it would be useful to do so. The priorsense sensitivity check is testing for a specific type of sensitivity: sensitivity to power-scaling.

If you are particularly interested in the sensitivity to changing the mean of a prior, you could look into bayestestR::sensitivity_to_prior or adjustr.

Thanks @n-kall . Yes, now I see what you meant, my misunderstanding. Thanks for your suggestion!

Hi @n-kall, I read the priorsense vignette and it seems that, at the moment, the package does not work with models fitted in rstanarm (e.g., multilevel models), is that right? I am running a multilevel model (2 levels). Fitting with brms works well for the full model construction and the priorsense check, but I have a problem with the variable selection step (which might be related to computational cost: when I reduced the sample size to one-fourth, along with the number of draws and cross-validation folds, the running time improved a bit but it is still very slow). So I tried stan_glmer, but I don't know if the model can be checked for prior/likelihood sensitivity in this case, and variable selection using the projpred package is still very, very slow...
If I don't use the multilevel model, it works much faster. But the multilevel idea sounds more reasonable in my case, as I have data from different groups.
Do you have suggestions for working with priorsense (with models fitted by stan_glmer)? Or can I use brms for the prior/likelihood sensitivity check and then fit again with stan_glmer for the variable selection, which might help me reduce the computational cost?
Thanks!

That's correct. priorsense requires the log prior evaluations, which need to be specified in the Stan code. brms models are compatible because they have the lprior specified in the Stan code, but as rstanarm uses pre-compiled models, it is not trivial to add this functionality.

Using brms for the sensitivity check and then refitting with stan_glmer for the variable selection is, I think, a sensible option, as long as the same model and priors are used in both.

Thanks Paul. I am still wondering why moving from fitting a Gaussian-family model to a multilevel Poisson-family model costs so much time. I will try the options above and see.