Is it ok to cross-validate prior choice?

Hi,

I want to incorporate measurement error in the dependent variable in my modeling. Unfortunately, I don’t know much about the size of the measurement error, and I don’t expect to learn about it from the model itself. I do have lots of data, though. So I wondered whether it would be reasonable to cross-validate my prior choice: I would specify different priors for the measurement error, use each prior to predict the observations in a validation set, and compare the predictions to the observed values. The prior with the best predictions would then be the most appropriate one?
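To make the setup concrete, here is a rough sketch of the kind of measurement error model I have in mind (the variable names and prior scales are only placeholders, not a worked-out model); tau_y is the measurement error scale whose prior I don’t know how to choose:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y_obs;                 // noisy measurements of the outcome
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;             // residual scale of the regression
  real<lower=0> tau_y;             // measurement error scale (poorly known)
  vector[N] y_true;                // latent, error-free outcome
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 1);
  tau_y ~ normal(0, 0.5);          // the prior I was hoping to "cross-validate"
  y_true ~ normal(alpha + beta * x, sigma);
  y_obs ~ normal(y_true, tau_y);   // measurement model for the observed outcome
}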

Best, Felix

No.

That’s a typical maneuver in machine learning settings where they don’t usually have the machinery to fit both the prior and the data at the same time. In something like Stan, the typical approach is to build a hierarchical model to jointly fit both the prior and the data. For example, if you have a regression

y ~ normal(x * beta, sigma);
beta ~ normal(0, tau);
sigma ~ lognormal(0, rho);

we won’t do cross-validation to estimate the prior parameters tau and rho; we’ll just fit the joint model p(beta, sigma, tau, rho). The user’s guide has a lot of discussion of hierarchical/multilevel models.
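As a minimal, self-contained sketch of that joint model (the data shapes and the hyperprior scales on tau and rho are placeholder assumptions, not recommendations):

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
  real<lower=0> tau;               // prior scale for beta, fit jointly with it
  real<lower=0> rho;               // prior scale for log sigma, fit jointly
}
model {
  tau ~ normal(0, 1);              // hyperpriors (placeholder scales)
  rho ~ normal(0, 1);
  beta ~ normal(0, tau);
  sigma ~ lognormal(0, rho);
  y ~ normal(x * beta, sigma);
}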

You’ll have to decide on the prediction task.


Thanks!

I might not be getting this right, but do you suggest not specifying a prior for such parameters, then? Is this then an empirical Bayes approach? Should I specify priors only if I have knowledge about them?

Stan’s GitHub wiki has a page on prior choice recommendations that can help answer your questions. If you can’t find the link, I can fetch it for you.

There’s no such thing as “not specifying a prior”: if you don’t specify a prior, you’re implicitly assuming a uniform prior over the parameter’s support. Betancourt has a case study about this and how it can be treacherous.

However, it’s recommended that you run prior predictive simulations on their own to verify that your choice of prior is sensible, for example that it regularizes away unreasonable predictions.
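A minimal prior predictive simulation in Stan might look like the following (the regression structure and prior scales are assumptions for illustration only). There is no model block, so the draws come from the priors alone; running it with the fixed-parameter sampler gives simulated outcomes y_sim that you can check against the range of plausible values:

data {
  int<lower=0> N;
  vector[N] x;
}
generated quantities {
  // draw parameters from their priors, then simulate fake outcomes
  real beta = normal_rng(0, 1);
  real sigma = lognormal_rng(0, 1);
  vector[N] y_sim;
  for (n in 1:N)
    y_sim[n] = normal_rng(beta * x[n], sigma);
}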

A good rule of thumb from the Stan wiki is that “any prior information can and should be included in your model” (correct me if I’m quoting this incorrectly).


To expand on the previous answers: I believe the recommended way to use domain knowledge to choose priors is prior predictive checks; see the visualization paper for some examples (https://arxiv.org/abs/1709.01449), but AFAIK there isn’t a comprehensive tutorial yet.

No, we suggest using priors for everything. What I’m saying is that rather than fixing a prior through cross-validation (a form of what’s known as “empirical Bayes”), you can jointly fit the prior parameters and the likelihood parameters.

Exactly.

I do a lot of PPCs in my repeated binary trial case study, but that’s not the focus and they’re buried after a lot of other detail.

Thanks, everyone, for your input!

Exactly. I find it annoying that with the lasso I need to do cross-validation to choose the bound before actually fitting it, while here we can just use a prior.
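Something like the following is what I have in mind (a sketch with placeholder scales, not a recommendation): a double-exponential (Laplace) prior on the coefficients, whose scale plays the role of the lasso bound and is fit jointly rather than tuned by cross-validation:

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
  real<lower=0> lambda;                  // prior scale playing the role of the lasso bound
}
model {
  lambda ~ normal(0, 1);                 // hyperprior (placeholder scale)
  beta ~ double_exponential(0, lambda);  // Bayesian analogue of the L1 penalty
  sigma ~ normal(0, 1);
  y ~ normal(x * beta, sigma);
}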