I’m still working on my minimal reproducible example (in between a day of Zoom meetings), but I thought I could save myself some time and ask:
are there any known issues when running loo on a model that was fit with sample_prior = "only"?
In short: we are testing some theories that attempt to give estimates for some of the parameters in a model. That is, rather than fitting the model to the data, I am setting up the model using priors based on the theory, and I would like to test how well that accounts for the data. We have two different theories that make competing claims about what the model parameters should be.
I was hoping to run the model with sample_prior = "only" twice (once for each theory) and then use loo and model_weights to compare how well the models do in predicting the data.
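Roughly what I'm attempting (a minimal sketch; dat, y, x, and the prior values are placeholders, not the real model):

```r
library(brms)

# Placeholder priors encoding each theory's claimed parameter value
priors_A <- prior(normal(0.5, 0.1), class = "b")  # theory A
priors_B <- prior(normal(0.0, 0.1), class = "b")  # theory B

# Sample from the priors only, ignoring the data during fitting
fit_A <- brm(y ~ x, data = dat, prior = priors_A, sample_prior = "only")
fit_B <- brm(y ~ x, data = dat, prior = priors_B, sample_prior = "only")

# This is where the error appears:
loo(fit_A)
loo(fit_B)
model_weights(fit_A, fit_B, weights = "loo")
```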
I can sample from the models fine, and the predictions line up nicely with the data. But when I try to run loo, I get:
Error in validate_ll(log_ratios) : All input values must be finite
Does anybody have any insight as to what might be causing this?
[I’ll try and get a reproducible example coded up later today!]
If you sample only from the prior, then there is no data to leave out in leave-one-out cross-validation, so what you are trying to do is not logically valid.
Try computing the prior predictive joint density, which is also known as the marginal likelihood, with the bridgesampling package. In that case you would also sample conditioning on the data.
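Something along these lines (a sketch reusing the placeholder names from your post; note that these fits condition on the data, and save_pars is needed so bridge sampling can access all parameter draws):

```r
library(brms)
library(bridgesampling)

# Fit the full-data posteriors, saving all parameters for bridge sampling
fit_A <- brm(y ~ x, data = dat, prior = priors_A,
             save_pars = save_pars(all = TRUE))
fit_B <- brm(y ~ x, data = dat, prior = priors_B,
             save_pars = save_pars(all = TRUE))

# Estimate the log marginal likelihood of each model
ml_A <- bridge_sampler(fit_A)
ml_B <- bridge_sampler(fit_B)

bayes_factor(ml_A, ml_B)  # relative evidence for theory A over theory B
post_prob(ml_A, ml_B)     # posterior model probabilities (equal prior odds)
```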
Thank you for your reply. It sounds like I am misunderstanding something, as I thought that the data is still included in the model object (under my_model$data).
I thought the model objects returned from brm are the same whether I run sample_prior = "only" or not, the only difference being that one has samples from the prior and the other has samples from the estimated posterior. If they both contain the dataset, why can’t I run loo on both?
Clearly, my understanding is quite fuzzy in some places! And I ended up in Zoom meetings literally all day today, so I still haven’t put together a simple reproducible example of what I’m after. But I hope my question makes sense!
Thank you again for your help. I will look into bridgesampling.
See the videos on cross-validation and PSIS-LOO, which is what the loo package uses. PSIS-LOO requires the full-data posterior.
The dataset may be in the fit object, but if you are sampling from the prior it’s not used for the posterior. Leave-one-out only makes sense if you have used some data to update the posterior. When you are sampling only from the prior, you are already doing leave-all-data-out inference, so maybe what you need is Holdout validation and K-fold cross-validation of Stan programs with the loo package, with holdout validation replaced by leave-all-out validation (i.e. prior predictive performance). The difference from using bridgesampling is how easy it is to do pointwise or joint predictions.
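For example (a sketch; fit_A is the prior-only fit from the first post, and I'm assuming log_lik() evaluates the likelihood on a prior-only brmsfit):

```r
# Leave-all-out (prior predictive) performance from prior draws
ll <- log_lik(fit_A)  # S draws x N observations matrix of log p(y_i | theta_s)
S  <- nrow(ll)

# Pointwise: log p(y_i) per observation via log-sum-exp over prior draws
lpd_point <- matrixStats::colLogSumExps(ll) - log(S)
sum(lpd_point)

# Joint: log p(y_1, ..., y_N); a crude Monte Carlo estimate that can be
# very noisy when the prior is vague relative to the likelihood
lpd_joint <- matrixStats::logSumExp(rowSums(ll)) - log(S)
```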