I am trying to implement approximate LFO-CV as described in the vignette here: https://mc-stan.org/loo/articles/loo2-lfo.html. Having read the vignette, I implemented approximate LFO-CV without any refitting, hoping that the Pareto k estimates would all be tolerable (below about 0.7). This was attractive to me because the models I want to evaluate with LFO-CV take quite a bit of time to re-run, so I was hoping I could get away with no refitting if the Pareto k’s were all reasonably small. Unfortunately, when I tried this, the Pareto k’s were often very bad (>1 or >>1). This surprised me because I had already used loo() to estimate LOOIC (which, as the vignette and accompanying paper note, isn’t really appropriate for comparing model performance for forecasting purposes), and when I computed LOOIC, all the Pareto k’s were good. I’m probably missing something obvious, but is there a reason you’d expect Pareto k estimates to differ for LOO vs LFO? Thanks for any help you can offer! @paul.buerkner
What was the model? What was the size of the data for the initial fit? Can you post figures showing khat values per time step for LOO and LFO?
The model is a state-space fish population dynamics model where the process model residuals follow either: 1) an MA1 process, 2) an AR1 process, or 3) an AR1MA1 process. We’d like to use LFO-CV as a means to compare the timeseries structure for the residuals.
Full details of the model are here:
The timeseries is quite short (43 yrs).
Here is a comparison of the k-hat plots:
- from calling pareto_k_values after passing the point-wise log posterior predictive probabilities matrix to loo()
- from calling pareto_k_values within the loop where approximate LFO is calculated here: https://mc-stan.org/loo/articles/loo2-lfo.html with re-fitting turned off (so using only the original model fit to all the data):
Last note: The model technically has multiple likelihoods. I have calculated LOOIC and approximate LFO-CV using log point-wise posterior predictive probability densities for a single likelihood contribution (population abundance data) which is shown above because we are most interested in using this model for one-step-ahead forecasts of abundance.
However, I also wanted to look at LOOIC and LFO-CV for the whole model fit. With two different likelihood contributions (age, abundance) in each year, I wasn’t sure how to do this, so I summed the point-wise log posterior predictive densities across the two data types by posterior draw and year and then made plots as above, finding similar results. I wasn’t sure if that was appropriate though, and we are less interested in predicting age composition. Have you used LFO-CV in contexts where there are multiple data types, and if so, did I handle this in a reasonable way or a crazy wrong way?
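A minimal sketch of the summing step described above, written in plain Python for illustration rather than taken from the actual analysis. The function name `combine_log_lik` and the two input matrices are hypothetical; each matrix is assumed to have one row per posterior draw and one column per year. The sketch also assumes the two data types are conditionally independent given the parameters, so the joint log density for a year is the sum of the two contributions.

```python
def combine_log_lik(ll_abundance, ll_age):
    """Sum two pointwise log-likelihood matrices (draws x years), entry by entry.

    Assumes both data types are conditionally independent given the
    parameters, so log p(abundance_t, age_t | theta) is the sum of the
    two log densities for each draw and year.
    """
    if len(ll_abundance) != len(ll_age):
        raise ValueError("matrices must have the same number of draws")
    combined = []
    for row_abundance, row_age in zip(ll_abundance, ll_age):
        if len(row_abundance) != len(row_age):
            raise ValueError("matrices must have the same number of years")
        combined.append([a + b for a, b in zip(row_abundance, row_age)])
    return combined


# Toy example: 2 posterior draws, 3 years (made-up numbers).
ll_abundance = [[-1.0, -2.0, -1.5], [-0.5, -1.0, -2.5]]
ll_age = [[-0.5, -0.25, -1.0], [-1.5, -0.75, -0.5]]
joint = combine_log_lik(ll_abundance, ll_age)
```

The resulting draws-by-years matrix can then be used wherever the single-likelihood matrix was used before.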
Thanks again for your help!!!
These PSIS-LOO khats are already so high that PSIS-LFO will certainly have problems, too.
It’s likely that you would get better results starting with a model fitted to less than full data as in the vignette.
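To make the control flow of that forward approach concrete, here is a rough Python sketch of the refit logic: start from a fit to the first few observations, accumulate importance ratios as observations are added, and refit only when the Pareto k estimate crosses a threshold. This is an illustration under stated assumptions, not the vignette's code. The callables `fit` and `khat` are hypothetical placeholders for "refit the Stan model to the first i observations and return a draws-by-time log-likelihood matrix" and "compute the Pareto k estimate from log importance ratios" (which the loo package provides in R).

```python
def lfo_refit_schedule(n_obs, n_initial, fit, khat, k_threshold=0.7):
    """Return the list of time points at which the model is (re)fitted.

    fit(i)  -> log-likelihood matrix (draws x n_obs) from a model fitted
               to the first i observations (hypothetical callable).
    khat(r) -> Pareto k estimate for a list of log importance ratios
               (hypothetical callable).
    """
    log_lik = fit(n_initial)
    last_refit = n_initial
    refits = [n_initial]
    # i is the number of observations conditioned on when predicting the next one.
    for i in range(n_initial + 1, n_obs):
        # Observations added since the last refit have 0-based indices
        # last_refit .. i-1; their log-likelihoods sum to the log ratios.
        log_ratios = [sum(row[last_refit:i]) for row in log_lik]
        if khat(log_ratios) > k_threshold:
            # Importance sampling is unreliable here, so refit to the first i points.
            log_lik = fit(i)
            last_refit = i
            refits.append(i)
    return refits
```

The fewer entries in the returned list, the fewer expensive refits are needed; with a short series and a posterior that shifts quickly, the list can end up containing nearly every time point.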
This is correct.
No. The theory is clear, but removing more data is more likely to change the posterior more, which makes importance sampling more difficult.
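A small Python sketch of why this happens when everything is computed from a single full-data fit. For LOO, the log importance ratio for a draw drops one likelihood term; for LFO at time i, it drops the terms for every observation after i. The sketch uses the relative effective sample size of the raw weights as a simple stand-in diagnostic; it is not the Pareto k estimate, which requires the smoothing step implemented in the loo package. The matrix `log_lik` (draws by time points) and its numbers are made up for illustration.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def loo_log_ratios(log_lik, i):
    """LOO: remove only observation i (0-based) from the full-data posterior."""
    return [-row[i] for row in log_lik]


def lfo_log_ratios(log_lik, i):
    """LFO from a full-data fit: remove all observations after index i."""
    return [-sum(row[i + 1:]) for row in log_lik]


def relative_ess(log_ratios):
    """Effective sample size of the raw importance weights, divided by the number of draws."""
    ess = math.exp(2 * log_sum_exp(log_ratios)
                   - log_sum_exp([2 * r for r in log_ratios]))
    return ess / len(log_ratios)


# Toy log-likelihood matrix: 3 posterior draws, 4 time points (made-up numbers).
log_lik = [
    [-1.0, -1.2, -0.9, -1.1],
    [-1.5, -0.8, -1.3, -0.7],
    [-0.9, -2.0, -1.8, -2.2],
]

ess_loo = relative_ess(loo_log_ratios(log_lik, 1))  # drops one term
ess_lfo = relative_ess(lfo_log_ratios(log_lik, 0))  # drops three terms
```

In this toy example the LFO weights are dominated by a single draw, so `ess_lfo` comes out smaller than `ess_loo`. The same mechanism tends to push Pareto k upward as more observations are removed.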
Thanks, this is super helpful. I went ahead and completed the re-fits for every data point, and the k-hats for PSIS-LFO are now similar to those for PSIS-LOO, which means some are too high. It looks like there isn’t going to be a super easy way to get around re-fits with this model.