WAIC or model comparison for a big (hierarchical) model: memory-efficient methods?

Hi,

Is there a way to use add_criterion() in brms with a memory-efficient option, something comparable to the loo.function() method? I have a large number of data points (around 300,000) and quite a few parameters (around 700). The model covers firms in countries across time, with random effects (at the country:year level) as well as fixed effects.
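(For reference, a sketch of the calls in question. The brms documentation for its loo()/waic() methods describes a pointwise argument that computes the log-likelihood one observation at a time, which is slower but needs far less memory; this assumes a fitted brmsfit object `fit`:)

```r
library(brms)

# Standard call: builds the full S-draws-by-300,000-observations
# log-likelihood matrix in memory at once.
fit <- add_criterion(fit, criterion = "waic")

# Memory-efficient alternative: pointwise = TRUE (passed through to the
# criterion function) computes the log-likelihood observation by
# observation, trading speed for working memory.
fit <- add_criterion(fit, criterion = "loo", pointwise = TRUE)
```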

Alternatively, do you think it is 'fine' for model comparison purposes to either:

  • Use a sub-sample of the fitted model and calculate the WAIC on that; or
  • Divide the fitted model into 3 sub-periods, say 1994–2001, 2002–2007, and 2008–2017, and then compare the WAIC for these periods across the different models.

I know k-fold is being recommended for hierarchical model comparisons, but given the size of this model I am not sure it is computationally feasible.
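(For what it's worth, a hedged sketch of how grouped k-fold could look in brms for a model like this — `fit` is a fitted brmsfit and `country` a hypothetical grouping variable; per the brms kfold() documentation, Ksub refits only a subset of the folds to reduce cost:)

```r
library(brms)

# Hold out whole groups (here: countries) per fold rather than random
# rows, and refit only 3 of the 10 folds to keep computation manageable.
kf <- kfold(fit, K = 10, folds = "grouped", group = "country", Ksub = 3)
```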

Many thanks,

Hi Ilan,

First, I’m not sure of the answer to your add_criterion() question, but I have a couple of questions that you might want to think about.

700 parameters is a lot. Given that you could estimate these parameters as random effects and use far fewer degrees of freedom, is there a particular reason you’re estimating them as fixed effects?

Regarding time, it depends on your data. If your data points are spread evenly over the 25-year period and you make time categorical (with 3 periods), you’re throwing away good data. Why not include time as a fixed effect with a random slope of time? (It’s also not clear what your variable is within firm: is it continuous or categorical, and is it your DV or your IV?)
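(A random slope of time could be sketched along these lines in brms formula syntax — all variable names here are hypothetical placeholders:)

```r
library(brms)

# Continuous year as a fixed effect, plus a country-level random
# intercept and a country-level random slope of year.
f <- bf(outcome ~ year + predictor + (1 + year | country))
```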

If you’re looking to do k-fold, I haven’t found a way to do it for multiple time series outside of brms, and I also don’t know how to do it in brms (I’m only now learning brms).

loo and brms will soon support subsampling for LOO, as described in Bayesian leave-one-out cross-validation for large data. (We don’t recommend WAIC due to the lack of a good diagnostic for its reliability.)
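(Editor's note for later readers: this has since landed in the loo package as loo_subsample(). A rough sketch of its function-method interface, with hypothetical data and draws objects — not a runnable recipe for the 300,000-observation model above:)

```r
library(loo)

# Pointwise log-likelihood function for the function method: takes one
# row of the data plus the posterior draws, and returns one
# log-likelihood value per draw.
llfun <- function(data_i, draws) {
  dnorm(data_i$y, mean = draws$mu, sd = draws$sigma, log = TRUE)
}

# Subsampled PSIS-LOO over, say, 1000 of the observations; 'mydata' and
# 'post' are placeholders for your data frame and posterior draws.
loo_ss <- loo_subsample(llfun, data = mydata, draws = post,
                        observations = 1000)
```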


Thanks for this @avehtari. I see in the ‘supplementary pdf’ the following note:

The functions are implemented based upon the loo package structure as the functions quickloo(), approx_psis() and psis_approximate_posterior(). An example of how to run the code can be found in the documentation for quickloo(). Author lists, versions and dates have been changed to preserve anonymity. If accepted, the code will be published open source.

I could not find the documentation, though. The 'code' section of the paper took me to the loo homepage, but I couldn’t find anything further there. I also tried reinstalling the loo package from GitHub. Would you be able to share the provisional functions? I am using brms objects. Many thanks.

I wrote “loo and brms will soon support subsampling for LOO”. There will be documentation in loo after that support has been added. I will post here when more information is available.


Yes - it was more wishful thinking on my part(!), since I hope to use this for my dissertation (due soon). Thank you for the update; I look forward to hearing more.

There is a branch supporting this, but it is just a beta version and has not been merged yet. See https://github.com/stan-dev/loo/pull/113. Today I opened a branch of brms to support subsampling LOO, but it is a very early version. See https://github.com/paul-buerkner/brms/tree/ssloo
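(For anyone who wants to experiment with the brms branch linked above, installing from a GitHub branch usually looks like this — unmerged beta code, so interfaces may change without notice:)

```r
# install.packages("remotes")  # if not already available
remotes::install_github("paul-buerkner/brms", ref = "ssloo")
```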


Exciting - thank you, Paul!