Does using the cumulative family always lead to slow sampling?

abartonicek · January 29, 2020, 12:47am

Hi, I’m trying to model an ordinal response variable with several predictors, some of which are ordinal as well. Ideally, I would like to fit the model with random effects as the data is micro-longitudinal, and the response is a psychological scale (1-7 likert scale) measured across different items and participants over time. I would also like impose shrinkage on the predictor slopes. So, to my understanding, my end-goal is a relatively complex model and I am aware that it may take quite a while to fit.

I first tried fitting the model with the default gaussian() family, and I’m able to get the model to converge in very reasonable time (1-2 hrs or so, with the default number of chains and iterations). However, when I try to fit even a very basic model with just only few fixed effects and no random effects using the cumulative() family, the sampling time goes through the roof (i.e. the cumulative model with fixed effects only will take much longer than gaussian model with random effects).

My question is, is this cost to using the cumulative() family unavoidable or am I parametrizing the model wrong? I’ve tried imposing relatively informed priors, e.g. setting the prior on intercepts to be centered on the data median with smaller sd, however this did not help much.

Here’s how I would like to fit the “idealized” model, i.e. the one including the random effects. If you want to imagine what I actually tried to fit, drop the random effects & substitute only one or two predictor variables for the “.” .


mod <- brm(dfs_value ~ . - ID - dfs_item + (. - ID - dfs_item | ID) + (1 | dfs_item),
           data = data_daily_proc,
           family = cumulative(threshold = 'flexible'),
           prior = c(prior(normal(0, 0.5), class = 'b'),
                     prior(normal(5, 2.5), class = 'Intercept')),
           cores = 4)

Thank you very much in advance, and let me know if I should supply some more detail.

bbbales2 · February 4, 2020, 9:17pm

Sorry for the delay answering this.

If the issue is one with parameterization, I’d expect that the fit would have high treedepths (10 is the default max), divergences, or really low effective sample sizes.

I guess the question then is a treedepth is high relative to what? This I don’t know, but if a model works with like a treedepth of 4-6 I’d be comfortable saying it’s a good model.

If the issue was just computation then I’d expect none of these things to be a problem.

It seems weird that you’d include . - ID - dfs_item as a fixed effect and then also as coefficients in the hierarchical model. It seems like there’d be an identifiability problem between the coefficients of . - ID - dfs_item and the group level means of the coefficients of (. - ID - dfs_item | ID).

Also what does the “.” syntax mean?

abartonicek · February 5, 2020, 3:51am

Thank you so much for replying!

So you do think it’s about parametrization? Or do cumulative models just take long to fit? The model finished fitting after ~3 days, and it did have some divergent transitions and low ESS’s.

PS: I used the dot (.) in the formula syntax to mean “all other predictor”. That’s why I substract ID and dfs_item, because I only want to fit random effects across those variables, and don’t want to fit them as fixed effects.

sakrejda · February 5, 2020, 1:11pm

Hi, your formula is probably the source of your troubles.

. - ID - dfs_item + (. - ID - dfs_item | ID) + (1 | dfs_item)

Whatever is in . - ID - dfs_item will be mostly non-identifiable w.r.t. its random-effect version, if the algorithm is good enough the random effect version should shrink towards zero (but then the variance will collapse, and that causes problems too). My guess is that there is a shared intercept of some sort and then you should include only the random-effect version of the terms while leaving out the fixed effects.

bbbales2 · February 5, 2020, 4:08pm

I thought this too but I think I want to roll it back.

Like the model:

y ~ 1 + (1 | group)

Makes sense in brms at least because the (1 | group) terms are random effects away from zero. So like the term on the outside handles the mean, or whatnot.

I don’t know if it’s parameterization, but to figure this out I guess we gotta look at the posterior.

I guess my first question is what was the average treedepth? I looked in the brms manual and didn’t see how to do that. Have a look on your own, but if you can’t find anything message back. It should be possible to do this somehow I just don’t know offhand.

If we wanted a clean comparison of cumulative vs. regular Stan model speed, you could just write a simple model with simulated data and do the fits. This might be a good idea to get a sense of what the cost of doing the cumulative thing is.

sakrejda · February 5, 2020, 6:04pm

You’re right, the - syntax in the formula turned me around. Sorry @abartonicek! Sounds like you might have something more interesting going on but it’s hard to say anything useful without seeing what the diagnostics say (stepsizes/acceptance/treedepth). When a basic model (the version with fixed effects only) runs much longer it’s often just that something is still not identified.

Topic		Replies	Views
Question about family = cumulative in brms Modeling ordinal-response , priors , brms	5	1639	April 4, 2023
Ordinal mixed effects regression: low ESS on intercept sd + very slow sampling Modeling ordinal-response , mixed-model	9	1035	April 6, 2020
Student-family random intercepts model fails to converge while gaussian model does brms fitting-issues , specification	4	710	May 3, 2020
Sampling problems in ordinal model Modeling	7	737	March 11, 2020
Very slow ordinal model with unequal variances and category specific effects brms	1	486	July 27, 2020

Does using the cumulative family always lead to slow sampling?

Related topics