The choice will depend on your goals and what you’d like to assume; there isn’t a right or wrong choice here. The key difference is that the first formula uses a shared/hierarchical prior for the group effects, so you will observe shrinkage towards the mean for small groups. I’d recommend estimating both, just so you can see what’s going on, and then choosing based on your aims for modelling the data.
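To make the shrinkage point concrete, here is a minimal Python sketch of the partial-pooling estimate in the simplest normal-normal case. It assumes the grand mean `mu`, the within-group SD `sigma` and the between-group SD `tau` are known and fixed at hypothetical values (brms would estimate all three), so it illustrates the mechanism rather than reproducing what brms computes. `Group` and `partial_pooling_mean` are names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    n: int       # number of observations in the group
    mean: float  # observed group mean (roughly what y ~ 1 + fct would give)


def partial_pooling_mean(group: Group, mu: float, sigma: float, tau: float) -> float:
    """Posterior mean of a group's mean in a normal-normal model, assuming
    known grand mean mu, within-group SD sigma and between-group SD tau."""
    data_precision = group.n / sigma**2   # information from the group's own data
    prior_precision = 1.0 / tau**2        # information from the shared prior
    weight = data_precision / (data_precision + prior_precision)
    # Weighted average of the group's own mean and the grand mean
    return weight * group.mean + (1.0 - weight) * mu


# Hypothetical groups: "a" and "b" share the same observed mean but differ in size
groups = [Group("a", 3, 10.0), Group("b", 50, 10.0), Group("c", 50, 4.0)]
mu, sigma, tau = 6.0, 3.0, 1.0  # hypothetical, assumed known for illustration

for g in groups:
    est = partial_pooling_mean(g, mu, sigma, tau)
    print(f"{g.name}: n={g.n:3d}  no pooling={g.mean:.2f}  partial pooling={est:.2f}")
```

Groups a and b have the same observed mean, but the small group a is pulled much further towards `mu` because its own data carry less information relative to the shared prior.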
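Ordinal models often omit the intercept because the thresholds/cutoffs serve a similar purpose: adding a constant to the intercept and to every threshold gives the same likelihood, so the data alone cannot separate them. Depending on the default priors that brms uses, this may or may not matter.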
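A small Python sketch of why the intercept is redundant, assuming the usual cumulative-logit form P(Y <= k) = logistic(threshold_k - eta) (as far as I know, this is how brms's cumulative family is parameterized). Shifting the intercept and every threshold by the same constant leaves the category probabilities unchanged, so only the prior can distinguish the two. The cutpoints, slope and predictor value are hypothetical.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(thresholds: list[float], eta: float) -> list[float]:
    """Category probabilities for a cumulative-logit model with
    P(Y <= k) = logistic(threshold_k - eta)."""
    cdf = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cdf:
        probs.append(c - prev)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        prev = c
    return probs


thresholds = [-1.0, 0.5, 2.0]  # hypothetical cutpoints for 4 categories
slope, x = 0.8, 1.5            # hypothetical predictor effect and value

# Model without an intercept: eta = slope * x
p_no_intercept = category_probs(thresholds, slope * x)

# Add an intercept b0 and shift every threshold by the same b0
b0 = 3.0
shifted = [t + b0 for t in thresholds]
p_with_intercept = category_probs(shifted, b0 + slope * x)

print([round(p, 4) for p in p_no_intercept])
print([round(p, 4) for p in p_with_intercept])
```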
“Should” is a strong word, here. In the abstract, both methods can be fine. The y ~ 1 + (1 | fct) method will impose multilevel partial pooling, which can make for more conservative and reproducible estimates. However, that tends to work better with more factor levels, say 10+. You can technically do it with 3, but it’ll likely be rough on the sampler and you might have to fiddle with control settings like adapt_delta.
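A toy Python simulation of why few factor levels make the multilevel version harder: the between-group SD is estimated from only as many group effects as there are levels. It assumes the true group effects are observed exactly (no within-group noise) and drawn from Normal(0, tau) with tau = 1; it is not a simulation of brms or Stan's sampler. `sd_estimates` and `interval_90` are hypothetical helper names.

```python
import random
import statistics


def sd_estimates(n_groups: int, tau: float = 1.0, reps: int = 5000, seed: int = 1) -> list[float]:
    """Sample SD of n_groups group effects drawn from Normal(0, tau),
    repeated reps times. Within-group noise is ignored for simplicity."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        effects = [rng.gauss(0.0, tau) for _ in range(n_groups)]
        estimates.append(statistics.stdev(effects))
    return estimates


def interval_90(values: list[float]) -> tuple[float, float]:
    """Approximate 5th and 95th percentiles of the estimates."""
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[-1]


for n_groups in (3, 12):
    low, high = interval_90(sd_estimates(n_groups))
    print(f"{n_groups:2d} groups: 90% of SD estimates in [{low:.2f}, {high:.2f}] (true tau = 1.0)")
```

With three groups the estimates spread far more widely, and a noticeable fraction fall well below the true value. In a Bayesian fit this roughly means the posterior for the group-level SD leans heavily on its prior and can reach close to zero, which is the funnel-shaped region where samplers often produce divergent transitions and settings like adapt_delta come into play.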
The y ~ 1 + fct method is perfectly fine in many contexts, particularly with so few levels. It’s also probably conceptually simpler and more familiar to many.
As @simonbrauer said, “the choice will depend on your goals and what you’d like to assume.”
The key difference is that the first formula uses a shared/hierarchical prior for the group effects, so you will observe shrinkage towards the mean for small groups.
Is this also (roughly) the difference between fixed and random effects that frequentist-based statistics resources often mention?
I’d recommend estimating both, just so you can see what’s going on
For this particular case, both models yield almost the same posterior conditional effects with the default brms settings (Rhat = 1 for all parameters after fitting). Indeed, in my particular case I have no reason to expect any commonality between the data for different values of fct. So, it seems that y ~ 1 + fct is better suited for my use case.
“Should” is a strong word, here. In the abstract, both methods can be fine.
I see. So, if I understand correctly, for the particular case of chapter 19, the motivation for using y ~ 1 + (1 | fct) is that the goal there was to induce a hierarchical structure following Kruschke’s book/chapter. But without the requirement of being consistent with Kruschke’s book/chapter, one could indeed go for y ~ 1 + fct.
Yes, that’s a good summary. y ~ 1 + (1 | fct) can be a good model, but you are not necessarily beholden to Kruschke’s preferences (unless he’s your boss, then you are).
IDK where you are in your Bayesian modeling development, so please take this as a friendly comment. IMO, a simple model executed well can be preferable to a more sophisticated model executed poorly. If you feel you have a better grasp on y ~ 1 + fct, then maybe just do that. If you’re a multilevel jock, consider fitting that y ~ 1 + (1 | fct).
Haha, and even then one can try convincing one's boss otherwise, or at least discuss it :)
a simple model executed well can be preferable to a more sophisticated model executed poorly
Thank you both for the response and the tip! Besides being simpler, y ~ 1 + fct does seem to be the better choice for my particular use case. I’ll probably make a point of mentioning the formula used to fit the models in my presentations or reports. Hopefully, that makes it easier for another viewer or reader to point out if I should be doing something else given the broader context of that particular project!