Sum-to-zero intercepts and terminology

If it’s the intercept prior you’re talking about, then no, this prior is not the only thing identifying it. The random-effects assumption that the group means come from a common distribution also does. This Gaussian distribution identifies the model because it doesn’t let the random intercepts wander away infinitely.
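To make the point concrete, here is a minimal Stan sketch of the kind of unconstrained hierarchical model being discussed (the variable names are illustrative, not taken from the thread). Only the sum `mu + alpha[j]` enters the likelihood, so the global intercept and the group effects are separated solely through the Gaussian group-level distribution:

```stan
// Unconstrained hierarchical intercept model (illustrative sketch).
// Only mu + alpha[j] enters the likelihood; the normal(0, tau)
// distribution on alpha is what keeps mu and alpha from trading
// off against each other without limit.
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> g;  // group index per observation
  vector[N] y;
}
parameters {
  real mu;              // global intercept
  vector[J] alpha;      // group-level intercepts
  real<lower=0> tau;    // group-level SD
  real<lower=0> sigma;  // residual SD
}
model {
  mu ~ normal(0, 5);
  tau ~ normal(0, 2);
  sigma ~ normal(0, 2);
  alpha ~ normal(0, tau);          // "soft" identification of mu vs. alpha
  y ~ normal(mu + alpha[g], sigma);
}
```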

With “prior”, I meant the joint prior for all parameters, not a marginal one. And I regarded the Gaussian distribution of the group-level effects as part of the prior. But of course, it may also be seen as part of the likelihood. In hierarchical models, drawing the line between prior and likelihood can be done in multiple ways. Sorry that I did not make this clear.

In my view, the wideness of the intercept estimate is desirable.

It is desirable if the unconstrained model is really the model that a user wants to fit, and the question is whether that is the case. That’s why I wanted to point out that, in my opinion, users should be informed more often about the non-identifiability and its consequences. There are situations where the sum-to-zero constraint is preferable.
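For comparison, here is one common way to impose the constraint in Stan (again just a sketch under the same illustrative names as above, not code from the thread): the group effects are reparameterized so they sum to zero, which removes the location trade-off between the intercept and the group effects, at the cost of changing the model’s interpretation (the intercept becomes the mean over the observed groups rather than the mean of a hypothetical new group).

```stan
// Sum-to-zero parameterization of the group effects (illustrative sketch).
// J - 1 free parameters; the last effect is minus the sum of the others.
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> g;
  vector[N] y;
}
parameters {
  real mu;
  vector[J - 1] alpha_raw;
  real<lower=0> tau;
  real<lower=0> sigma;
}
transformed parameters {
  // Constrain the group effects to sum to zero.
  vector[J] alpha = append_row(alpha_raw, -sum(alpha_raw));
}
model {
  mu ~ normal(0, 5);
  tau ~ normal(0, 2);
  sigma ~ normal(0, 2);
  alpha ~ normal(0, tau);
  y ~ normal(mu + alpha[g], sigma);
}
```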

Rather, the way I see it is that, first of all, it’s not non-identified but actually a completely healthy model, and second, this diffuseness is not caused by a technical problem but accurately reflects the statistical limitations of the underlying estimation problem.

Well, this might just be a question of terminology now. What you are calling “the statistical limitations of the underlying estimation problem” is what I’m calling “non-identifiability”. The Stan and R code I posted above illustrates pretty well what I mean by “non-identifiability” and the diffuseness due to it.
By the way, in my original comment above, I said “a rather technical”, not just “a technical”, to avoid ambiguity with convergence problems. In some later occurrences, I left out “rather”, simply out of laziness. So perhaps we had a misunderstanding here.
