Sum-to-zero intercepts and terminology

If it's the intercept prior you're talking about, then no, this prior is not the only thing identifying it. The random-effects assumption that the group means come from a common distribution also does. This Gaussian distribution identifies the model because it doesn't let the random intercepts wander off to infinity.
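To make this concrete, here is a minimal sketch (toy data and names are hypothetical, not from the discussion above) of a random-intercept model. The likelihood alone is flat along the direction where we shift the global intercept by a constant and shift all group intercepts by the negative of that constant; the Gaussian on the group intercepts breaks that flatness:

```python
import numpy as np

def log_norm(x, mu, sigma):
    # log density of Normal(mu, sigma), elementwise
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Toy data: 3 groups, one observation each, model y ~ Normal(b0 + u[g], 1)
y = np.array([1.0, 2.0, 3.0])
g = np.array([0, 1, 2])

def log_post(b0, u):
    lik = log_norm(y, b0 + u[g], 1.0).sum()
    prior = log_norm(u, 0.0, 1.0).sum()  # u[g] ~ Normal(0, 1): this pins b0 down
    return lik + prior

b0 = 2.0
u = np.array([-1.0, 0.0, 1.0])
c = 5.0

# The likelihood is invariant under (b0, u) -> (b0 + c, u - c) ...
lik_a = log_norm(y, b0 + u[g], 1.0).sum()
lik_b = log_norm(y, (b0 + c) + (u - c)[g], 1.0).sum()
print(lik_a, lik_b)  # identical

# ... but the Gaussian on u penalizes the shifted version, so the
# posterior is proper even without any prior on b0 itself.
print(log_post(b0, u), log_post(b0 + c, u - c))
```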

I’m basically going to repeat in my own words what was already said above, but I think it’s worth repeating as I suspect it is often misunderstood.

I’m with jsocolar on this point. In my view the wideness of the intercept estimate is desirable. Trying to estimate this intercept with few groups is like trying to estimate the mean of some distribution from too small a sample (e.g. 4 observations). It’s just not a good idea.
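A quick back-of-the-envelope version of this (the numbers here are made up for illustration): even if each group mean were measured perfectly, the population mean can be pinned down at best to tau / sqrt(J), where tau is the between-group sd and J the number of groups:

```python
import numpy as np

# Hypothetical between-group sd
tau = 1.0

# Best-case posterior sd of the population mean, by number of groups J
for J in (4, 16, 100):
    print(J, tau / np.sqrt(J))
```

With J = 4 the uncertainty is half the between-group sd, no matter how much within-group data you collect.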

An analogous problem arises with a simple autoregressive model (e.g. an AR(1) model). There, the same effect (huge variance of the intercept and other parameters) occurs when the autocorrelation is strong (AR(1) parameter close to 1). It simply means that you need a longer time series to estimate these parameters properly, because the autocorrelation reduces the effective sample size.
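The effective-sample-size reduction can be sketched with the standard formula for the mean of an AR(1) series, n_eff ≈ n (1 - rho) / (1 + rho):

```python
import numpy as np

def ar1_effective_sample_size(n, rho):
    # Approximate effective sample size for estimating the mean
    # of an AR(1) series of length n with lag-1 correlation rho.
    return n * (1 - rho) / (1 + rho)

print(ar1_effective_sample_size(1000, 0.5))   # ~333
print(ar1_effective_sample_size(1000, 0.99))  # ~5
```

With rho = 0.99, a series of 1000 points carries about as much information about the mean as 5 independent observations, so wide posteriors are the honest answer, not a pathology.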

So in my view (in both of these model types) one shouldn’t think of the diffuseness of the intercept as being caused by non-identifiability, a technical problem. Rather, the way I see it: first, the model is not non-identified but completely healthy; and second, this diffuseness is not a technical artifact but accurately reflects the statistical limitations of the underlying estimation problem.

Another, maybe less confusing, way to say this is that the difference between the sum-to-zero model and the other one is that the sum-to-zero model estimates the sample mean of the sample of groups we have, while the other model estimates the mean of the distribution that this sample was drawn from. These are easily confused in conversation, since in English both might be called the “mean of the groups”.
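A small simulation (all numbers hypothetical) separates the two estimands. Across replicate draws of J = 4 groups, the sample mean of the observed group means, which is what the sum-to-zero intercept tracks, scatters around the population mean with sd tau / sqrt(J); the population mean itself is a fixed quantity of the generating distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population of group means ~ Normal(mu, tau); observe J groups at a time.
mu, tau, J = 10.0, 1.0, 4

# Many replicate "experiments", each observing J group means exactly.
reps = rng.normal(mu, tau, size=(100_000, J))

# Sum-to-zero model's intercept targets the sample mean of the observed
# groups; the hierarchical model's intercept targets mu itself.
sample_means = reps.mean(axis=1)
print(sample_means.std())  # close to tau / sqrt(J) = 0.5
```

So with 4 groups, even a perfect estimate of "the mean of these groups" still leaves you half a between-group sd of uncertainty about the population mean, which is exactly the wideness discussed above.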
