Sum-to-zero intercepts and terminology

Thanks for your comments on my reply.

As such, we must include the diffuseness due to the non-identifiability.

Hm, that’s the point I consider problematic. Of course, strictly speaking, you are right, because we are talking about the non-constrained model taken as the data-fitting model. But as I said above, I’m not sure all users are aware that the diffuseness is caused by the non-identifiability and is therefore of a rather technical nature. I guess most users are not aware that the group-level effects could be constrained to sum to zero, thereby ensuring identifiability and avoiding the “technical” diffuseness. If they were aware, some might consider switching to a constrained model because it might be more appropriate for their analysis. These are cases where
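To make the “technical” nature of this diffuseness concrete, here is a minimal sketch on made-up posterior draws (not from any fitted brms model). It assumes the non-identified direction is a shared shift `c` that is added to the overall intercept and subtracted from every group-level offset, and it re-expresses each draw under a sum-to-zero constraint by moving the mean offset into the intercept. This is only a post-hoc reparameterization of draws; it need not match a model fitted with the constraint built in, since the priors differ. `simulate_draws` and `sum_to_zero` are hypothetical helper names.

```python
import numpy as np


def simulate_draws(n_draws=4000, seed=1):
    """Made-up posterior draws with one non-identified direction.

    A shared shift c is added to the intercept and subtracted from every
    group-level offset, so alpha + b_j does not depend on c.
    """
    rng = np.random.default_rng(seed)
    group_means = np.array([0.5, 1.0, 1.5, 2.0])  # hypothetical values
    # Well-identified part: small posterior noise around each group mean.
    mu = group_means + 0.05 * rng.standard_normal((n_draws, group_means.size))
    c = 2.0 * rng.standard_normal((n_draws, 1))  # non-identified shift
    alpha = mu.mean(axis=1, keepdims=True) + c   # overall intercept
    b = mu - alpha                               # group-level offsets
    return alpha[:, 0], b


def sum_to_zero(alpha, b):
    """Move the per-draw mean offset into the intercept so offsets sum to zero."""
    b_bar = b.mean(axis=1)
    return alpha + b_bar, b - b_bar[:, None]


alpha, b = simulate_draws()
alpha_sz, b_sz = sum_to_zero(alpha, b)
print(f"sd of unconstrained intercept: {alpha.std():.3f}")
print(f"sd of sum-to-zero intercept:   {alpha_sz.std():.3f}")
```

Under this assumption, the sum-to-zero intercept is the per-draw mean of the sampled group means, and the group means `alpha + b_j` are identical before and after the transformation; only the split between intercept and offsets changes.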

Otherwise we’re just doing inference on the mean of the groups that we sampled.

is the objective (or one of the objectives) of the analysis. For example, consider a “random-effects meta-analysis”. In such analyses, the overall intercept, as a measure of centrality for only those groups (studies) that were included in the model fit, is often of interest. Prediction for a new group (study) might also be of interest, but that’s another topic (see next paragraph).

Apart from that, I think we need to distinguish between analysing the posterior and making predictions (for new groups). Could it be that you are thinking of predictions? If so, then of course the uncertainty from the distribution of the group-level effects needs to be taken into account (in addition to the posterior uncertainty), but I would distinguish that uncertainty from the technical diffuseness due to non-identifiability.

Again apart from that, I’m sorry that the wording in my comment above was a bit imprecise. I should have said “are needed” instead of “are of interest”. I’m also thinking of situations such as calling brms::conditional_effects() with default arguments, where the posterior draws for the overall intercept are used but the group-level effects are not added on top. In such situations, the non-identifiability does not sum out.

When the marginal posteriors of the group-level means are of interest, we obtain these by adding the intercept to the group-level offsets, thereby “summing out” the non-identifiability.
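A minimal sketch of this “summing out”, again on made-up draws and under the same assumption of a shared non-identified shift `c`: the intercept and the offsets are each diffuse, but their sum is not. Looking at `alpha` alone, as in the conditional_effects() situation described above where the group-level effects are not added on top, keeps the full diffuseness. All numbers and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws, n_groups = 4000, 3
# Made-up, well-identified group means (posterior draws).
mu = np.array([0.0, 1.0, 2.0]) + 0.05 * rng.standard_normal((n_draws, n_groups))
# Non-identified shift: moves the intercept up and all offsets down by c.
c = 3.0 * rng.standard_normal((n_draws, 1))
alpha = 1.0 + c          # overall intercept draws (diffuse)
b = mu - alpha           # group-level offsets (diffuse)

group_means = alpha + b  # adding the intercept sums out c
print("sd(alpha):      ", alpha.std().round(3))
print("sd(b_j):        ", b.std(axis=0).round(3))
print("sd(alpha + b_j):", group_means.std(axis=0).round(3))
```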

Sorry, the wording in my comment above was incorrect. I actually meant the group-level effects (or group-level offsets, as you said), i.e., the differences between the group-level intercepts and the overall intercept. But thinking about it again, I guess these differences are rarely of interest (or needed) on their own. At least, I currently can’t think of a situation where they would be.

EDIT: Below, I have revised some of the ideas expressed here.
