I tried out the simple sum_to_zero_vector on \theta (including the singletons in the constraint, just by simple edit of your model here) and the st0 model gives basically the same estimates but has ever so slightly worse (very small difference) ESS per second.
Some ideas below â Iâve been thinking about this a bit (maybe wrongly).
I think it is odd that in the multi-comp BYM2 we sum-to-zero by component the spatial effects, but not the unstructured effects. I understand the former is necessary for identifiability, but it does change the qualitative interpretation of the parameters in a way that, in my mind, is not fully reconciled with the single-component case.
The motivation for the sum to 0 on \theta for me would be to recover the same interpretation of the parameters as you would have in the single-component case. So in the single component case I can say something like :
\theta^\star = \theta \times \sqrt{1-\rho} \times \sigma, is the unstructured area-level departure from the global (component) mean;
\phi^\star = \phi \times \sqrt\rho \times \sigma is the spatially structured area-level departure from the global (component) mean.
In multi-component BYM2, I retain the interpretation around \phi â I impose a sum to 0 constraint by component, such that each \phi is interpretable as the within-component-unitâs spatially-induced departure from the component-level mean.
But \theta interpretation becomes strange â it shares the variance across sub-national units over the full graph, so there is a soft centering effect â but this centering is global, not component-wise, and hence the original âunstructured deviations from the component meanâ interpretation is lost. It is also almost never actually summing to 0 in practice based on my experience.
Here I initially thought the easy fix would be to st0 by component also the \theta, but that would basically remove the component-level effect from the model â because you can think of decomposing \theta_{a[j]} = \alpha_j + \delta_{a[j]}, and if I force \sum_{a \in C_j} \theta_{a} = 0, then thatâs the same as setting \alpha_j = 0.
So then if you force \theta to st0 you have to add \alpha_j back in as a component-wise random effect â but then then question comes: do I do that inside the convolution or outside ?
It is actually somewhat strange that the component-level effect \alpha_j is implicitly included in the \theta of BYM2, because the whole point of that model is that we are trying to identify effects (structured and unstructured) which compete in explaining the share of the variance. But the cross-components variance, which is implicitly rolled into the variance of \theta, is not explainable by a spatial effect by definition, because the components are disconnected. So this looks like itâs, in some sense, artificially inflating the share of the unstructured variance relative to structured.
It would then make sense to keep \alpha_j outside the convolution. This changes the interpretation of \rho from % of area-level variance which is spatial to % of the within-component-level variance which is spatial, but it should in principle work I think.
I have tried a version of the model I describe, but it doesnât work very well in that it is very slow and has poor mixing. I guess I got carried away with the âsum_to_zero just worksâ idea, but Iâm still not super clear why we would use it for a random intercept in a multilevel model, and we would not in this case.