I’m not entirely sure I understood what you want; if not, I’m sorry for the lengthy reply…
The way I tend to think about zero-sum constraints in hierarchical models is through a decomposition of the normal distribution:

Let’s say we have an iid vector variable y (most of the time, the coefficients of a categorical predictor in a regression model) with a diagonal normal distribution:

y \sim N(0, \sigma^2 I_N)
Then we can decompose this into the sample mean \bar{y} = \frac{1}{N} e^T y, where e = (1, 1, \ldots, 1)^T, and the deviations from that sample mean, \delta = y - \bar{y} e. We can show that \bar{y} \sim N(0, \frac{\sigma^2}{N}) and \delta \sim N(0, \sigma^2 (I - \frac{1}{N} ee^T)), independently of each other. We can use sum_to_zero_vector to sample from \delta efficiently.
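To see why \bar{y} and \delta are independent here (this is exactly the step that breaks in the general case further down): both are linear in y, so it is enough that their covariance vanishes,

\operatorname{Cov}(\delta, \bar{y}) = \left(I - \tfrac{1}{N} ee^T\right) \mathbb{E}[y y^T] \, \tfrac{1}{N} e = \tfrac{\sigma^2}{N} \left(e - \tfrac{1}{N} e \, (e^T e)\right) = 0,

since e^T e = N, and jointly normal variables with zero covariance are independent.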
In a regression we can use this decomposition to absorb the variance of \bar{y} into the intercept, or some other variable. Let’s say, to keep it simple, we have

\text{intercept} \sim N(0, 10^2), \qquad y \sim N(0, \sigma^2 I_N), \qquad \mu = \text{intercept} + y

(Maybe I shouldn’t have used y as a variable name, but I’m not changing it now…)
We can rewrite this as

\mu = (\text{intercept} + \bar{y}) + \delta
This shows the overparametrization very nicely: the likelihood depends on \text{intercept} and \bar{y} only through their sum. But since both are scalar normal variables, we can combine them into one:

\text{intercept} + \bar{y} \sim N\!\left(0, 10^2 + \frac{\sigma^2}{N}\right)
Or in Stan, something like:

```stan
parameters {
  real intercept_plus_y_bar;    // intercept + y_bar combined into one parameter
  sum_to_zero_vector[N] delta;  // deviations from the sample mean
  real<lower=0> sigma;
}
transformed parameters {
  real intercept_sigma = 10;
  real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
  intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
  delta ~ normal(0, sigma);
  // Likelihood based on delta and intercept_plus_y_bar
  ...
}
```
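For concreteness, a minimal complete version could look like this, assuming a hypothetical setup with M observations grouped by an index group and a normal observation model (the data block, sigma_obs, and the placeholder priors are assumptions, not part of the model above):

```stan
data {
  int<lower=1> N;                        // number of groups
  int<lower=1> M;                        // number of observations
  array[M] int<lower=1, upper=N> group;  // group index for each observation
  vector[M] obs;
}
parameters {
  real intercept_plus_y_bar;
  sum_to_zero_vector[N] delta;
  real<lower=0> sigma;
  real<lower=0> sigma_obs;
}
transformed parameters {
  real intercept_sigma = 10;
  real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
  intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
  delta ~ normal(0, sigma);
  sigma ~ normal(0, 1);      // placeholder priors
  sigma_obs ~ normal(0, 1);
  // delta[group] picks out each observation's group deviation via multiple indexing
  obs ~ normal(intercept_plus_y_bar + delta[group], sigma_obs);
}
```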
Edit: I just noticed a problem in the generalization, which means the following isn’t correct…
In the general case \bar{y} and \delta are not independent, which means we can’t sample using this decomposition.
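Concretely, repeating the covariance computation with a general \Sigma gives

\operatorname{Cov}(\delta, \bar{y}) = \tfrac{1}{N} \left(I - \tfrac{1}{N} ee^T\right) \Sigma e,

which only vanishes when \Sigma e is proportional to e (as in the iid case \Sigma = \sigma^2 I).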
Edit 2: I think the decomposition

\tilde{y} = \frac{e^T \Sigma^{-1} y}{e^T \Sigma^{-1} e}, \qquad \delta = y - \tilde{y} e

where

\tilde{y} \sim N\!\left(0, \frac{1}{e^T \Sigma^{-1} e}\right), \quad \text{independent of } \delta,

might work, but I’m not 100% sure yet…
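A quick check of this candidate decomposition (assuming the precision-weighted mean \tilde{y} as written above):

\operatorname{Cov}(\delta, \tilde{y}) = \Sigma \frac{\Sigma^{-1} e}{e^T \Sigma^{-1} e} - e \, \frac{e^T \Sigma^{-1} \Sigma \, \Sigma^{-1} e}{(e^T \Sigma^{-1} e)^2} = \frac{e}{e^T \Sigma^{-1} e} - \frac{e}{e^T \Sigma^{-1} e} = 0,

so \tilde{y} and \delta would indeed be independent. Note, though, that this \delta satisfies the weighted constraint e^T \Sigma^{-1} \delta = 0 rather than e^T \delta = 0, so it doesn’t map directly onto sum_to_zero_vector.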
With that view, we can generalize this to y with a multivariate normal distribution:

y \sim N(0, \Sigma)

Again, we define

\bar{y} = \frac{1}{N} e^T y, \qquad \delta = y - \bar{y} e

and get

\bar{y} \sim N\!\left(0, \frac{1}{N^2} e^T \Sigma e\right), \qquad \delta \sim N\!\left(0, \left(I - \tfrac{1}{N} ee^T\right) \Sigma \left(I - \tfrac{1}{N} ee^T\right)\right)
Unfortunately I don’t know how we could use sum_to_zero_vector to simplify things this time (but maybe there is a way?), so we’ll have to do it manually. Given a Cholesky factorization \Sigma = LL^T, we can sample from \delta as follows (a Stan sketch of these steps follows the list):
- Construct a Householder vector:
  - u = \frac{e}{\sqrt{N}} + e_1 = (1+\sqrt{N}, 1, 1, ..., 1)/\sqrt{N}
  - \beta = \frac{2}{u^Tu} (normalization factor)
- Compute the Cholesky decomposition:
  - \Sigma = LL^T where L is lower triangular

Sampling Procedure:

- Generate z \sim N(0, I_{N-1}) (standard normal in \mathbb{R}^{N-1})
- Extend z to an N-vector with a leading zero:
  - \tilde{z} = (0, z_1, z_2, ..., z_{N-1})
- Apply the Householder transformation:
  - w = \tilde{z} - \beta \cdot u \cdot (u^T\tilde{z})
  - This effectively projects onto the subspace orthogonal to e
- Apply the Cholesky factor:
  - \delta = Lw
- To obtain the full sample y:
  - Generate \eta \sim N(0, 1) (standard normal scalar)
  - Compute \bar{y} = \eta \sqrt{\frac{1}{N^2} e^T \Sigma e}
  - Compute y = \bar{y} e + \delta
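As a sketch, these steps could be transcribed into a generated-quantities-only Stan program (run with the fixed_param sampler); delta_rng is a made-up helper name, and the caveat from the edit above still applies to the construction itself:

```stan
functions {
  // Draw delta = L * w, with w standard normal on the subspace orthogonal to e
  vector delta_rng(matrix Sigma) {
    int N = rows(Sigma);
    // Householder vector u = e / sqrt(N) + e_1 and beta = 2 / (u' u)
    vector[N] u = rep_vector(inv_sqrt(N), N);
    u[1] += 1;
    real beta = 2 / dot_self(u);
    // z ~ N(0, I_{N-1}), extended with a leading zero
    vector[N] z_tilde = append_row(0, to_vector(normal_rng(rep_vector(0, N - 1), 1)));
    // Householder reflection: w is orthogonal to e
    vector[N] w = z_tilde - beta * u * dot_product(u, z_tilde);
    // Apply the Cholesky factor
    return cholesky_decompose(Sigma) * w;
  }
}
data {
  int<lower=2> N;
  matrix[N, N] Sigma;
}
generated quantities {
  vector[N] delta = delta_rng(Sigma);
  // y_bar = eta * sqrt(e' Sigma e / N^2)
  real y_bar = normal_rng(0, sqrt(quad_form(Sigma, rep_vector(1.0 / N, N))));
  vector[N] y = y_bar + delta;
}
```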
I hope I didn’t completely miss the mark with your question :-)