I’m not entirely sure I understood correctly what you want; if not, I’m sorry for the lengthy reply…
The way I tend to think about the zero sum constraints in hierarchical models is using a decomposition of the normal distribution:
Let’s say we have an iid vector variable (most of the time, the coefficients for a categorical predictor in a regression model) with a diagonal normal distribution:
y \sim N(0, \sigma^2I)
Then we can decompose this into the sample mean \bar{y} = \frac{1}{N} e^Ty, where e = (1, 1, \dots, 1)^T, and the deviations from that sample mean, \delta = y - \bar{y}e. We can show that \bar{y} \sim N(0, \frac{\sigma^2}{N}) and \delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T)), and that the two are independent. We can use sum_to_zero_vector to efficiently sample from \delta.
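To make the iid case concrete, here’s a small numpy sketch (variable names are mine) that checks the claims above by simulation:

import numpy as np

rng = np.random.default_rng(0)
N, sigma = 5, 1.3
e = np.ones(N)

# Many draws of y ~ N(0, sigma^2 I)
y = rng.normal(0.0, sigma, size=(200_000, N))
y_bar = y.mean(axis=1)            # sample mean of each draw
delta = y - y_bar[:, None] * e    # deviations from that mean

print(y_bar.var(), sigma**2 / N)                      # variance of y_bar ~ sigma^2 / N
print(np.cov(delta, rowvar=False))                    # ~ sigma^2 (I - ee^T / N)
print(sigma**2 * (np.eye(N) - np.outer(e, e) / N))
print(np.mean(y_bar[:, None] * delta, axis=0))        # ~ 0: y_bar and delta uncorrelated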
In a regression we can use this decomposition to absorb the variance of \bar{y} into the intercept, or some other variable. To keep it simple, let’s say we have
\text{intercept} \sim N(0, 10^2)\\
y \sim N(0, \sigma^2I)\\
\text{data}_i \sim N(\text{intercept} + y_{j_i}, \sigma_l^2)
(Maybe I shouldn’t have used y as the variable name, but I’m not changing it now…)
We can rewrite this as
\text{intercept} \sim N(0, 10^2)\\
\bar{y} \sim N(0, \frac{\sigma^2}{N})\\
\delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T))\\
\text{data}_i \sim N(\text{intercept} + \bar{y} + \delta_{j_i}, \sigma_l^2)
This shows the overparametrization very nicely: the likelihood only depends on \text{intercept} + \bar{y}. But since both are scalar normal variables, we can combine them into a single variable:
\text{intercept_plus_y_bar} \sim N(0, 10^2 + \frac{\sigma^2}{N})\\
\delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T))\\
\text{data}_i \sim N(\text{intercept_plus_y_bar} + \delta_{j_i}, \sigma_l^2).
Or in Stan, something like:
data {
  int<lower=1> N;
  // observations, group index and sigma_l omitted here
}
parameters {
  real intercept_plus_y_bar;
  sum_to_zero_vector[N] delta;
  real<lower=0> sigma;
}
transformed parameters {
  real intercept_sigma = 10;
  real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
  intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
  delta ~ normal(0, sigma);
  // Likelihood based on delta and intercept_plus_y_bar
  ...
}
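Just to double-check that the reparametrization really is equivalent, here’s a quick numpy sketch (arbitrary numbers) showing that both versions imply the same prior covariance for the per-group effect \text{intercept} + y_j:

import numpy as np

N, sigma, intercept_sigma = 4, 0.7, 10.0
e = np.ones(N)
I = np.eye(N)

# Original: intercept ~ N(0, 10^2), y ~ N(0, sigma^2 I)
cov_original = intercept_sigma**2 * np.outer(e, e) + sigma**2 * I

# Reparametrized: intercept_plus_y_bar ~ N(0, 10^2 + sigma^2/N),
# delta ~ N(0, sigma^2 (I - ee^T/N))
cov_reparam = (intercept_sigma**2 + sigma**2 / N) * np.outer(e, e) \
    + sigma**2 * (I - np.outer(e, e) / N)

print(np.allclose(cov_original, cov_reparam))  # True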
Edit: I just noticed a problem in the generalization below, which means the following isn’t correct…
In the general case \bar{y} and \delta are not independent, which means we can’t sample using this decomposition.
Edit 2: I think the decomposition
y = et + \eta
where
t \sim N(0, (e^T\Sigma^{-1}e)^{-1}) \\
\eta \sim N\left(0, \Sigma - \frac{\Sigma e e^T \Sigma}{e^T \Sigma^{-1} e}\right)
might work, but I’m not 100% sure yet…
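One way to sanity-check this would be to compare the implied covariance of et + \eta against \Sigma for a random \Sigma, assuming t and \eta are meant to be independent; e.g. in numpy:

import numpy as np

rng = np.random.default_rng(1)
N = 4
e = np.ones(N)

# Random SPD matrix as a stand-in for Sigma
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)
Sigma_inv = np.linalg.inv(Sigma)
denom = e @ Sigma_inv @ e

# Covariances of t and eta as written above
var_t = 1.0 / denom
cov_eta = Sigma - np.outer(Sigma @ e, Sigma @ e) / denom

# If t and eta are independent, Cov(e t + eta) should reproduce Sigma
print(np.allclose(var_t * np.outer(e, e) + cov_eta, Sigma))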
With that view, we can generalize this to y with a multivariate normal distribution:
y \sim N(0, \Sigma)
Again, we define
\bar{y} = \frac{1}{N} e^Ty\\
\delta = y - \bar{y}e
and get
\bar{y} \sim N(0, \frac{1}{N^2}e^T\Sigma e)\\
\delta \sim N(0, (I - \frac{1}{N}ee^T)\Sigma(I-\frac{1}{N}ee^T))
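This is where the problem from the first edit shows up: the cross-covariance \text{Cov}(\bar{y}, \delta) = \frac{1}{N}e^T\Sigma(I - \frac{1}{N}ee^T) vanishes in the iid case above, but not for a general \Sigma. A small numpy sketch:

import numpy as np

rng = np.random.default_rng(2)
N = 4
e = np.ones(N)
P = np.outer(e, e) / N

# Random SPD matrix as a stand-in for Sigma
A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)

cov_ybar = e @ Sigma @ e / N**2                         # variance of y_bar
cov_delta = (np.eye(N) - P) @ Sigma @ (np.eye(N) - P)   # covariance of delta
cross = e @ Sigma @ (np.eye(N) - P) / N                 # Cov(y_bar, delta)

print(cross)  # generally nonzero, so y_bar and delta are not independent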
Unfortunately I don’t know how we could use the sum_to_zero_vector
to simplify things this time (but maybe there is a way?), so we’ll have to do it manually:
Given a Cholesky factorization \Sigma=LL^T, we can sample \delta as follows:
- Construct a Householder vector:
  - u = \frac{e}{\sqrt{N}} + e_1 = (1+\sqrt{N}, 1, 1, ..., 1)/\sqrt{N}
  - \beta = \frac{2}{u^Tu} (normalization factor)
- Compute the Cholesky decomposition:
  - \Sigma = LL^T where L is lower triangular

Sampling Procedure:

- Generate z \sim N(0, I_{N-1}) (standard normal in \mathbb{R}^{N-1})
- Extend z to an N-vector with a leading zero:
  - \tilde{z} = (0, z_1, z_2, ..., z_{N-1})
- Apply the Householder transformation:
  - w = \tilde{z} - \beta \cdot u \cdot (u^T\tilde{z})
  - This effectively projects onto the subspace orthogonal to e
- Apply the Cholesky factor:
- To obtain the full sample y:
  - Generate \eta \sim N(0, 1) (standard normal scalar)
  - Compute \bar{y} = \eta \sqrt{\frac{1}{N^2}e^T \Sigma e}
  - Compute y = \bar{y}e + \delta
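For completeness, here is the procedure above as a numpy sketch. The “Apply the Cholesky factor” step has no formula above, so I’m reading it as \delta = Lw here, and the caveat from the first edit (the two pieces aren’t independent for a general \Sigma) of course still applies.

import numpy as np

def sample_y(Sigma, rng):
    # Follows the steps above; delta = L @ w is my reading of the
    # "apply the Cholesky factor" step.
    N = Sigma.shape[0]
    e = np.ones(N)

    # Householder vector and normalization factor
    u = e / np.sqrt(N) + np.eye(N)[0]
    beta = 2.0 / (u @ u)

    # Cholesky factor of Sigma
    L = np.linalg.cholesky(Sigma)

    # z ~ N(0, I_{N-1}), extended with a leading zero
    z_tilde = np.concatenate(([0.0], rng.standard_normal(N - 1)))

    # Householder transformation: w is orthogonal to e
    w = z_tilde - beta * u * (u @ z_tilde)

    # Apply the Cholesky factor
    delta = L @ w

    # Separate draw for the mean part
    eta = rng.standard_normal()
    y_bar = eta * np.sqrt(e @ Sigma @ e / N**2)
    return y_bar * e + delta

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
print(sample_y(Sigma, rng))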
I hope I didn’t completely miss the mark with your question :-)