Sum-to-zero constraints and multi-level models - best practice/example code

I’m not entirely sure I understood what you want correctly; if not, I’m sorry for the lengthy reply…

The way I tend to think about the zero sum constraints in hierarchical models is using a decomposition of the normal distribution:

Let’s say we have an iid vector variable (most of the time, coefficients for a categorical predictor in a regression model) with a diagonal normal distribution:

y \sim N(0, \sigma^2I)

Then we can decompose this into the sample mean \bar{y} = \frac{1}{N} e^Ty, where e = (1, 1, 1...)^T, and the deviations from that sample mean, \delta = y - \bar{y}e. We can show that \bar{y} \sim N(0, \frac{\sigma^2}{N}) and \delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T)), and — because the covariance is a multiple of the identity — that \bar{y} and \delta are independent. We can use sum_to_zero_vector to efficiently sample from \delta.
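These covariances can be checked exactly with a little NumPy, since \bar{y} and \delta are just linear maps of y (the values of N and \sigma here are arbitrary choices for the check):

```python
import numpy as np

N, sigma = 5, 1.3
e = np.ones(N)
Sigma = sigma**2 * np.eye(N)           # Cov(y) in the iid case

a = e / N                              # y_bar = a^T y
P = np.eye(N) - np.outer(e, e) / N     # delta = P y

var_ybar = a @ Sigma @ a               # exact Var(y_bar)
cov_delta = P @ Sigma @ P.T            # exact Cov(delta)
cross = a @ Sigma @ P.T                # Cov(y_bar, delta)

print(np.isclose(var_ybar, sigma**2 / N))    # True
print(np.allclose(cov_delta, sigma**2 * P))  # True: sigma^2 (I - ee^T/N)
print(np.allclose(cross, 0))                 # True: jointly normal + uncorrelated = independent
```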

In a regression we can use this decomposition to absorb the variance of \bar{y} into the intercept, or some other variable. To keep it simple, let’s say we have

\text{intercept} \sim N(0, 10^2)\\ y \sim N(0, \sigma^2I)\\ \text{data}_i \sim N(\text{intercept} + y_{j_i}, \sigma_l^2)

(Maybe I shouldn’t have used y as variable, but I’m not changing it now…)

We can rewrite this as

\text{intercept} \sim N(0, 10^2)\\ \bar{y} \sim N(0, \frac{\sigma^2}{N})\\ \delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T))\\ \text{data}_i \sim N(\text{intercept} + \bar{y} + \delta_{j_i}, \sigma_l^2)

This shows the overparametrization very nicely: the likelihood only depends on \text{intercept} + \bar{y}. But since both are scalar normal distributions, we can combine them into one variable:

\text{intercept_plus_y_bar} \sim N(0, 10^2 + \frac{\sigma^2}{N})\\ \delta \sim N(0, \sigma^2(I - \frac{1}{N}ee^T))\\ \text{data}_i \sim N(\text{intercept_plus_y_bar} + \delta_{j_i}, \sigma_l^2).
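One way to convince yourself that the two parameterizations are equivalent: the implied prior covariance of the group-level sums (\text{intercept} + y_j)_j is identical in both. A quick NumPy check (N, \sigma, and the intercept sd are arbitrary values):

```python
import numpy as np

N, sigma, tau = 4, 0.7, 10.0           # tau = intercept prior sd
e = np.ones(N)
I = np.eye(N)
E = np.outer(e, e)

# Original: intercept ~ N(0, tau^2), y ~ N(0, sigma^2 I), independent
cov_orig = tau**2 * E + sigma**2 * I   # Cov over groups of intercept + y_j

# Reparameterized: c ~ N(0, tau^2 + sigma^2/N), delta ~ N(0, sigma^2 (I - E/N))
cov_reparam = (tau**2 + sigma**2 / N) * E + sigma**2 * (I - E / N)

print(np.allclose(cov_orig, cov_reparam))  # True
```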

Or in Stan, something like:

parameters {
    real intercept_plus_y_bar;
    sum_to_zero_vector[N] delta;
    real<lower=0> sigma;
}
transformed parameters {
    real intercept_sigma = 10;
    real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
    intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
    delta ~ normal(0, sigma);
    // Likelihood based on delta and intercept_plus_y_bar
    ...
}
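The reason `delta ~ normal(0, sigma)` targets exactly N(0, \sigma^2(I - \frac{1}{N}ee^T)): as I understand it, the constrained vector is an isometric embedding \delta = Qz of an unconstrained z \in \mathbb{R}^{N-1}, with Q orthonormal and Q^Te = 0, so the iid normal density on \delta’s components makes z standard normal and \text{Cov}(\delta) = \sigma^2 QQ^T = \sigma^2(I - \frac{1}{N}ee^T). A NumPy sketch of this property (building Q from a Householder reflection — not necessarily Stan’s exact transform, but the identity holds for any orthonormal basis of the sum-to-zero subspace):

```python
import numpy as np

N = 6
e = np.ones(N)

# Orthonormal basis Q (N x (N-1)) of the sum-to-zero subspace,
# via the Householder reflection H that maps e/sqrt(N) to -e_1
u = e / np.sqrt(N)
u[0] += 1.0
H = np.eye(N) - 2.0 * np.outer(u, u) / (u @ u)
Q = H[:, 1:]                 # columns 2..N of H span {x : e^T x = 0}

print(np.allclose(Q.T @ Q, np.eye(N - 1)))                   # orthonormal
print(np.allclose(e @ Q, 0))                                 # columns sum to zero
print(np.allclose(Q @ Q.T, np.eye(N) - np.outer(e, e) / N))  # QQ^T = I - ee^T/N
```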

Edit I just noticed a problem in the generalization, which means the following isn’t correct…
In the general case \bar{y} and \delta are not independent, which means we can’t sample using this decomposition.

Edit 2 I think the decomposition

y = et + \eta

where

t \sim N(0, (e^T\Sigma^{-1}e)^{-1}) \\ \eta \sim N\left(0, \Sigma - \frac{e e^T}{e^T \Sigma^{-1} e}\right)

might work, but I’m not 100% sure yet…
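For what it’s worth, taking t to be the \Sigma^{-1}-weighted mean t = \frac{e^T\Sigma^{-1}y}{e^T\Sigma^{-1}e} (my reading of the decomposition above), the exact covariances can be checked numerically for a generic SPD \Sigma:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
e = np.ones(N)
A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)        # a generic SPD covariance

Si_e = np.linalg.solve(Sigma, e)       # Sigma^{-1} e
c = e @ Si_e                           # e^T Sigma^{-1} e

# t = a^T y with a = Sigma^{-1} e / c;  eta = y - e t = (I - e a^T) y
a = Si_e / c
M = np.eye(N) - np.outer(e, a)

var_t = a @ Sigma @ a                  # exact Var(t)
cov_eta = M @ Sigma @ M.T              # exact Cov(eta)
cross = a @ Sigma @ M.T                # Cov(t, eta)

print(np.isclose(var_t, 1 / c))                              # Var(t) = (e^T Sigma^-1 e)^-1
print(np.allclose(cross, 0))                                 # t and eta independent
print(np.allclose(cov_eta, Sigma - np.outer(e, e) / c))      # Cov(eta) = Sigma - ee^T / (e^T Sigma^-1 e)
print(np.allclose(np.outer(e, e) * var_t + cov_eta, Sigma))  # reconstructs Sigma
```

Note that with this choice \eta satisfies e^T\Sigma^{-1}\eta = 0 rather than e^T\eta = 0, so it is not a plain sum-to-zero vector.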


With that view, we can generalize this to y with a multivariate normal distribution:

y \sim N(0, \Sigma)

Again, we define

\bar{y} = \frac{1}{N} e^Ty\\ \delta = y - \bar{y}e

and get

\bar{y} \sim N(0, \frac{1}{N^2}e^T\Sigma e)\\ \delta \sim N(0, (I - \frac{1}{N}ee^T)\Sigma(I-\frac{1}{N}ee^T))
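This is where the problem from the first edit shows up: the cross-covariance \text{Cov}(\bar{y}, \delta) = \frac{1}{N}e^T\Sigma(I - \frac{1}{N}ee^T) does not vanish for a general \Sigma (it is zero only when \Sigma e \propto e, as in the iid case). A quick check:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
e = np.ones(N)
P = np.eye(N) - np.outer(e, e) / N

A = rng.standard_normal((N, N))
Sigma = A @ A.T + np.eye(N)            # generic SPD covariance

cross = (e @ Sigma @ P) / N            # Cov(y_bar, delta)
print(np.allclose(cross, 0))           # False: y_bar and delta are correlated

cross_iid = (e @ (2.0 * np.eye(N)) @ P) / N  # Sigma = sigma^2 I
print(np.allclose(cross_iid, 0))       # True: independent in the iid case
```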

Unfortunately I don’t know how we could use the sum_to_zero_vector to simplify things this time (but maybe there is a way?), so we’ll have to do it manually:

Given a Cholesky factorization \Sigma = LL^T, we can sample \delta as follows:

  1. Construct a Householder vector:

    • u = \frac{e}{\sqrt{N}} + e_1 = (1+\sqrt{N}, 1, 1, ..., 1)/\sqrt{N}
    • \beta = \frac{2}{u^Tu} (normalization factor)
  2. Compute the Cholesky decomposition:

    • \Sigma = LL^T where L is lower triangular

Sampling Procedure:

  1. Generate z \sim N(0, I_{N-1}) (standard normal in \mathbb{R}^{N-1})

  2. Extend z to an N-vector with a leading zero:

    • \tilde{z} = (0, z_1, z_2, ..., z_{N-1})
  3. Apply the Householder transformation:

    • w = \tilde{z} - \beta \cdot u \cdot (u^T\tilde{z})
    • This reflects \tilde{z} into the subspace orthogonal to e (since \tilde{z} \perp e_1 and H maps e_1 to -e/\sqrt{N})
  4. Apply the Cholesky factor:

    • \delta = Lw
  5. To obtain the full sample y:

    • Generate \eta \sim N(0, 1) (standard normal scalar)
    • Compute \bar{y} = \eta \sqrt{\frac{1}{N^2}e^T \Sigma e}
    • Compute y = \bar{y}e + \delta
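Steps 1–4 can be checked exactly: any \tilde{z} with a leading zero maps to a sum-to-zero w, and \text{Cov}(w) = H(I - e_1e_1^T)H^T = I - \frac{1}{N}ee^T. A NumPy check of these two identities (the values of N and \tilde{z} are arbitrary):

```python
import numpy as np

N = 6
e = np.ones(N)

# Householder vector and factor from step 1
u = e / np.sqrt(N)
u[0] += 1.0                            # u = e/sqrt(N) + e_1
beta = 2.0 / (u @ u)
H = np.eye(N) - beta * np.outer(u, u)

# z_tilde has a leading zero, so Cov(z_tilde) = diag(0, 1, ..., 1)
P1 = np.diag(np.r_[0.0, np.ones(N - 1)])
cov_w = H @ P1 @ H.T                   # exact Cov(w) for w = H z_tilde

print(np.allclose(cov_w, np.eye(N) - np.outer(e, e) / N))  # True

# Any z_tilde with leading zero maps to a sum-to-zero w:
z_tilde = np.r_[0.0, np.arange(1.0, N)]
w = z_tilde - beta * u * (u @ z_tilde)  # step 3
print(np.isclose(e @ w, 0.0))           # True
```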

I hope I didn’t completely miss the mark with your question :-)