I’m not entirely sure I understood what you want; if not, I’m sorry for the lengthy reply…
The way I tend to think about zero-sum constraints in hierarchical models is through a decomposition of the normal distribution:

Let’s say we have an iid vector variable y (most of the time, the coefficients of a categorical predictor in a regression model) with a diagonal normal distribution:

y \sim N(0, \sigma^2 I_N)
Then we can decompose this into the sample mean \bar{y} = \frac{1}{N} e^T y, where e = (1, 1, \ldots, 1)^T, and the deviations from that sample mean, \delta = y - \bar{y} e. We can show that \bar{y} \sim N(0, \frac{\sigma^2}{N}) and \delta \sim N(0, \sigma^2 (I - \frac{1}{N} ee^T)), independently of each other. We can use sum_to_zero_vector to sample from \delta efficiently.
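To see why \bar{y} and \delta are independent here (this is exactly the step that breaks in the general case further down): both are linear in y, so it is enough that their covariance vanishes,

\operatorname{Cov}(\delta, \bar{y}) = \left(I - \tfrac{1}{N} ee^T\right) \mathbb{E}[y y^T] \, \tfrac{1}{N} e = \tfrac{\sigma^2}{N} \left(e - \tfrac{1}{N} e \, (e^T e)\right) = 0,

since e^T e = N, and jointly normal variables with zero covariance are independent.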
In a regression we can use this decomposition to absorb the variance of \bar{y} into the intercept, or some other variable. Let’s say, to keep it simple, we have

\text{intercept} \sim N(0, 10^2), \qquad y \sim N(0, \sigma^2 I_N), \qquad \mu = \text{intercept} + y

(Maybe I shouldn’t have used y as a variable name, but I’m not changing it now…)
We can rewrite this as

\mu = (\text{intercept} + \bar{y}) + \delta
This shows the overparametrization very nicely: the likelihood depends on \text{intercept} and \bar{y} only through their sum. But since both are scalar normal variables, we can combine them into one:

\text{intercept} + \bar{y} \sim N\!\left(0, 10^2 + \frac{\sigma^2}{N}\right)
Or in Stan, something like:

```stan
parameters {
  real intercept_plus_y_bar;    // intercept + y_bar combined into one parameter
  sum_to_zero_vector[N] delta;  // deviations from the sample mean
  real<lower=0> sigma;
}
transformed parameters {
  real intercept_sigma = 10;
  real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
  intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
  delta ~ normal(0, sigma);
  // Likelihood based on delta and intercept_plus_y_bar
  ...
}
```
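For concreteness, a minimal complete version could look like this, assuming a hypothetical setup with M observations grouped by an index group and a normal observation model (the data block, sigma_obs, and the placeholder priors are assumptions, not part of the model above):

```stan
data {
  int<lower=1> N;                        // number of groups
  int<lower=1> M;                        // number of observations
  array[M] int<lower=1, upper=N> group;  // group index for each observation
  vector[M] obs;
}
parameters {
  real intercept_plus_y_bar;
  sum_to_zero_vector[N] delta;
  real<lower=0> sigma;
  real<lower=0> sigma_obs;
}
transformed parameters {
  real intercept_sigma = 10;
  real intercept_plus_y_bar_sigma = sqrt(intercept_sigma^2 + sigma^2 / N);
}
model {
  intercept_plus_y_bar ~ normal(0, intercept_plus_y_bar_sigma);
  delta ~ normal(0, sigma);
  sigma ~ normal(0, 1);      // placeholder priors
  sigma_obs ~ normal(0, 1);
  // delta[group] picks out each observation's group deviation via multiple indexing
  obs ~ normal(intercept_plus_y_bar + delta[group], sigma_obs);
}
```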
Edit: I just noticed a problem in the generalization, which means the following isn’t correct…
In the general case \bar{y} and \delta are not independent, which means we can’t sample using this decomposition.
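Concretely, repeating the covariance computation with a general \Sigma gives

\operatorname{Cov}(\delta, \bar{y}) = \tfrac{1}{N} \left(I - \tfrac{1}{N} ee^T\right) \Sigma e,

which only vanishes when \Sigma e is proportional to e (as in the iid case \Sigma = \sigma^2 I).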
Edit 2: I think the decomposition

\tilde{y} = \frac{e^T \Sigma^{-1} y}{e^T \Sigma^{-1} e}, \qquad \delta = y - \tilde{y} e

where

\tilde{y} \sim N\!\left(0, \frac{1}{e^T \Sigma^{-1} e}\right), \quad \text{independent of } \delta,

might work, but I’m not 100% sure yet…
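A quick check of this candidate decomposition (assuming the precision-weighted mean \tilde{y} as written above):

\operatorname{Cov}(\delta, \tilde{y}) = \Sigma \frac{\Sigma^{-1} e}{e^T \Sigma^{-1} e} - e \, \frac{e^T \Sigma^{-1} \Sigma \, \Sigma^{-1} e}{(e^T \Sigma^{-1} e)^2} = \frac{e}{e^T \Sigma^{-1} e} - \frac{e}{e^T \Sigma^{-1} e} = 0,

so \tilde{y} and \delta would indeed be independent. Note, though, that this \delta satisfies the weighted constraint e^T \Sigma^{-1} \delta = 0 rather than e^T \delta = 0, so it doesn’t map directly onto sum_to_zero_vector.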
With that view, we can generalize this to y with a multivariate normal distribution:

y \sim N(0, \Sigma)

Again, we define

\bar{y} = \frac{1}{N} e^T y, \qquad \delta = y - \bar{y} e

and get

\bar{y} \sim N\!\left(0, \frac{1}{N^2} e^T \Sigma e\right), \qquad \delta \sim N\!\left(0, \left(I - \tfrac{1}{N} ee^T\right) \Sigma \left(I - \tfrac{1}{N} ee^T\right)\right)
Unfortunately I don’t know how we could use sum_to_zero_vector to simplify things this time (but maybe there is a way?), so we’ll have to do it manually. Given a Cholesky factorization \Sigma = LL^T, we can sample from \delta as follows (a Stan sketch of these steps follows the list):
- Construct a Householder vector:
  - u = \frac{e}{\sqrt{N}} + e_1 = (1+\sqrt{N}, 1, 1, ..., 1)/\sqrt{N}
  - \beta = \frac{2}{u^Tu} (normalization factor)
- Compute the Cholesky decomposition:
  - \Sigma = LL^T where L is lower triangular

Sampling Procedure:

- Generate z \sim N(0, I_{N-1}) (standard normal in \mathbb{R}^{N-1})
- Extend z to an N-vector with a leading zero:
  - \tilde{z} = (0, z_1, z_2, ..., z_{N-1})
- Apply the Householder transformation:
  - w = \tilde{z} - \beta \cdot u \cdot (u^T\tilde{z})
  - This effectively projects onto the subspace orthogonal to e
- Apply the Cholesky factor:
  - \delta = Lw
- To obtain the full sample y:
  - Generate \eta \sim N(0, 1) (standard normal scalar)
  - Compute \bar{y} = \eta \sqrt{\frac{1}{N^2} e^T \Sigma e}
  - Compute y = \bar{y} e + \delta
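As a sketch, these steps could be transcribed into a generated-quantities-only Stan program (run with the fixed_param sampler); delta_rng is a made-up helper name, and the caveat from the edit above still applies to the construction itself:

```stan
functions {
  // Draw delta = L * w, with w standard normal on the subspace orthogonal to e
  vector delta_rng(matrix Sigma) {
    int N = rows(Sigma);
    // Householder vector u = e / sqrt(N) + e_1 and beta = 2 / (u' u)
    vector[N] u = rep_vector(inv_sqrt(N), N);
    u[1] += 1;
    real beta = 2 / dot_self(u);
    // z ~ N(0, I_{N-1}), extended with a leading zero
    vector[N] z_tilde = append_row(0, to_vector(normal_rng(rep_vector(0, N - 1), 1)));
    // Householder reflection: w is orthogonal to e
    vector[N] w = z_tilde - beta * u * dot_product(u, z_tilde);
    // Apply the Cholesky factor
    return cholesky_decompose(Sigma) * w;
  }
}
data {
  int<lower=2> N;
  matrix[N, N] Sigma;
}
generated quantities {
  vector[N] delta = delta_rng(Sigma);
  // y_bar = eta * sqrt(e' Sigma e / N^2)
  real y_bar = normal_rng(0, sqrt(quad_form(Sigma, rep_vector(1.0 / N, N))));
  vector[N] y = y_bar + delta;
}
```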
I hope I didn’t completely miss the mark with your question :-)