Correlated posterior - Sum to zero constraint for varying intercepts?!

Referring to Martin's post:

Let's say we have K groups. Then the mean for group i is \mu_i = \alpha + \beta_i, where

\alpha \sim N(\mu_\alpha, \sigma_\alpha), \qquad \beta_i \sim N(0, \sigma_\beta)
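With \alpha and \beta_i independent, this implies the marginal prior for each group mean

\mu_i \sim N\!\left(\mu_\alpha,\ \sqrt{\sigma_\alpha^2 + \sigma_\beta^2}\right)

which is the marginal prior the derivation below recovers.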

And referring to Li-Ping Liu's paper:

Suppose x \sim N(\mu_x, \Sigma_x) and y = Ax + b, where b \sim N(0, \Sigma_b).
The mean \mu_y and covariance \Sigma_y are:

\mu_y = E[y] = E[Ax + b] = AE[x] + E[b] = A\mu_x
\Sigma_y = Var[Ax + b] = Var[Ax] + Var[b] = A\Sigma_x A^{T} + \Sigma_b
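A quick Monte Carlo sanity check of this rule in R (illustrative only; the dimensions and the 0.5 noise scale are arbitrary choices, not from the thread):

set.seed(1)
A <- matrix(rnorm(3 * 2), nrow = 3)               # arbitrary 3 x 2 transform
x <- matrix(rnorm(2e5 * 2), ncol = 2)             # x ~ N(0, I), so Sigma_x = I
b <- matrix(rnorm(2e5 * 3, sd = 0.5), ncol = 3)   # b ~ N(0, 0.25 * I)
y <- x %*% t(A) + b                               # y = A x + b, row-wise
round(colMeans(y), 2)                             # ~ A mu_x = 0
round(cov(y) - (A %*% t(A) + diag(0.25, 3)), 2)   # ~ 0, i.e. cov(y) ~ A Sigma_x A' + Sigma_b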

Now let the matrix A = qr_Q(X) * scale, where X comes from our sum-to-zero approach (see the thread above), with the scale chosen such that diagonal(A A^T) = 1.

And let \Sigma_x = I\,\sigma_b^{2}, i.e. a diagonal matrix with \sigma_b^2 on the diagonal, and analogously \Sigma_b = I\,\sigma_a^{2}.

Now we estimate

y = Ax + b \sim MVN(A\mu_x,\ A\Sigma_x A^{T} + \Sigma_b) = MVN(0,\ A\Sigma_x A^{T} + \Sigma_b)

Since \Sigma_x and \Sigma_b are diagonal matrices and diagonal(AA^T) is scaled to 1, each component of y marginally satisfies

y_i \sim N(0,\ \sqrt{\sigma_a^2 + \sigma_b^2})
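Component-wise, this follows from

\left(A\Sigma_x A^{T} + \Sigma_b\right)_{ii} = \sigma_b^2\,(AA^{T})_{ii} + \sigma_a^2 = \sigma_b^2 + \sigma_a^2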

functions {
  // returns the 2*N coefficients of the QR-based sum-to-zero transform
  vector Q_sum_to_zero_QR(int N) {
    vector [2*N] Q_r;

    for(i in 1:N) {
      Q_r[i] = -sqrt((N-i)/(N-i+1.0));
      Q_r[i+N] = inv_sqrt((N-i) * (N-i+1));
    }
    //Q_r = Q_r * inv_sqrt(1 - inv(N));
    return Q_r;
  }
  // maps N-1 unconstrained parameters to an N-vector whose elements sum to zero
  vector sum_to_zero_QR(vector x_raw, vector Q_r) {
    int N = num_elements(x_raw) + 1;
    vector [N] x;
    real x_aux = 0;

    for(i in 1:N-1){
      x[i] = x_aux + x_raw[i] * Q_r[i];
      x_aux = x_aux + x_raw[i] * Q_r[i+N];
    }
    x[N] = x_aux;
    return x;
  }
}
data {
  int<lower=0> N;
  int<lower=1> N_groups;
  int<lower=1,upper=N_groups> groups[N];
  int<lower=0> y[N];
}
transformed data {
  vector[2 * N_groups] groups_Q_r = Q_sum_to_zero_QR(N_groups);
  real mu_alpha = 3;
  real sigma_alpha = 1;
  real sd_scale_to_one = sqrt(1 - inv(N));  // scale factor for QR trans. having unit variance
}
parameters {
  real intercept_sweep;
  vector[N_groups-1] group_r_raw; 
  real<lower=0> group_sd;
}
transformed parameters {
  vector[N_groups] group_r_fit = sum_to_zero_QR(group_r_raw, groups_Q_r) + intercept_sweep;
}
model {
  target += poisson_log_lpmf(y |  group_r_fit[groups]);
  group_sd ~ normal(0, 1);
  // soft constraint from the derivation above: each component of the scaled
  // group effects plus intercept gets the marginal prior N(0, sqrt(group_sd^2 + sigma_alpha^2))
  sum_to_zero_QR(group_r_raw, groups_Q_r) * sd_scale_to_one + intercept_sweep
    ~ normal(0, sqrt(square(group_sd) + square(sigma_alpha)));
  intercept_sweep ~ normal(mu_alpha, sigma_alpha);
}
generated quantities {
  vector[N_groups] group_r = group_r_fit - intercept_sweep;
}
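A minimal rstan sketch of how this model might be fit and summarized (the file name and the simulation objects groups, y and the true group_r are assumptions, taken to come from Martin's Poisson simulation):

library(rstan)

# assumes the Stan program above is saved as "sum_to_zero_qr.stan" and that
# groups, y and the true group_r come from Martin's simulation in post #1
stan_fit <- stan("sum_to_zero_qr.stan",
                 data = list(N = length(y), N_groups = max(groups),
                             groups = groups, y = y))
extr <- rstan::extract(stan_fit)

cbind(group_r, apply(extr$group_r, 2, mean))   # simulated truth vs. posterior means
mean(extr$intercept_sweep)
mean(extr$group_sd)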

Stan output for Martin's original Poisson simulation from post #1 (first column: simulated true group_r, second column: posterior means):

cbind(group_r,apply(extr$group_r,2,mean))
          group_r            
 [1,] -1.30181732 -1.75454030
 [2,] -0.42344495 -0.44177469
 [3,]  0.09653739 -0.27486722
 [4,] -0.34670480 -0.44204368
 [5,] -0.28495873 -0.48253579
 [6,]  0.62022990  0.53144460
 [7,] -0.46827546 -0.59749400
 [8,]  1.35904724  1.20603853
 [9,] -0.52597660 -0.44336027
[10,]  0.28756486  0.51761413
[11,]  1.14154453  1.20636907
[12,]  1.07758251  0.91829919
[13,] -1.04367302 -0.69513474
[14,] -0.17835562 -0.37404690
[15,]  0.76650759  0.61077257
[16,]  1.29526395  1.31688770
[17,] -0.09173423  0.01503373
[18,]  0.35034911  0.24379565
[19,] -0.10705575 -0.21414595
[20,] -0.59695379 -0.84631160
> mean(extr$intercept_sweep)
[1] 3.051445
> mean(extr$group_sd)
[1] 2.583279

pairs(stan_fit, pars = c("intercept_sweep", "group_r_raw"))

Do we need to put a prior on the intercept?

Instead of b \sim N(0, \Sigma_b) we have b \sim N(\mu_b, \Sigma_b). Thus the mean \mu_y changes to:

\mu_y = E[y] = E[Ax + b] = AE[x] + E[b] = A\mu_x + \mu_b
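With \mu_x = 0 and \mu_b = \mu_\alpha \mathbf{1} this gives \mu_y = \mu_\alpha \mathbf{1}, i.e. every component is centred at \mu_\alpha.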

I think it should be:

intercept_sweep ~ normal(mu_alpha, sigma_alpha);

*** Update: Scaling was wrong ***

Code to check that the scaling is correct:

functions {
  vector Q_sum_to_zero_QR(int N) {
    vector [2*N] Q_r;

    for(i in 1:N) {
      Q_r[i] = -sqrt((N-i)/(N-i+1.0));
      Q_r[i+N] = inv_sqrt((N-i) * (N-i+1));
    }
    //Q_r = Q_r * inv_sqrt(1 - inv(N));
    return Q_r;
  }
  vector sum_to_zero_QR(vector x_raw, vector Q_r) {
    int N = num_elements(x_raw) + 1;
    vector [N] x;
    real x_aux = 0;

    for(i in 1:N-1){
      x[i] = x_aux + x_raw[i] * Q_r[i];
      x_aux = x_aux + x_raw[i] * Q_r[i+N];
    }
    x[N] = x_aux;
    return x;
  }
}
data {
  int N;
}
transformed data {
  vector[2*N] Q_r = Q_sum_to_zero_QR(N);
  real x_raw_sigma = sqrt(1 - inv(N)); 

}
parameters {
  vector[N - 1]  x_raw;
}

transformed parameters {
  vector[N] x = sum_to_zero_QR(x_raw, Q_r) ;
}

model {
  // after rescaling by x_raw_sigma, each component of x should have a marginal sd of 0.5
  x * x_raw_sigma ~ normal(0, 0.5);
}
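A sketch of how this check could be run from R with rstan (the file name is an assumption); each component of x should come out with a marginal sd of roughly 0.5:

library(rstan)

# assumes the check program above is saved as "check_scaling.stan"
fit <- stan("check_scaling.stan", data = list(N = 20), iter = 4000, chains = 4)
round(apply(rstan::extract(fit)$x, 2, sd), 2)   # each component ~ 0.5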

Thanks to everyone in the thread; any help is very much appreciated.
