Poisson Hierarchical Model Non-centered Parametrization

image

Hello everyone,

I am trying to fit the model in the picture on simulated data. I used the same model for generating data except that the covariance matrix was an identity matrix, and the beta bars were standard normal random variables. I have tried both centered and non-centered parameterization; I encountered with following problems.

Firstly, there were some divergent samples in the centered model. Before the switch to the non-centered version, I tried the handle the problem by increasing the parameter of LKJ distribution and decreasing the step size. I worked and posterior beta distributions almost centered around the generated beta value. But ıt was not the same for beta bars. Their posterior distributions are far away from generated beta bars. I wonder how I can handle this problem.

Secondly, I tried non-centered parameterization. It worked, but run time was ten times higher than centered parameterization for the same number of iterations. Is that normal? I thought run time would decrease.
Also, chains did not mix, and almost all iterations exceeded the max_treedepth even for max_treedepth=13.

I shared the data simulation code and centered and non-centered models in this order. I would be very happy if anyone could help me.

sim

data {
  /* Dimensions */
  int<lower=0> M; // number of varaibles 
  int<lower=0> N; // number of observations
  /* Design Matrix */
  matrix[N, M] X;
  /* offset */
  vector<lower=0>[N] offset;
  /* Hyperparameters*/
  vector<lower=0>[M] sigma; // diagonal values of the covariance matrix, equals to ones matrix.
}
generated quantities {
  vector[M] betabar; // mean vector of multivariate normal distribution
  vector[M] beta; // regression coefficients
  
  real<lower=0> a; // prior for relative risk
  vector<lower=0>[N] wi; //relative risk
  
  vector[N] xi; // mean of poisson distribution
  
  array[N] int<lower=0> y_rep; // response variable 
  
  for (i in 1 : M) {
    betabar[i] = normal_rng(0, 1);
  }
  beta = multi_normal_rng(betabar, diag_matrix(sigma));
  
  a = gamma_rng(1, 0.01);
  for (j in 1 : N) {
    wi[j] = gamma_rng(a, a);
  }
  
  xi = exp(X * beta + log(offset) + log(wi));
  
  for (z in 1 : N) {
    y_rep[z] = poisson_rng(xi[z]);
  }
}

NB2 centered

data {
  /* Dimensions */
  int<lower=0> M; // number of varaibles 
  int<lower=0> N; // number of observations
  /* Design Matrix */
  matrix[N, M] X;
  /* offset */
  vector<lower=0>[N] offset;
  /* Outcome */
  array[N] int<lower=0> y;
  /* Hyperparameters*/
  vector<lower=0>[M] sigma; //diagonal values of the covariance matrix, equals to 10s vector
}
parameters {
  vector[M] beta; // coeffiencts in linear equation
  real<lower=0> a; // prior for relative risk
  vector<lower=0>[N] wi; // relative risk
  vector[M] betabar; // mean vector of multivariate normal distribution
  cholesky_factor_corr[M] Lcorr; // cholesky factor (L_u matrix for R)
}
transformed parameters {
  vector[N] xi; // mean vector of poisson distribution
  corr_matrix[M] R; // correlation matrix
  cov_matrix[M] Sigma; // covariance matrix
  
  xi = exp(X * betabar + log(offset) + log(wi));
  R = multiply_lower_tri_self_transpose(Lcorr);
  Sigma = quad_form_diag(R, sigma);
}
model {
  target += normal_lpdf(betabar | 0, 1);
  target += lkj_corr_cholesky_lpdf(Lcorr | 4);
  target += multi_normal_lpdf(beta | betabar, Sigma);
  target += gamma_lpdf(a | 1, 0.01);
  
  target += gamma_lpdf(wi | a, a);
  target += poisson_lpmf(y | xi);
}

Non-centered

data {
  /* Dimensions */
  int<lower=0> M; // number of varaibles 
  int<lower=0> N; // number of observations
  /* Design Matrix */
  matrix[N, M] X;
  /* offset */
  vector<lower=0>[N] offset;
  /* Outcome */
  array[N] int<lower=0> y;
  /* Hyperparameters*/
  vector<lower=0>[M] sigma; //diagonal values of the covariance matrix, equals to 10s vector
}
parameters {
  vector[M] delta; // std normal variable that multiplied with covariance matrix
  vector[M] delta2; // std normal variable that multiplied with mean
  real<lower=0> a;
  
  vector<lower=0>[N] wi; // relative risk
  cholesky_factor_corr[M] Lcorr; // cholesky factor (L_u matrix for R)
}
transformed parameters {
  corr_matrix[M] R; // correlation matrix
  vector[M] beta; // regression coefficients
  vector[N] xi; // mean of poisson
  
  R = multiply_lower_tri_self_transpose(Lcorr);
  
  beta = delta2 * 10 + quad_form_diag(R, sigma) * delta;
  xi = exp(X * beta + log(offset) + log(wi));
}
model {
  target += normal_lpdf(delta2 | 0, 1);
  target += lkj_corr_cholesky_lpdf(Lcorr | 2);
  target += normal_lpdf(delta | 0, 1);
  target += gamma_lpdf(a | 1, 0.01);
  
  target += gamma_lpdf(wi | a, a);
  target += poisson_lpmf(y | xi);
}

Hey there!

I took liberty to format your code a bit. I didn’t have time to go into much detail (yet), but the thing that stood out was that it appears you want to fit a multi-level model, but you don’t have groups. Is that correct? (You are trying to fit varying effect for each individual?)

@Max_Mantei Hey, thanks for the answer. I also posted it in the modeling section. Somebody helped me to understand what I was really doing. I am very new to hierarchical modeling so, I did not know to model a hierarchical model. It turned out that what was doing is basically fit a regression model. I changed my code and built a multilevel model with one group level and observation level variable. I have to extend it to a non-nested hierarchical model since I have 3 different group levels and 1 observation level. Can you suggest to me any study case like that? Also, let’s say I want to use multivariate priors, can I use it for group-level intercepts or just for observation level variables. Lastly, what is the reason behind using multivariate priors? Thank you again, I really appreciate it.

Hey there! I don’t have time for a more elaborate answer right now, but you questions seem fundamental enough that I suggest checking out the book Statistical Rethinking, or the lecture series of its author: Richard McElreath - YouTube

Not sure if I get that right, but for hierarchical models to use multivariate normal priors for more than one “variable”. I.e. an intercept AND a slope. The reasoning behind the multivariate prior is that those parameters are expected to be correlated. For example, for a line to fit a cloud of points roughly, when you increase the intercept you probably have to decrease the slope… That’s at least the intuition ;)

The book (or lecture) will go into much more detail and hopefully you’ll find what you need!

Cheers,
Max

1 Like