Poisson Hierarchical Model Centered/Non-centered Parameterization

Hello everyone,

I am trying to fit the model in the picture on simulated data. I used the same model for generating data except that the covariance matrix was an identity matrix, and the beta bars were standard normal random variables. I have tried both centered and non-centered parameterization; I encountered with following problems.

Firstly, there were some divergent samples in the centered model. Before the switch to the non-centered version, I tried the handle the problem by increasing the parameter of LKJ distribution and decreasing the step size. It worked and posterior beta distributions almost centered around the generated beta values. But it was not the same for beta bars. Their posterior distributions are far away from generated beta bars. I wonder how I can handle this problem.

Secondly, I tried non-centered parameterization. It worked, but run time was ten times higher than centered parameterization for the same number of iterations. Is that normal? I thought run time would decrease.
Also, chains did not mix, and almost all iterations exceeded the max_treedepth even for max_treedepth=13.

I shared the data simulation code and centered and non-centered models in this order. I would be very happy if anyone could help me.

data_generator

data{

/* Dimensions <em>/
int<lower=0> M;// number of varaibles
int<lower=0> N;// number of observations
/</em> Design Matrix <em>/
matrix[N,M] X;
/</em> offset <em>/
vector<lower=0>[N] offset;
/</em> Hyperparameters*/
vector<lower=0>[M] sigma; // diagonal values of the covariance matrix, equals to ones matrix.

}

generated quantities{

vector [M] betabar; // mean vector of multivariate normal distribution
vector[M] beta; // regression coefficients

real<lower=0> a; // prior for relative risk
vector<lower=0>[N] wi; //relative risk

vector[N] xi; // mean of poisson distribution

int<lower=0> y_rep[N]; // response variable

for(i in 1:M){betabar[i]=normal_rng(0,1);}
beta=multi_normal_rng(betabar,diag_matrix(sigma));

a=gamma_rng(1,0.01);
for(j in 1:N){wi[j]=gamma_rng(a,a);}

xi=exp(X*beta+log(offset)+log(wi));

for(z in 1:N){y_rep[z]=poisson_rng(xi[z]);

}

}

centered

data{

/* Dimensions <em>/
int<lower=0> M;// number of varaibles
int<lower=0> N;// number of observations
/</em> Design Matrix <em>/
matrix[N,M] X;
/</em> offset <em>/
vector<lower=0>[N] offset;
/</em> Outcome <em>/
int<lower=0> y[N];
/</em> Hyperparameters*/
vector<lower=0>[M] sigma; //diagonal values of the covariance matrix, equals to 10s vector

}
parameters {

vector[M] beta; // coeffiencts in linear equation
real<lower=0> a; // prior for relative risk
vector<lower=0>[N] wi; // relative risk
vector [M] betabar; // mean vector of multivariate normal distribution
cholesky_factor_corr[M] Lcorr;// cholesky factor (L_u matrix for R)

}

transformed parameters{
vector[N] xi; // mean vector of poisson distribution
corr_matrix[M] R; // correlation matrix
cov_matrix[M] Sigma; // covariance matrix

xi=exp(X*betabar+log(offset)+log(wi));
R = multiply_lower_tri_self_transpose(Lcorr);
Sigma = quad_form_diag(R, sigma);
}

model {

target+=normal_lpdf(betabar|0,1);
target+=lkj_corr_cholesky_lpdf(Lcorr|4);
target+=multi_normal_lpdf(beta | betabar,Sigma);
target+=gamma_lpdf(a|1,0.01);

target+=gamma_lpdf(wi|a,a);
target+=poisson_lpmf(y|xi);

}

noncentered

data{

/* Dimensions <em>/
int<lower=0> M;// number of varaibles
int<lower=0> N;// number of observations
/</em> Design Matrix <em>/
matrix[N,M] X;
/</em> offset <em>/
vector<lower=0>[N] offset;
/</em> Outcome <em>/
int<lower=0> y[N];
/</em> Hyperparameters*/
vector<lower=0>[M] sigma; //diagonal values of the covariance matrix, equals to 10s vector

}
parameters {
vector[M] delta; // std normal variable that multiplied with covariance matrix
vector[M] delta2; // std normal variable that multiplied with mean
real<lower=0> a;

vector<lower=0>[N] wi; // relative risk
cholesky_factor_corr[M] Lcorr;// cholesky factor (L_u matrix for R)

}

transformed parameters{
corr_matrix[M] R; // correlation matrix
vector[M] beta; // regression coefficients
vector[N] xi; // mean of poisson

R = multiply_lower_tri_self_transpose(Lcorr);

beta=delta2*10+quad_form_diag(R, sigma) <em>delta;
xi=exp(X</em> beta+log(offset)+log(wi));
}

model {

target+=normal_lpdf(delta2|0,1);
target+=lkj_corr_cholesky_lpdf(Lcorr|2);
target+=normal_lpdf(delta|0,1);
target+=gamma_lpdf(a|1,0.01);

target+=gamma_lpdf(wi|a,a);
target+=poisson_lpmf(y|xi);

}

Edited by @jsocolar for syntax highlighting.

Hi, two questions for clarification:

  1. Why not use the negative binomial distribution with expection as used for the Poisson here and shape parameter alpha (now used for the gamma-distributed over dispersion term omega)? From the equations, it seems omega_i is unique for every observation i, which means you can use the NB distribution. That should be much more efficient.
  2. What is the reason to not just use a univariate prior for each of the betas? The equations seem to suggest that there are no group-level predictors, just m predictors…

[Edit: typos]

@jsocolar, thanks for the syntax highlighting; the old unformatted original post is still on discourse as well (as separate post)

Another point for speed: you are now manually updating the posterior with the full normalised log densities (all your “target =+” statements) whereas you only need the log densities up to a constant. You do the latter by using either a “sampling statement” like ‘’‘a~gamma(1,0.1)’’’ instead of ‘’‘target=+gamma_lpdf(a|1,0.1)’’’. Or use the non-normalised function gamma_ulpdf() instead of gamma_lpdf. (Poor choice of me to use gamma as example as I think you can actually integrate that our by using an NB distribution.)

@LucC Thank you for your responses.

  1. I am using that format because I am also trying the Poisson log-normal with the same model, but I will definitely change it to the NB model. I did not know it affected the efficiency.

  2. I think we have group-level predictors in the original problem, but I am still a bit confused about how group hierarchy works. In the original problem, we have different refrigerator models, their production time(month-year), and their installation time(month-time). I am trying to estimate the number of refrigerators installed while using the refrigerator model, production time, and installation time as predictors. I basically add them to the linear equation where I calculate the mean of Poisson. Is it a wrong way to model group-level hierarchy? Can you suggest a case study where I can understand how this type of modeling works?

For clarification, in the above problem m predictors represent m dummy variables of a factor predictor(Installation time).

Ad 1) Fair enough. See for instance here on Wikipedia about the gamma-poisson mixture and NB distribution being the same thing.

Ad 2) I learned about hierarchical modelling on the job, so I don’t have the best or latest recommendations for literature or study books. Maybe someone else does. I’m sure there are plenty of resources if you look for “multi-level” or “hierarchical modelling”. The Stan documentation also provides a nice high-level overview, I think.
As for the case you present here: your model actually doesn’t treat your simulated data as having a hierarchical structure as you are assigning each of the M groups its own coefficient beta. This is also the reason why your estimate of betabar seems “off”: you are trying to estimate a M-length mean vector and M by M covariance matrix from essentially one observation (i.e., one set of M model coefficients beta). In contrast, a simple hierarchical model might consider installation time (M groups) as a group-level predictor, meaning that aside from your coefficients for refrigerator model and and production time (which, as I understand, are not in the model at the moment) you will have a group-level term \epsilon_j, which is governed by the between group-variation parameter \sigma_M.

y_{ij} \sim NegBinom(\xi_{ij},\alpha)
\xi_{ij} = exp(\beta X_i+ log(offset_i) + \epsilon_j)
\epsilon_j \sim N(0,\sigma_M)

Note that these equations involve two suffixes: i for individual observations and j for groups. Also note that there is no multivariate distribution involved anywhere yet. Only if one were to add a second group-level predictor, one could consider that the effects of those two group-level predictors are correlated (e.g., if the effect of one is higher, the effect of another group-level predictor might be lower). But it doesn’t sound that this is at play here.

If you are using R and are having trouble to code up your model in Stan, consider using the rstanarm or brms package, which can both do what you need.

@LucC Thank you again for the answer. It clarifies many points. As the last question, do you think can a model on rstanarm be as complex as a model on rstan. I thought rstan is way flexible than those packages you mentioned.

1 Like

Happy to be of help.

As for rstanarm: yes it is less flexible as it includes a finite set of pre-compiled models. You’d have to check if hierarchical negative binomial models are included. If not, brms is super flexible and you should be able to work with that. The only (very minor) drawback compared to rstanarm is that brms doesn’t work with pre-compiled models and compiles on them on the fly. But that’s not really an issue. Also, brms (as well as rstanarm) works with formula syntax similar to glm and lme4, so if you’re used to those, that’s a bonus. Of course, if you’re confident coding up your own model for rstan, go for it! You could even specify your model with brms and extract the model code and adapt and use as wanted in rstan.

Happy modeling!

1 Like