Correct vectorization in hierarchical model

Hi all,

I am trying to code a hierarchical Bayesian Poisson regression model where the data are grouped.

The mathematical formulation is:

y_{ij} \sim Pois(\lambda_{ij})

log(\lambda_{ij}) = \sum\limits_{k = 1}^K {\rm{coeff}{_{kj}} \times {X_{kj}}}
(e.g.: \beta_j \times X1_{ij} + \gamma_j \times X2_{ij} when K=2)

\rm{coeff}_k \sim N(\mu_k, \sigma_k)
\rm{coeff}_k \sim N(\mu_k, \sigma_k)

\mu_k \sim N(0, 1)
\sigma_k^2 \sim πΌπ‘›π‘£πΊπ‘Žπ‘šπ‘šπ‘Ž(1, 1)

My stan code is pasted below, but I am not sure if the vectorizations are correct. Could you kindly give me some suggestions on how to improve the vectorization? Thanks a lot!

data {
  int<lower=1> I;            // number of points in each group
  int<lower=1> J;            // number of groups
  int<lower=1> K;            // number of coefficients 
  int<lower=1> N;
  int<lower=1> Y[N];         // count outcome    
  real<lower=1> X[K,N];      // attribute matrix 
  int<lower=1> group_id[N]; 

parameters {
  real coeff[K, J];           
  real mu[K];
  real<lower=0> sigma2[K];

transformed parameters {
  real log_lambda[N];
  for (i in 1:N) {
    log_lambda[i] = dot_product(coeff[,  group_id[i]],  X[, i]);

model {
  target += normal_lpdf(mu |0, 2);
  target += inv_gamma_lpdf(sigma2 |1, 1);
  for (k in 1:K) {
    target += normal_lpdf(coeff[k, ] | mu[k],  sqrt(sigma2[k]));
  target += poisson_log_lpmf(Y|log_lambda);

Looks correct to me. Are you having any issues sampling?

Thanks. It works, but the chains of posterior samples of the hyper parameters are different from those from the un-vectorized version (The posterior of coefficients are the same, though). For example:

  for (k in 1:K) {
    target += normal_lpdf(coeff[k, ] | mu[k],  sqrt(sigma2[k]));


  for (k in 1:K) {
     for ( j in 1:J) {
       target += normal_lpdf(coeff[k, j] | mu[k],  sqrt(sigma2[k]));

I assume there should be not any different except for the sampling time, right?

I recommend negative binomial rather than Poisson. Poisson has that fixed variance which can make it hard to fit data and can lead to overfitting and slow mixing.

1 Like

Thanks! I’ll definitely try that.

Hm, that does sound like something’s wrong. Sorry for my premature green-light; I tend to get confused when it comes to indexing and multi-dimensional arrays. Maybe @martinmodrak could take a look?

Hi, I found that the difference in posteior samples might not because of the vectorization. Even if I tried to sample with the same stan code (the vectorized version) multiple times, the return trace plots for the hyper paprameters were different!

1 Like

Some variation is to be expected but were they notably different? Also I wouldn’t fix too much on the traceplots, but instead on the posterior intervals and diagnostics (Rhat, n_eff)…

Also if you just want to use this model or similar (and your goal is not learning Stan or moving from this model to some complex beast) and are working with R, you should be able to use the brms package which will do a lot of the hard work for you…

Best of luck!

Thanks! I will check out the package. The posteriors for the hyper parameters are considerably different and the mean values as well. However, the posteriors for the coefficients remain the same.

Example 1

Example 2


Sorry for the delay. So the trace plots look like the region with large sigma is problematic to explore. This looks like it might be a case of Neal’s funnel and needs non-centered parametrization (search the manual and/or forums) for more details. Also generally we recommend putting half-normal prior on the sigma parameter in hierarchical models, the inverse-gamma priors on variance are known to have some undesirable properties (can put a lot of prior mass on very large variance - if I remeber correctly).

1 Like