Non-centered parameterization of a hierarchical negative binomial model in Stan

I’m trying to estimate the effect of a predictor on count data. I also want to account for time-dependent trends in the counts by including three autoregressive terms; I pass these lagged (t-k) counts in as data rather than constructing them inside the Stan model.

These data are grouped, so I want varying slopes and intercepts for each group.

When I run this model, I get many divergent transitions unless I run it for many iterations (~20,000). I suspect the hierarchical structure is producing the usual “funnel” geometry, and the standard recommendation for that is a non-centered parameterization. All the documentation/examples I can find for non-centered parameterization are for normally distributed outcomes, however, and I can’t quite figure out how to implement it for a negative binomial model.
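For reference, the pattern those examples all show is the scaled-standard-normal trick, something like this (z_gr is the usual name I see for the raw parameter):

data {
  int<lower=1> Ngroups;
}
parameters {
  vector[Ngroups] z_gr;            // raw, standard-normal group effects
  real<lower=0> sigma_gr;
}
transformed parameters {
  vector[Ngroups] ran_intercept = sigma_gr * z_gr;  // implies normal(0, sigma_gr)
}
model {
  z_gr ~ std_normal();
  sigma_gr ~ normal(0, 1);
}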

Does this reparameterization even make sense? Or maybe there’s a better way to get this to fit more efficiently?

data {
  int<lower=0> N;                          // number of observations
  int<lower=0> count[N];                   // outcome variable
  int<lower=1> Ngroups;                    // number of groups
  int<lower=1, upper=Ngroups> groupID[N];  // group ID
  real predictor[N];                       // predictor variable
  real ARcount1[N];                        // count at t-1
  real ARcount2[N];                        // count at t-2
  real ARcount3[N];                        // count at t-3
}

parameters {
  real Intercept;                // global intercept
  real AR_term1[Ngroups];        // parameter for 1st autoregressive term
  real AR_term2[Ngroups];        // parameter for 2nd autoregressive term
  real AR_term3[Ngroups];        // parameter for 3rd autoregressive term
  real<lower=0> sigma_gr;        // sigma for estimating the group intercepts
  real slope[Ngroups];           // regression parameters on the predictor
  real ran_intercept[Ngroups];   // intercept for the groups
  real<lower=0> phi;             // overdispersion parameter
}


model {
  phi ~ cauchy(0, 2.5);
  sigma_gr ~ normal(0, 1);
  Intercept ~ normal(0, 1);              // priors following Gelman 2008
  slope ~ cauchy(0, 2.5);
  AR_term1 ~ cauchy(0, 2.5);
  AR_term2 ~ cauchy(0, 2.5);
  AR_term3 ~ cauchy(0, 2.5);
  ran_intercept ~ normal(0, sigma_gr);   // varying intercept by group

  for (n in 1:N) {
    count[n] ~ neg_binomial_2(exp(Intercept + ran_intercept[groupID[n]]
                                  + AR_term1[groupID[n]] * ARcount1[n]
                                  + AR_term2[groupID[n]] * ARcount2[n]
                                  + AR_term3[groupID[n]] * ARcount3[n]
                                  + slope[groupID[n]] * predictor[n]), phi);
  }
}


generated quantities {  // within-sample predictions for validation
  int<lower=0> count_pred[N];

  for (i in 1:N) {
    count_pred[i] = neg_binomial_2_rng(exp(Intercept + ran_intercept[groupID[i]]
                                           + AR_term1[groupID[i]] * ARcount1[i]
                                           + AR_term2[groupID[i]] * ARcount2[i]
                                           + AR_term3[groupID[i]] * ARcount3[i]
                                           + slope[groupID[i]] * predictor[i]), phi);
  }
}


I’m working on a hierarchical example for the manual that is both more pedagogically straightforward and more performant. The text isn’t quite ready, but the Stan code and R code to simulate/fit it are here. The Stan code shows how to include both the centered and non-centered parameterizations in the same model, which may make it clearer which structures/variables in the model are affected by that choice and, more importantly, which aren’t. Those that aren’t include the observation-level structure, so you should be able to swap out the normal for a negative binomial easily, as in the sketch below.
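To make that concrete: the non-centering only touches the group-level parameters, and the negative binomial likelihood just consumes the transformed intercepts. Here is a stripped-down sketch of your model, keeping only the varying intercept and a shared slope for brevity (z_gr is my name for the raw parameter):

data {
  int<lower=0> N;
  int<lower=0> count[N];
  int<lower=1> Ngroups;
  int<lower=1, upper=Ngroups> groupID[N];
  real predictor[N];
}
parameters {
  real Intercept;
  real slope;                      // shared across groups for brevity
  real<lower=0> sigma_gr;
  vector[Ngroups] z_gr;            // raw, standard-normal group effects
  real<lower=0> phi;
}
transformed parameters {
  // non-centered: deterministic rescaling replaces
  // ran_intercept ~ normal(0, sigma_gr)
  vector[Ngroups] ran_intercept = sigma_gr * z_gr;
}
model {
  Intercept ~ normal(0, 1);
  slope ~ normal(0, 1);
  sigma_gr ~ normal(0, 1);
  z_gr ~ std_normal();
  phi ~ cauchy(0, 2.5);
  // the observation model is untouched by the reparameterization
  for (n in 1:N)
    count[n] ~ neg_binomial_2(
      exp(Intercept + ran_intercept[groupID[n]] + slope * predictor[n]), phi);
}

The same raw-parameter trick extends to AR_term1 through AR_term3 and to the group-varying slope, if you give those hierarchical priors as well.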

You may need to take the log of the autoregressive counts; that’s what I typically see for this type of model. Or use log(x + 1) to avoid log(0), as in the sketch below.
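Assuming the lagged counts arrive as in your data block, a transformed data block can do this once up front (the logAR* names are just for illustration):

transformed data {
  real logAR1[N];
  real logAR2[N];
  real logAR3[N];
  for (n in 1:N) {
    logAR1[n] = log1p(ARcount1[n]);  // log1p(x) = log(x + 1), safe at zero counts
    logAR2[n] = log1p(ARcount2[n]);
    logAR3[n] = log1p(ARcount3[n]);
  }
}

Then use logAR1[n] and friends in place of ARcount1[n] in the linear predictor.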

You can also use neg_binomial_2_log_glm instead of neg_binomial_2; it works on the log scale (so the exp() goes away) and is generally faster.
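One caveat: the GLM function takes a single shared coefficient vector, so it maps most directly onto a version with shared slopes; group-varying coefficients would need a block design matrix, at which point plain neg_binomial_2_log may be simpler. A sketch with shared coefficients (X and beta are my names, not from your post):

data {
  int<lower=0> N;
  int<lower=0> count[N];
  int<lower=1> Ngroups;
  int<lower=1, upper=Ngroups> groupID[N];
  matrix[N, 4] X;                  // columns: predictor and the three lagged counts
}
parameters {
  real Intercept;
  vector[4] beta;
  real<lower=0> sigma_gr;
  vector[Ngroups] ran_intercept;
  real<lower=0> phi;
}
model {
  Intercept ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma_gr ~ normal(0, 1);
  ran_intercept ~ normal(0, sigma_gr);
  phi ~ cauchy(0, 2.5);
  // alpha may be a vector, so each row gets its own group intercept
  count ~ neg_binomial_2_log_glm(X, Intercept + ran_intercept[groupID], beta, phi);
}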

If this still fails, you may need to revisit your priors, following the Stan prior choice recommendations.
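In particular, heavy-tailed cauchy(0, 2.5) priors on every coefficient can themselves create awkward posterior geometry. A lighter-tailed alternative (my suggestion, to be adapted to the scales of your data):

  // drop-in replacements for the cauchy priors in the model block
  slope ~ student_t(3, 0, 1);
  AR_term1 ~ student_t(3, 0, 1);
  AR_term2 ~ student_t(3, 0, 1);
  AR_term3 ~ student_t(3, 0, 1);
  phi ~ exponential(1);            // keeps mass off implausibly large phi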