Distribution of random intercept in longitudinal Poisson


This isn’t Stan-specific (yet).

I’ve got longitudinal count data and want to fit a model with intercepts varying between groups. I’ve got a few predictors, and the model fits fine using Stan or mgcv::gam and the model summaries look similar.

However, the random intercepts have spikes at 0, apparently because the remaining part of the linear predictor takes values lower than about -4 for some groups and the model as it’s specified appears to think that that’s close enough to 0 for it to not need another intercept.

So, the posterior distribution of the intercepts is bimodal with a spike at 0, and is not independent of the predictors.

What implications, if any, does it have for my interpretation of the other parameters in the linear predictor?

Also, I want to be able to simulate from the model, but the intercepts are tied to the values of the covariates, which seems wrong.

Can I respecify the model to overcome this? Do I need a hurdle model (at which point, this will become specific to Stan)? My existing model predicts numbers of 0s similar to those found in the data, so the data aren’t really “zero-inflated” if I understand the term correctly.


What is that average of counts in that group and what does dpois(0, group_avg) say? Is it that same as
mean(count[grp == x])?

About 50% of groups have all zero counts (at 5 observation times), so dpois(0, average) is 1 for those.

Most of the average counts are < 10, but there is a long tail drifting up to almost 50. So dpois(0, average) is effectively 0 beyond about 8.


Two ideas
a) mixture
b) change poisson

pow<-function(x,k) ((x)^(k))
square<-function(x) ((x)**2)
shanker <-function(x,theta)
shankerpoisson.log<-function(x,theta)  2*log(theta)-(2+x)*log(1+theta)-log(1+square(theta))+log(1+x+theta+square(theta))

real shanker_log(int x, real theta) {
       return 2*log(theta)-(2+x)*log(1+theta)-log(1+square(theta))+log(1+x+theta+square(theta));


  real PTED_log(int x, real theta, real alpha) {
    return log(theta) + log( (1-alpha) / pow(1 + theta, x+1) + 2 * alpha / pow(1 + 2*theta,x+1));

pow<-function(x,k) ((x)^(k))
square<-function(x) ((x)**2)
PTED <- function(x,theta,alpha) exp( log(theta) + log( (1-alpha) / pow(1 + theta, x+1) + 2 * alpha / pow(1 + 2*theta,x+1)))

Thanks for your help with this, and for the paper you linked.

I’m making some progress with the PTED and PSD models. I’ll post something more useful in a while.

Thanks again,

real PTED_log(int x, real theta, real alpha) {
  return log(theta) + log( (1-alpha) / pow(1 + theta, x+1) + 2 * alpha / pow(1 + 2*theta,x+1));

vector[N] mup;
real<lower=-1,upper=1> alpha;
target += PTED_log(y[i], (2.0 - alpha) / (2.0 * exp(mup[i])), alpha);

I coded it some time ago. The mu transformation in PTED_LOG refers to (14) of the paper.
(Its “old” Stan syntax)

As ever, thanks.

I’ve tried 2 formulations of the PTED model (using yours and (24) of the paper). I’ve also tried the Shanker model. In each case, the distribution of intercepts depends on the linear predictor.

Eventually, I found that I needed a separate intercept for those groups that have a 0 at baseline in a particular covariate. The resulting Poisson model seemed ok but simulations from it produced some physically impossible counts. Switching from log link to sqrt fixed that.

Every day’s a school day.

Thanks again,

You mean you essentially take squares to go from unconstrained to positive? That’s dangerous as it can lead to non-identifiable multimodality (-1 and 1 map to the same thing).

Poisson’s often not a good fit because you often get multimodality in count data (zero inflation is common) as well as truncation (some limit on high counts) along with overdispersion (the variance is higher than the mean).

Yes, I’ve found issues with trying to use squares. I’d thought I’d find a local maximum that would be tight enough to stop the algorithm escaping it, but I was wrong.

So my current plan is to truncate the linear predictor.

Linear predictor? I’m more with Bob to inflate the zero-case.
I would suggest to cut out the zero case. The zero case is easy
dpois(0, la) or PTED_poisson(0, la). Thus you can norm to by dividing
with 1 - dpois(0, la) or equiv. distribution.
Then you multiply by (1-theta) and define something else the zero case,
the theta. For theta you might use some predictors and then use a
inv_logit transform. Do everything on the log-scale, then everything will be fine.

This describes a hurdle model, and you can do model for zero vs. non-zero and model for non-zero counts separately. Google “zero-inflated and hurdle models” for more info.

You are right, I’m wrong! I meant zero-inflated and described the wrong model. The zero-inflated is in the Stan manual and can easily adapted. Sorry for confusion.

Hurdle model is often appropriate, too, and depending on the data generating process might be better model than zero-inflated. It’s good to know both, and since hurdle model is easier to write, I would start with that.