I’ve got longitudinal count data and want to fit a model with intercepts varying between groups. I’ve got a few predictors, and the model fits fine using Stan or mgcv::gam and the model summaries look similar.

However, the random intercepts have spikes at 0, apparently because the remaining part of the linear predictor takes values lower than about -4 for some groups and the model as it’s specified appears to think that that’s close enough to 0 for it to not need another intercept.

So, the posterior distribution of the intercepts is bimodal with a spike at 0, and is not independent of the predictors.

What implications, if any, does it have for my interpretation of the other parameters in the linear predictor?

Also, I want to be able to simulate from the model, but the intercepts are tied to the values of the covariates, which seems wrong.

Can I respecify the model to overcome this? Do I need a hurdle model (at which point, this will become specific to Stan)? My existing model predicts numbers of 0s similar to those found in the data, so the data aren’t really “zero-inflated” if I understand the term correctly.

I’ve tried 2 formulations of the PTED model (using yours and (24) of the paper). I’ve also tried the Shanker model. In each case, the distribution of intercepts depends on the linear predictor.

Eventually, I found that I needed a separate intercept for those groups that have a 0 at baseline in a particular covariate. The resulting Poisson model seemed ok but simulations from it produced some physically impossible counts. Switching from log link to sqrt fixed that.

You mean you essentially take squares to go from unconstrained to positive? That’s dangerous as it can lead to non-identifiable multimodality (-1 and 1 map to the same thing).

Poisson’s often not a good fit because you often get multimodality in count data (zero inflation is common) as well as truncation (some limit on high counts) along with overdispersion (the variance is higher than the mean).

Yes, I’ve found issues with trying to use squares. I’d thought I’d find a local maximum that would be tight enough to stop the algorithm escaping it, but I was wrong.

So my current plan is to truncate the linear predictor.

Linear predictor? I’m more with Bob to inflate the zero-case.
I would suggest to cut out the zero case. The zero case is easy
dpois(0, la) or PTED_poisson(0, la). Thus you can norm to by dividing
with 1 - dpois(0, la) or equiv. distribution.
Then you multiply by (1-theta) and define something else the zero case,
the theta. For theta you might use some predictors and then use a
inv_logit transform. Do everything on the log-scale, then everything will be fine.

This describes a hurdle model, and you can do model for zero vs. non-zero and model for non-zero counts separately. Google “zero-inflated and hurdle models” for more info.

You are right, I’m wrong! I meant zero-inflated and described the wrong model. The zero-inflated is in the Stan manual and can easily adapted. Sorry for confusion.

Hurdle model is often appropriate, too, and depending on the data generating process might be better model than zero-inflated. It’s good to know both, and since hurdle model is easier to write, I would start with that.