Poisson Regression using Missing Data

Hi. Looking to fit a poisson regression to missing data.
See model code:

data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  int<lower=1, upper=N_obs + N_mis> ii_obs[N_obs];
  int<lower=1, upper=N_obs + N_mis> ii_mis[N_mis];
  int<lower=0> y_obs[N_obs];
  // Covariates
  vector[N_obs + N_mis] x1;
  vector[N_obs + N_mis] x2;


transformed data {
  int<lower=0> N = N_obs + N_mis;

parameters {
  int<lower=0> y_mis[N_mis];
  real beta_0;
  real beta_1;
  real beta_2;

transformed parameters {
  int<lower=0> y[N];
  y[ii_obs] = y_obs;
  y[ii_mis] = y_mis;
  vector[N] mu = beta_0 + beta_1*x1 + beta_2*x2;

model {
   y ~ poisson(mu);

However, parameters cannot be integers (y[N]), yet the model requires integers.


Perhaps this post is helpful to you. The trick is slicing your data in such a way so as to first only perform inference on the observed data and then (if desired), impute the missing values as generated quantities.

I believe you are describing log \lambda_i = \mu_i = \beta_{0} + \beta_{1}x + \beta_{2}x or some linear predictor. As currently constructed, you would need to exponentiate \mu in your model here: y ~ poisson(exp(mu)). You could keep everything on the log scale and do y ~ poisson_log(mu). OR, if you want to be more efficient (and you do not need to store mu), you could use Stan’s built in poisson_log_glm(x, alpha, beta) statement. Note, you’ll have to restructure your input where x is a matrix, beta_1 and beta_2 are a vector of coefficients and beta_0 is alpha or the intercept.

This all assumes you have proper priors on the parameters you have defined. Either way, you’d still be able to complete the missing data approach described in that post. The hardest point is indexing what is observed vs. not and ensuring that is coded correctly in Stan.