Missing data in categorical data models

To impute a missing categorical response, you can add a continuous parameter in its place, add a sampling (`_rng`) statement in the generated quantities block, and use those draws to estimate the mode of the missing value.

Logistic regression

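parameters {
  // continuous stand-ins for the N_miss missing 0/1 responses
  vector<lower=0, upper=1>[N_miss] y_miss;
}
model {
  // mu_miss: vector[N_miss] linear predictor for the missing cases
  target += y_miss .* log_inv_logit(mu_miss)
            + (1 - y_miss) .* log1m_inv_logit(mu_miss);
}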

Credit goes to:
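
A minimal end-to-end sketch of the logistic case, assuming a single predictor. The names N_obs, N_miss, x_obs, x_miss, y_obs, alpha, beta and y_imp, and the normal(0, 2.5) priors, are hypothetical and not from the original post. The relaxed y_miss enters the model block in the same way as the snippet above, and the generated quantities block draws discrete imputations with bernoulli_logit_rng; the mode of each y_imp across the posterior draws can then serve as a point imputation.

```stan
data {
  int<lower=0> N_obs;                        // number of observed responses
  int<lower=0> N_miss;                       // number of missing responses
  vector[N_obs] x_obs;                       // predictor for observed cases
  vector[N_miss] x_miss;                     // predictor for missing cases
  array[N_obs] int<lower=0, upper=1> y_obs;  // observed 0/1 responses
}
parameters {
  real alpha;
  real beta;
  // continuous stand-ins for the missing 0/1 responses
  vector<lower=0, upper=1>[N_miss] y_miss;
}
model {
  vector[N_miss] mu_miss = alpha + beta * x_miss;
  // hypothetical weakly informative priors
  alpha ~ normal(0, 2.5);
  beta ~ normal(0, 2.5);
  y_obs ~ bernoulli_logit(alpha + beta * x_obs);
  // relaxed Bernoulli term for the missing responses
  target += y_miss .* log_inv_logit(mu_miss)
            + (1 - y_miss) .* log1m_inv_logit(mu_miss);
}
generated quantities {
  // one discrete imputation per missing case per draw
  array[N_miss] int y_imp = bernoulli_logit_rng(alpha + beta * x_miss);
}
```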

Poisson distribution

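parameters {
  real<lower=0> y_miss;  // continuous stand-in for a missing count
}
model {
  // mu_miss: log rate for the missing case; the lgamma term depends on y_miss, so keep it
  target += y_miss * mu_miss - exp(mu_miss) - lgamma(y_miss + 1);
}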
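
For reference, with log rate $\mu$ the full Poisson log probability mass is given below. When $y$ is observed data, the $\log\Gamma(y + 1)$ term is constant and is routinely dropped. When $y$ is the continuous parameter y_miss, that term depends on it and has to stay: without it, $\exp(y\,\mu - e^{\mu})$ does not decay in $y$ whenever $\mu \ge 0$, so the density in y_miss over $[0, \infty)$ would be improper.

$$
\log \operatorname{Poisson}\!\left(y \mid e^{\mu}\right) = y\,\mu - e^{\mu} - \log\Gamma(y + 1)
$$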

Ordinal probit/logit

The following is based on Section 1.8, "Ordered logistic and probit regression", of the Stan User’s Guide.
Use a simplex parameter y_miss with the same dimension K as the category probabilities theta, so that each y_miss[k] is paired with its corresponding theta[k].

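parameters {
  simplex[K] y_miss;  // relaxed missing response, one weight per category
}
model {
  vector[K] theta;    // category probabilities for the missing value
  // eta: linear predictor, c: ordered[K - 1] cutpoints (defined elsewhere)
  theta[1] = 1 - Phi(eta - c[1]);
  for (k in 2:(K - 1)) {
    theta[k] = Phi(eta - c[k - 1]) - Phi(eta - c[k]);
  }
  theta[K] = Phi(eta - c[K - 1]);
  // ... rest of the model (observed data, priors)
  target += sum(y_miss .* log(theta));
}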

For an ordinal logit model, replace Phi with inv_logit.
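
A minimal sketch of the ordinal logit version with one missing response, assuming a single predictor and no separate intercept (the cutpoints c absorb it). K, N_obs, x_obs, y_obs, x_miss, beta and y_imp are hypothetical names, and the normal(0, 2.5) prior is an assumption. The observed responses use Stan's built-in ordered_logistic, theta is built with inv_logit in place of Phi, and generated quantities draws a discrete category with ordered_logistic_rng. For the probit version, use Phi in theta and ordered_probit_rng instead.

```stan
data {
  int<lower=2> K;                            // number of ordered categories
  int<lower=0> N_obs;                        // number of observed responses
  vector[N_obs] x_obs;                       // predictor for observed cases
  array[N_obs] int<lower=1, upper=K> y_obs;  // observed categories
  real x_miss;                               // predictor for the missing case
}
parameters {
  real beta;
  ordered[K - 1] c;                          // cutpoints
  simplex[K] y_miss;                         // relaxed missing response
}
model {
  real eta = beta * x_miss;
  vector[K] theta;
  beta ~ normal(0, 2.5);                     // hypothetical prior
  y_obs ~ ordered_logistic(beta * x_obs, c);
  // ordered logit category probabilities (inv_logit in place of Phi)
  theta[1] = 1 - inv_logit(eta - c[1]);
  for (k in 2:(K - 1)) {
    theta[k] = inv_logit(eta - c[k - 1]) - inv_logit(eta - c[k]);
  }
  theta[K] = inv_logit(eta - c[K - 1]);
  target += sum(y_miss .* log(theta));
}
generated quantities {
  // discrete imputation for the missing category
  int y_imp = ordered_logistic_rng(beta * x_miss, c);
}
```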
