Divergent transitions in state space modelling with Poisson data

I am trying to test Stan with simulated Poisson time series data, and I am getting warnings of divergent transitions, although the results look fine:

n <- 100
y <- rpois(n, exp(cumsum(c(0, rnorm(n - 1, sd = 0.1)))))

stan_data <- list(n = n, y = y, x1 = 0, P1 = 100)
stan_inits <- list(
  list(sd_x = 0.5), list(sd_x = 0.1), list(sd_x = 0.01), list(sd_x = 0.05))

model <- '
data {
  int<lower=0> n;
  int y[n];
  real x1;
  real P1;

parameters {
  real<lower=0> sd_x;
  vector[n] x_raw;

transformed parameters {
  vector[n] x;
  x[1] = x1 + sqrt(P1) * x_raw[1];
  for(t in 2:n) {
    x[t] = x[t-1] + sd_x * x_raw[t];

model {
  target += normal_lpdf(x_raw | 0, 1);
  target += poisson_log_lpmf(y | x);

fit <- stan(model_code = model, data = stan_data, seed = 1, refresh = 100,
  iter = 2000, thin = 1,  chains = 4, 
  init = stan_inits)

Here I get 88 warnings about divergent transitions (and couple warnings about nans), and changing adapt_delta does not help (the red dot’s are below the diagonal). Still, the actual fit looks to be okay:

print(fit, pars = c("sd_x"))
Inference for Stan model: a1ebac59fcbd2845df6abf1f4857faec.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

     mean se_mean   sd 2.5%  25% 50%  75% 97.5% n_eff Rhat
sd_x 0.11       0 0.06 0.02 0.06 0.1 0.14  0.25   600 1.01

Samples were drawn using NUTS(diag_e) at Tue May 30 09:50:18 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

And if I simulate bit longer time series, say 200, I do not get any warnings. But compared to my own random-walk-Metropolis codes the data/model is well behaved and I would expect Stan to work without any issues here as well.

I’ve seen this kind of problem with state-space models before. When I do a pairs plot with sd_x and x_raw[1:2] I see that both sd_x and x_raw[1] have skewed marginal posterior distributions, and are negatively correlated, and I’m guessing that may have something to do with it. With continuous observations and normal error distribution there are well-known recurrence equations to compute the marginal likelihood (integrating out x_raw) directly, and I’ve found that to help get rid of diverging transitions, but you don’t have that option here. I did find that if I ran your example with adapt_delta=0.99, the divergent transitions went away.

1 Like

Slide Standard Approaches Fail.

Isn’t your model a lagged count model, instead of a PAR model?