Help on a state space model

Folks, I am trying to estimate a state space model on some simulated data taken from the NIMBLE manual.

I would appreciate any feedback on how to improve the model so that it recovers the parameters better.

The data is simulated as below:

t <- 25; mu_0 <- 1
x <- rnorm(1, mu_0, 1)   # initial latent state
y <- rnorm(1, x, 1)      # first observation
a <- 0.5; b <- 1; c <- 1
for(i in 2:t){
  x[i] <- rnorm(1, x[i-1] * a + b, 1)  # state transition
  y[i] <- rnorm(1, x[i] * c, 1)        # observation
}

The Stan code used:

// non-centered version
data {
  int <lower=0> T;
  real x[T];
  real y[T];
}

parameters {
  real u_err[T];
  real <lower=0> s_obs;
  real <lower=0> s_level;
  real<lower=0, upper=1> a;
  real b;
}

transformed parameters {
  real u[T]; // level

  u[1] = u_err[1];
  for (t in 2:T) {
    u[t] = a * u[t-1] + b + s_level * u_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  y ~ normal(u, s_obs);
}



I have ignored estimating the c parameter here.

I ran this code with the defaults, 4000 iterations per chain, and adapt_delta = 0.9999, and got the following warnings:

There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.9999 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems

I get the following results:

4 chains, each with iter=4000; warmup=2000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=8000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a       0.27    0.00 0.21 0.01 0.09 0.21 0.39  0.78  1871 1.00
b       1.51    0.01 0.46 0.45 1.22 1.59 1.85  2.20  2168 1.00
s_level 0.34    0.01 0.26 0.01 0.13 0.29 0.51  0.94   411 1.01
s_obs   0.83    0.01 0.22 0.25 0.72 0.84 0.96  1.24   308 1.01

Samples were drawn using NUTS(diag_e) at Tue Aug 28 15:15:21 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The parameter estimates are somewhat off from the true values of

  • a=0.50,
  • b=1, and
  • s_level / s_obs = 1.

I'm hoping I can improve the code.

Thanks.

Raghu

I’m not much of an R person, so maybe I’m missing something I wouldn’t miss from the mathematical description, but it seems like you start from some sort of deterministic skeleton and simulate stochastic transitions in x at each step t, given a (also random) initial condition, with y as the observation model. And you are using the deterministic model to have Stan estimate the parameters.

It may not be a problem with Stan at all, but an issue with using a deterministic model to estimate parameters that drift stochastically with each step, as opposed to simulating an entire deterministic trajectory and drawing samples from some distribution around it. If that’s the case, it would be useful to plot the deterministic solution for your “true” parameters on top of your simulated data (can you post that?). If by chance your stochastic data drifted too far from it, it is likely that MCMC twisted the function away from the true values to get a good fit, and that’s why the estimates are “wrong”.

Hopefully I’m not missing the point here, and maybe there are Stan-specific issues, but that’s something to watch for when trying to fit deterministic state-space models to stochastic simulations.

I too have been playing around with state space models in Stan and having trouble fitting them. For your Stan model, I added some priors and removed the multiplication from your for loop. Here’s the overall model:

data {
  int <lower=0> T;
  real x[T];
  real y[T];
}

parameters {
  vector[T] u_z;
  real <lower=0> s_obs;
  real <lower=0> s_level;
  real <lower=0, upper=1> a;
  real b;
}

transformed parameters {
  real u[T]; //Level
  vector[T] u_err = s_level * u_z;

  u[1] = u_err[1];
  for (t in 2:T) {
    u[t] = a * u[t-1] + b + u_err[t] ;
  }
}

model {
  u_z ~ normal(0,1);
  a ~ normal(0, 1);
  b ~ normal(0, 2.5);
  s_level ~ gamma(2, 1);
  s_obs ~ gamma(2, 1);
  y ~ normal(u, s_obs);
}

Despite max_treedepth = 15 and adapt_delta = 0.9999, there are still plenty of warnings, as you’ve already seen. It’s not good to ignore these. With the priors in such good agreement with the simulation model, there is better recovery of the parameters.

4 chains, each with iter=4000; warmup=2000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=8000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a       0.49    0.01 0.21 0.07 0.35 0.50 0.64  0.89  1471 1.00
b       0.77    0.01 0.39 0.05 0.50 0.75 1.02  1.58  1548 1.00
s_level 1.32    0.02 0.35 0.56 1.10 1.33 1.54  1.98   352 1.01
s_obs   0.76    0.03 0.41 0.11 0.44 0.71 1.02  1.66   251 1.01

Since there seems to be at least some interest in state space models, it would be nice if there were a better way to fit them in Stan.

Thanks, Edward.

I was able to get the following results with Cauchy priors for b, s_level, and s_obs.

Inference for Stan model: nimble_example1.
4 chains, each with iter=4000; warmup=2000; thin=10; 
post-warmup draws per chain=200, total post-warmup draws=800.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a       0.33    0.01 0.20 0.03 0.17 0.31 0.47  0.79   773 1.00
b       1.45    0.02 0.48 0.41 1.11 1.49 1.79  2.30   800 1.00
s_level 1.09    0.03 0.47 0.11 0.78 1.17 1.42  1.86   292 1.00
s_obs   0.98    0.04 0.48 0.15 0.59 1.00 1.34  1.85   166 1.03

I still get the warnings.

Cheers.

I think there are two different basic problems with this inference. The first, as I mentioned, is that even the deterministic solution seems to carry little information about the parameters compared to noise \sim \mathcal{N}(0,1). If you compute the solution in the absence of any noise you get the line in the first figure; adding Gaussian noise gives the dots.
[figure: deterministic solution (line) and the same solution with added Gaussian noise (dots)]
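
To make that concrete: with the true values a = 0.5 and b = 1, and an initial condition near x_1 = 1, the noiseless recursion x_t = a x_{t-1} + b converges to its fixed point x^* = \frac{b}{1-a} = 2 almost immediately (x_{1:5} \approx 1,\ 1.5,\ 1.75,\ 1.875,\ 1.94), so only the first handful of time points carry any information about a and b.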

I assumed you were taking the deterministic solution and using it for the model likelihood, which would have led directly to the problem above, but you are actually sampling an error term at each step.
If you had a likelihood informative about the parameters that could solve the problem, but I’m not sure about the effectiveness of the strategy, since it greatly increases the number of parameters (T latent errors on top of the four structural parameters).

I think a better solution may be a Kalman filter, if your updates are linear in the state: it doesn’t require estimating the actual errors at each step, only updating the best estimate of the state at each step based on the process noise and the observation model. Because there are analytical solutions in this case, it should be possible to implement this in Stan (I think there are some examples if you google it).
For nonlinear systems that may require stochastic simulation, which I think is not possible to implement in Stan yet.
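
To illustrate, here is a minimal sketch of that idea for this univariate model; it marginalizes the states out analytically, so only the four structural parameters are sampled. The initial state N(1, 1) and the priors (borrowed from Edward’s model above) are assumptions you would want to adjust.

// Marginal likelihood via a univariate Kalman filter (sketch).
// The latent states are integrated out analytically, so only the
// structural parameters are sampled.
data {
  int<lower=0> T;
  real y[T];
}

parameters {
  real<lower=0, upper=1> a;
  real b;
  real<lower=0> s_level;
  real<lower=0> s_obs;
}

model {
  // m, P: predictive mean and variance of the current state.
  // Initial state assumed N(1, 1) to match the simulation.
  real m = 1;
  real P = 1;
  a ~ normal(0, 1);
  b ~ normal(0, 2.5);
  s_level ~ gamma(2, 1);
  s_obs ~ gamma(2, 1);
  for (t in 1:T) {
    real S = P + square(s_obs);  // predictive variance of y[t] (c = 1)
    real K = P / S;              // Kalman gain
    target += normal_lpdf(y[t] | m, sqrt(S));
    m += K * (y[t] - m);         // filtered state mean
    P *= 1 - K;                  // filtered state variance
    m = a * m + b;               // one-step-ahead prediction
    P = square(a) * P + square(s_level);
  }
}

With u_err gone, the funnel geometry between s_level and the raw errors should go away with it, and the filtered or smoothed states can still be recovered afterwards (with a bit more work) if you need them.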

None of this makes the first problem go away, though. In the absence of process noise there’s only a faint increase in the function value over the first 4 time points, and with process noise the whole thing just looks like noise, so I’d probably solve that first.

[figure: t vs. x and y, simulated series with process noise]

Yes, I agree the Kalman Filter seems like the way to go.

With a bit more searching around, I found the Stan-based R package walker for fitting these sorts of models. It’s not a panacea for state space models in Stan, but at least it’s a good start.


Here are two projects on state space models and Kalman filters with Stan: