Unexplained poor sampling for simple state space model involving a cyclic component

I cannot get reasonable sampling when a cyclic component is added to a state-space model. The same model and data in PyMC give excellent sampling. Perhaps my code for the cyclic component is wrong or sub-optimal. Any suggestion would be appreciated.
The code below implements a combination of integrated random walk with a cycle of known frequency.
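Written out (reading the model off the code), with level $\mu_t$, slope $\nu_t$, cycle $c_t$, and conjugate cycle $c_t^*$:

$$
\begin{aligned}
y_t &= \mu_t + c_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma_{\text{irreg}}^2), \\
\mu_t &= \mu_{t-1} + \nu_{t-1}, \\
\nu_t &= \nu_{t-1} + \xi_t, & \xi_t &\sim \mathcal{N}(0, \sigma_{\text{slope}}^2), \\
c_t &= \cos(\omega)\, c_{t-1} + \sin(\omega)\, c_{t-1}^* + \kappa_t, & \kappa_t &\sim \mathcal{N}(0, \sigma_{\text{cycle}}^2), \\
c_t^* &= -\sin(\omega)\, c_{t-1} + \cos(\omega)\, c_{t-1}^* + \kappa_t^*, & \kappa_t^* &\sim \mathcal{N}(0, \sigma_{\text{cycle}}^2).
\end{aligned}
$$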

```stan
data{
  int<lower=1>    N;    // Time series length
  vector[N]       y;       // Observations
  real            omega; // normalized angular frequency [rad] of the cycle
}

parameters{
  real slope_ini;
  real level_ini;
  real cycle_ini;
  real conj_cycle_ini;
  vector[N-1]  slope;
  vector[N]  cycle;
  vector[N]  conj_cycle;
  real<lower=0>   sigma_slope; 
  real<lower=0>	  sigma_irreg;
  real<lower=0>   sigma_cycle;
}

transformed parameters {
  vector[N] level;
  level[1] = level_ini; 
  for (n in 2:N) {
    level[n] = level[n-1] + slope[n-1];
  }  
}

model{
  // observation equations
  y[1:N] ~ normal(level[1:N] + cycle[1:N], sigma_irreg);

  // slope equations
  slope[1] ~ normal(slope_ini, sigma_slope);
  for (n in 2:N-1) {
    slope[n] ~ normal(slope[n-1], sigma_slope); 
  }  
    
  // cycle equations
  cycle[1] ~ normal(cycle_ini, sigma_cycle);
  conj_cycle[1] ~ normal(conj_cycle_ini, sigma_cycle);  
  for (n in 2:N) {
    cycle[n] ~ normal(cos(omega)*cycle[n-1] + sin(omega)*conj_cycle[n-1], sigma_cycle);
    conj_cycle[n] ~ normal(-sin(omega)*cycle[n-1] + cos(omega)*conj_cycle[n-1], sigma_cycle);
  } 	
}
```


Can you be more specific? Were there divergence warnings, low effective sample size, or multiple chains failing to converge?

It would help if you could share the working PyMC code and the calls used to fit it. Were you using anything in PyMC or Stan other than random initialized NUTS? If you were just using PyMC’s NUTS implementation, my guess would be that it’s not the same model being implemented.

The main issue I see here computationally is that you’re using a centered parameterization of a hierarchical model for cycle and conj_cycle. My guess is that your program will mix much better with a non-centered parameterization, which only requires a change in declaration (and a recent enough version of Stan).

```stan
vector<offset=cycle_ini, multiplier=sigma_cycle>[N] cycle;
```
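For example, assuming analogous treatment of conj_cycle (my extrapolation of the same idea), the parameters block would become:

```stan
parameters {
  // parameters used in offset/multiplier must be declared first
  real cycle_ini;
  real conj_cycle_ini;
  real<lower=0> sigma_cycle;
  // affine ("non-centered") declarations; the sampling statements
  // in the model block stay exactly as written
  vector<offset=cycle_ini, multiplier=sigma_cycle>[N] cycle;
  vector<offset=conj_cycle_ini, multiplier=sigma_cycle>[N] conj_cycle;
  // ... remaining parameters unchanged ...
}
```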

You can speed up that Stan model computation. Here are some suggestions.

1. Use a cumulative sum to define level in terms of slope:

```stan
level[1] = level_ini;
level[2:] = level_ini + cumulative_sum(slope);
```

or as a one-liner with a compound declare/define (append_row(0, ...) restores length N, since slope has length N - 1):

```stan
vector[N] level = level_ini + append_row(0, cumulative_sum(slope));
```
2. Full indexes can be dropped, so what you’re calling “observation equations” can be just:

```stan
y ~ normal(level + cycle, sigma_irreg);
```

This will be more efficient because it doesn’t do an allocate and copy for y, level, and cycle.

3. The loop in the model block can be replaced with vectorized forms:

```stan
cycle[2:] ~ normal(cos(omega) * cycle[1:(N - 1)] + sin(omega) * conj_cycle[1:(N - 1)], sigma_cycle);
```

and the same for conj_cycle. It would be more efficient still to define cycle and conj_cycle to be of length N - 1 so this doesn’t require any slicing; then cycle[1] and conj_cycle[1] can be scalar parameters.

4. Precompute cos(omega) and sin(omega), store them as transformed data variables, and reuse them (see the combined sketch after this list):

```stan
transformed data {
  real cos_omega = cos(omega);
  real sin_omega = sin(omega);
  real neg_sin_omega = -sin_omega;
}
```
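Here is that combined sketch (my consolidation of the four suggestions, keeping the centered parameterization of the original post):

```stan
transformed data {
  real cos_omega = cos(omega);
  real sin_omega = sin(omega);
}
transformed parameters {
  vector[N] level = level_ini + append_row(0, cumulative_sum(slope));
}
model {
  // observation equations, fully vectorized
  y ~ normal(level + cycle, sigma_irreg);
  // slope equations
  slope[1] ~ normal(slope_ini, sigma_slope);
  slope[2:] ~ normal(slope[1:(N - 2)], sigma_slope);
  // cycle equations with precomputed trig values
  cycle[1] ~ normal(cycle_ini, sigma_cycle);
  conj_cycle[1] ~ normal(conj_cycle_ini, sigma_cycle);
  cycle[2:] ~ normal(cos_omega * cycle[1:(N - 1)]
                     + sin_omega * conj_cycle[1:(N - 1)], sigma_cycle);
  conj_cycle[2:] ~ normal(-sin_omega * cycle[1:(N - 1)]
                          + cos_omega * conj_cycle[1:(N - 1)], sigma_cycle);
}
```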

Dear Bob,

All convergence measures failed for the slope disturbance variance: low effective sample size, strong autocorrelations over hundreds of lags, high R-hat, and very poor mixing. I also found that even a simple integrated random walk model shows similar problems with the slope disturbance variance; only the observation disturbance variance converges. Using the same (synthetic) dataset and model in PyMC gives perfect results. Therefore, I suspect that my model implementation is at least sub-optimal. I will first follow your recommendations for the simple integrated random walk model and report the results. If I get this simple model working, I will add a cycle, follow your instructions again, and let you know.

Anyway thanks a lot for your response.

Regards, Roland Klees

I converted the Jupyter notebooks to Python. Enclosed are the two files.

Roland Klees

SSM_cmdstanpy-experimental_synthetic_dataset.py (13.4 KB)

SSM_pymc-experimental_synthetic_dataset.py (26.8 KB)

I hope it’s okay to comment on this older post. I got curious about the claim that Stan does poorly on a model that PyMC samples well. I tried the recommended changes, and they help, but it seems slope also needs a non-centered parametrization (at least under the simulation in the SSM_cmdstanpy…dataset.py notebook):

```stan
vector<offset=slope_ini, multiplier=sigma_slope>[N-1] slope;
```

With the non-centered parametrization (and max_treedepth = 12), the convergence diagnostics for the slope disturbance variance (sigma_slope) improve.

I would like to try your example and see whether it works better. I wonder if you could send me the Python code you used and/or the Stan file. To be honest, I find it frustrating that I can’t get reasonable sampling for even simple state-space models with Stan, though everything works perfectly when using PyMC or pymc-experimental.

I actually find the explicit non-centered parametrization (slope = slope_ini + sigma_slope * slope_raw instead of <offset=slope_ini, multiplier=sigma_slope>) easier to understand.

Aside: this way of implementing the non-centered parametrization also seems to work better on this problem (for example, there is no need to increase max_treedepth). I don’t know why this might be the case.

Here is the full Stan program I came up with:

```stan
data{
  int<lower=1> N;
  vector[N] Y;
  int S;           // cycle length
}
transformed data {
  real lambda = 2 * pi() / S; // normalized angular frequency [rad]
  real cos_lambda = cos(lambda);
  real sin_lambda = sin(lambda);
  real neg_sin_lambda = - sin(lambda);
}
parameters{
  real slope_ini;
  real level_ini;
  real cycle_ini;
  real conj_cycle_ini;
  real<lower=0> sigma_slope;
  real<lower=0> sigma_cycle;
  real<lower=0> sigma_irreg;

  vector[N-1] slope_raw;
  vector[N] cycle_raw;
  vector[N] conj_cycle_raw;
}
transformed parameters {
  vector[N-1] slope = slope_ini + sigma_slope * slope_raw;
  vector[N] level = level_ini + append_row(0, cumulative_sum(slope));
  vector[N] cycle;
  vector[N] conj_cycle;

  cycle[1] = cycle_ini + cycle_raw[1];
  conj_cycle[1] = conj_cycle_ini + conj_cycle_raw[1];

  for (n in 2:N) {
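    // rotate the (cycle, conj_cycle) state by the angle lambda,
    // then add the scaled standard-normal innovation (non-centering)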
    cycle[n] = (
      cos_lambda * cycle[n-1] + sin_lambda * conj_cycle[n-1]
      + sigma_cycle * cycle_raw[n]
    );
    conj_cycle[n] = (
      neg_sin_lambda * cycle[n-1] + cos_lambda * conj_cycle[n-1]
      + sigma_cycle * conj_cycle_raw[n]
    );
  }
}
model{
  slope_ini ~ normal(0, 1);

  sigma_slope ~ student_t(3, 0, 1);
  sigma_cycle ~ student_t(3, 0, 1);
  sigma_irreg ~ student_t(3, 0, 1);

  slope_raw ~ std_normal();
  cycle_raw ~ std_normal();
  conj_cycle_raw ~ std_normal();

  Y ~ normal(level + cycle, sigma_irreg);
}
```

A few divergences still happen from time to time.

Thanks a lot. I will work on it and let you know about my findings. A quick remark: cycle_ini must be a vector of size 2, because the recursive equations for the cycle assume that you have decomposed a cycle with non-zero phase into a cosine wave and a sine wave, each with zero phase.
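For reference, the algebra behind that remark: by the angle-sum identity,

$$A \cos(\omega t + \varphi) = (A \cos \varphi) \cos(\omega t) - (A \sin \varphi) \sin(\omega t),$$

so a cycle with amplitude $A$ and non-zero phase $\varphi$ corresponds to the initial state pair $(c_1, c_1^*) = (A \cos \varphi,\, -A \sin \varphi)$, with both components free.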
