Trend-cycle time series - convergence or mixing issue?

I’m trying to roughly replicate Harvey, Trimbur & van Dijk (2005). See pages 3-4 for their model. Right now I’m only concerned with 1st order cycles.

As it stands, autocorrelations are really high, and when I look at trace plots, several parameters (especially lambda and rho) look like they're barely moving at all. Combined with the large Rhat values, this makes me think the chains just haven't converged by the time sampling starts.

What can I do to improve convergence? I ran into a similar problem when I was working on a local level model: it had geometry like "Neal's funnel", which led me to specify ratios of the variances instead of the variances themselves. Do I need to find a way to reparameterize lambda and rho? I'm not sure how I would, but given their particularly low effective sample sizes and high Rhats, they seem to be the parameters causing my fitting issues.

Stan code (ts_trend_cycle.stan)

data {
  int<lower=0> N;
  real y[N];

  // Priors on variances
  real<lower=0> sigma_epsilon_prior_mean;
  real<lower=0> sigma_epsilon_prior_std;
  real<lower=0> signal_noise_prior_mean;
  real<lower=0> signal_noise_prior_std;
  real<lower=0> cycle_trend_prior_mean;
  real<lower=0> cycle_trend_prior_std;

  // Priors on cycle process
  real rho_alpha;
  real rho_beta;
  real<lower=0> lambda_alpha;
  real<lower=0> lambda_beta;
}

parameters {
  // Variances
  real<lower=0> sigma_epsilon;
  real<lower=0> ratio_signal_noise;
  real<lower=0> ratio_cycle_trend;

  // Cycle process
  real<lower=0, upper=1> lambda;
  real<lower=0, upper=1> rho;

  // Time series of trend and cycle
  vector[N] beta;
  matrix[N, 2] psi;

  // Initial value of trend (diffuse / no likelihood)
  real mu_0_offset;
}

transformed parameters {
  real<lower=0> sigma_zeta;
  real<lower=0> sigma_kappa;
  matrix[2, 2] T_psi;
  vector[N] mu;
  real mu_0;

  // Cycle transition equation
  T_psi[1, 1] = rho * cos(lambda);
  T_psi[2, 1] = -rho * sin(lambda);
  T_psi[1, 2] = rho * sin(lambda);
  T_psi[2, 2] = rho * cos(lambda);

  // Defined ratio_signal_noise = sigma_zeta / sigma_epsilon
  sigma_zeta = sigma_epsilon * ratio_signal_noise;
  // Defined ratio_cycle_trend = sigma_kappa / sigma_zeta
  sigma_kappa = sigma_zeta * ratio_cycle_trend;

  // Trend defined from slope
  // Fix scale of mu_0:
  mu_0 = mu_0_offset + y[1];
  mu = cumulative_sum(beta) + mu_0;
}

model {
  // Priors
  sigma_epsilon ~ normal(sigma_epsilon_prior_mean, sigma_epsilon_prior_std);
  ratio_signal_noise ~ normal(signal_noise_prior_mean, signal_noise_prior_std);
  ratio_cycle_trend ~ normal(cycle_trend_prior_mean, cycle_trend_prior_std);
  rho ~ uniform(rho_alpha, rho_beta);
  lambda ~ beta(lambda_alpha, lambda_beta);

  // Build trend
  // Beta nonstationary, ignore any "error" in the first period.
  beta[2:N] ~ normal(beta[1:(N - 1)], sigma_zeta);

  // Build cycle
  // First period psi drawn from the stationary distribution, which exists since
  // the maximum eigenvalue of T_psi is less than one; its standard deviation is
  // sigma_kappa / sqrt(1 - rho^2).
  psi[1] ~ normal(0, sigma_kappa / sqrt(1 - rho^2));
  // Project psi forward each period
  for (i in 2:N) {
    psi[i] ~ normal(T_psi * psi[i - 1]', sigma_kappa);
  }

  // Likelihood of data
  y ~ normal(mu + psi[, 1], sigma_epsilon);
}

R code

# Load libraries
#####################################################################
library(rstan)
library(shinystan)
library(quantmod)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# Get data from FRED
#####################################################################
getSymbols("GDPC1", src="FRED")

dat = list(y = as.vector(coredata(log(GDPC1))), N = dim(GDPC1)[1],
           signal_noise_prior_mean = 100, signal_noise_prior_std = 200,
           sigma_epsilon_prior_mean = 1, sigma_epsilon_prior_std = 2,
           cycle_trend_prior_mean = 0.5, cycle_trend_prior_std = 1,
           rho_alpha = 0, rho_beta = 1, lambda_alpha = 4, lambda_beta = 9);

# Fit
#####################################################################
fit = stan(file="ts_trend_cycle.stan", data=dat,
           control=list(max_treedepth=15, adapt_delta=0.95),
           iter=2000, chains=4)

pairs(fit, pars=c("sigma_zeta", "sigma_epsilon", "sigma_kappa",
                  "lambda", "rho", "mu_0_offset"))

launch_shinystan(drop_parameters(as.shinystan(fit), pars=c("beta", "mu", "psi")))

I have had difficulties with these kinds of time series models in Stan as well. You might improve things a bit by reparameterizing the psi's so that you sample psi_std from normal(0, 1) and then construct the actual psi's from those in the transformed parameters block, as sketched below. Perhaps there is something else you could do as well; I'm not really familiar with all the "Stan tricks".
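
A rough sketch of what I mean, keeping your names (psi, T_psi, sigma_kappa, rho). I haven't run this, so treat it as a starting point rather than working code: replace the psi declaration with a matrix of standardized innovations and build psi from them deterministically.

parameters {
  // ... other parameters as before, with psi replaced by standardized innovations
  matrix[N, 2] psi_std;
}
transformed parameters {
  matrix[N, 2] psi;
  // ... T_psi, sigma_zeta, sigma_kappa defined as before, then:
  // first period scaled to the stationary standard deviation
  psi[1] = psi_std[1] * sigma_kappa / sqrt(1 - rho^2);
  // remaining periods: psi[i] = T_psi * psi[i-1] + sigma_kappa * innovation
  for (i in 2:N)
    psi[i] = (T_psi * psi[i - 1]')' + psi_std[i] * sigma_kappa;
}
model {
  // standard normal innovations imply the same distribution for psi as before
  to_vector(psi_std) ~ normal(0, 1);
  // ... rest of the model unchanged, minus the original psi statements
}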

But as everything is Gaussian in your case, I would suggest marginalising over the states. No reparameterization is going to beat that, since you reduce the dimensionality of your MCMC problem to a fraction of the original. There is a function, gaussian_dlm_obs, already available in Stan which gives you the log-likelihood of a linear-Gaussian state space model. But if and when you are also interested in sampling the states, you need to do a bit more work. You might want to check this: https://github.com/helske/stannis/blob/master/exec/gaussian_ll_kalman2.stan. That file contains the relevant functions for univariate state space models; the code isn't fully optimised, as it was just an initial test for something else, but you get the idea. The sampling of the states in the generated quantities block is based on https://www.jstor.org/stable/4140605.
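
To give a rough idea of the marginalised version: with your model, the state vector would be (mu_t, beta_t, psi_1t, psi_2t), so something like the sketch below. It's untested, and you should double-check the conventions of gaussian_dlm_obs in the Stan functions reference (it expects the observations as an r x T matrix and uses y_t = F' theta_t + v_t, theta_t = G theta_{t-1} + w_t, with V and W as covariance matrices).

transformed data {
  matrix[1, N] y_obs = to_matrix(y, 1, N);   // observations as a 1 x N matrix
}
// ... parameters and transformed parameters blocks as before, minus beta, psi, mu ...
model {
  matrix[4, 1] F = to_matrix({1.0, 0.0, 1.0, 0.0}, 4, 1);  // y_t = mu_t + psi_1t + eps_t
  matrix[4, 4] G = rep_matrix(0, 4, 4);
  matrix[1, 1] V;
  matrix[4, 4] W;
  vector[4] m0 = to_vector({y[1], 0.0, 0.0, 0.0});         // vague initial state
  matrix[4, 4] C0 = diag_matrix(rep_vector(10, 4));

  // transition: local linear trend (no level noise) plus the cycle rotation
  G[1, 1] = 1;  G[1, 2] = 1;
  G[2, 2] = 1;
  G[3, 3] = rho * cos(lambda);   G[3, 4] = rho * sin(lambda);
  G[4, 3] = -rho * sin(lambda);  G[4, 4] = rho * cos(lambda);

  // covariances (variances, not standard deviations); small jitter on the level
  // in case W needs to be positive definite
  V[1, 1] = square(sigma_epsilon);
  W = diag_matrix(to_vector({1e-8, square(sigma_zeta), square(sigma_kappa), square(sigma_kappa)}));

  // priors on sigma_epsilon, the ratios, rho and lambda as before, then:
  y_obs ~ gaussian_dlm_obs(F, G, V, W, m0, C0);
}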


The reparameterization for the funnel isn’t a “Stan trick” per se, though we weren’t originally familiar with it when Matt Hoffman (who invented NUTS) independently discovered it, so we called it the “Matt trick” for a while. It was originally developed for Gibbs and Metropolis, where you’ll get biased estimates if you don’t reparameterize: http://84.89.132.1/~omiros/papers/val7.pdf

The point is to remove parameter dependencies in the prior, which in turn reduces dependencies in the posterior when there isn't a lot of data. That's typical of time series, where you get exactly one observation per parameter.
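
For example, the random walk on the slope in the model above can be non-centered by sampling standardized innovations and scaling them by sigma_zeta. Something along these lines (untested; beta_1 and beta_innov_std are just names for the sketch, and sigma_zeta has to be defined before beta in transformed parameters):

parameters {
  real beta_1;                     // first-period slope, left diffuse as before
  vector[N - 1] beta_innov_std;    // standardized slope innovations
}
transformed parameters {
  vector[N] beta;
  beta[1] = beta_1;
  // beta[n] = beta[n - 1] + sigma_zeta * beta_innov_std[n - 1]
  beta[2:N] = beta_1 + cumulative_sum(sigma_zeta * beta_innov_std);
}
model {
  // implies beta[2:N] ~ normal(beta[1:(N - 1)], sigma_zeta)
  beta_innov_std ~ normal(0, 1);
}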

You can also vectorize the sampling statement for psi[2:N]. With psi declared as it is now, that would look something like

to_vector(psi[2:N] - (T_psi * psi[1:(N - 1)]')') ~ normal(0, sigma_kappa);

It'd be better to transpose psi in its declaration, or to build a transposed copy once in transformed parameters, to avoid the copies from transposing at each iteration.
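
For example, declaring psi with time in the columns keeps the vectorized statement simple. A sketch (untested):

parameters {
  matrix[2, N] psi;    // column t holds the cycle state at time t
}
model {
  // first period from the stationary distribution
  psi[, 1] ~ normal(0, sigma_kappa / sqrt(1 - rho^2));
  // remaining periods in one vectorized statement
  to_vector(psi[, 2:N] - T_psi * psi[, 1:(N - 1)]) ~ normal(0, sigma_kappa);
  // the observation equation then uses the first row of psi
  y ~ normal(mu + psi[1]', sigma_epsilon);
}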