Time Series with Horseshoe Prior

Hi everyone,

I am posting here because I am running into some troubles in trying to fit a time-series model using Stan.
The model is not too complicated, but it does have quite a number of parameters.

Say we observe a sequence of \left\{y_t\right\}_{t = 1}^{T}, with y_t \in \mathbb{R} for all t. We write the following:

\begin{cases} y_t = b + c_t + \varepsilon_t,\\ c_t = \gamma c_{t-1} + A_t + w_t, \end{cases}

with \varepsilon \overset{\mathrm{iid}}\sim \mathcal{N}\left(0, \sigma_1^2\right) and w\overset{\mathrm{iid}}\sim \mathcal{N}\left(0, \sigma_2^2\right).

This model was first introduced in [1].

We want to extend this model and make it Bayesian. To do so, we put a prior on the various parameters:

  1. b \sim \mathcal{N}\left(0, 10^2\right)
  2. \gamma \sim \mathcal{B}\left(2, 5\right)
  3. A_t \sim \mathcal{N}\left(0, \tilde{\lambda}_t^2 \tau^2\right)
  4. \tilde{\lambda}_t^2 = \frac{\kappa^2 \lambda^2_t}{\kappa^2 + \tau^2 \lambda^2_t} .

This is akin to [2].
The sequence \left\{A_t\right\}_{t=1}^{T} is called a spike-train. It should be a non-negative sequence, but for the time being we can relax this assumption.

The goal is to complete the model, adding the relevant priors for the scale-mixture parameters of the various A_t.

The implementation we have is the following:

data {
    int<lower=0> n; # number of observations
    vector[n] y; # time series
    real<lower=0> scale_global; # scale for the half-t prior for tau
    real<lower=1> nu_global; # degrees of freedom for the half-t priors for tau
    real<lower=1> nu_local; # degress of freedom for the half-t priors for lambdas
    real<lower=0> slab_scale; # slab scale for the regularised horseshoe
    real<lower=0> slab_df; # slab degrees of freedom for the regularised horseshoe
    }

parameters {  
   real b;
   vector[n] z;
   real<lower=0> aux1_global;
   real<lower=0> aux2_global;
   vector<lower=0>[n] aux1_local;
   vector<lower=0>[n] aux2_local;
   real<lower=0> kappa_aux;
   vector[n] c; #; auto-regressive coefficients
   real<lower=0, upper=1> gamma;
}

transformed parameters {   
   vector<lower=0>[n] lambda;
   real<lower=0> kappa; # slab scale
   vector<lower=0>[n] lambda_tilde; # `truncated` local shrinkage parameters
   real<lower=0> tau;
   vector[n] A; # spike train

   lambda = aux1_local .* sqrt ( aux2_local );
   kappa = slab_scale * sqrt(kappa_aux);
   tau = aux1_global * sqrt ( aux2_global ) * scale_global ;
   lambda_tilde = sqrt(kappa^2 * square(lambda) ./ (kappa^2 + tau^2*square(lambda)));  
   A = z .* lambda_tilde*tau;
}

model {
   z ~ normal(0, 1);
   aux1_local ~ normal(0, 1);
   aux2_local ~ inv_gamma(0.5 * nu_local, 0.5 * nu_local);
   aux1_global ~ normal(0, 1);
   aux2_global ~ inv_gamma(0.5 * nu_global, 0.5 * nu_global);
   kappa_aux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
   b ~ normal(0., 10.);
   gamma ~ beta(2., 5.);
   c[1] ~ normal(0., 2.);
   c[2:n] ~ normal(gamma * c[1:(T - 1)] + A[2:T], 10.);

   for (k in 2:T) {
      y[k] ~ normal(b + c[k], 10.);
   }
}

However, for some reason, this does not seem to work. I was wondering whether this might be due to the number of parameters growing with faster than the number of observations.

Thanks!
Andrea

References:
[1] J. T. Vogelstein et al., ‘Fast Nonnegative Deconvolution for Spike Train Inference From Population Calcium Imaging’, Journal of Neurophysiology, vol. 104, no. 6, pp. 3691–3704, Dec. 2010, doi: 10.1152/jn.01073.2009. https://doi.org/10.1152/jn.01073.2009
[2] J. Piironen and A. Vehtari, ‘Sparsity information and regularization in the horseshoe and other shrinkage priors’, Electron. J. Statist., vol. 11, no. 2, Jan. 2017, doi: 10.1214/17-EJS1337SI.

What do you mean when you say the model does not seem to work?

A nice debugging strategy is to start with a simpler model, verify it works, then add model components one-by-one until you run into problems. That way you can isolate the problem. I would recommend starting with an overly simple model, say:

y_t = b + \epsilon_t

and make sure that works first, then work up from there.

I’m also a little confused by these lines:

 c[2:n] ~ normal(gamma * c[1:(T - 1)] + A[2:T], 10.);

and

y[k] ~ normal(b + c[k], 10.);

Shouldn’t these variances be estimated?

1 Like