Partially-fixed Gaussian-process prior for varying slopes model: HMC not progressing

Hi all,

I am a complete beginner to Stan, so apologies in advance if I have made an obvious mistake or am asking a silly question.

Set up
I am attempting to fit a “rolling” regression model in which the slope parameter varies through time:

y_t \sim \text{normal}(\mu_t, \sigma)
\mu_t = \alpha + \beta_t\cdot X_t

In particular, I want to apply a Gaussian process prior to \beta so that it varies relatively smoothly through time:

\beta \sim \text{multi-normal}\left(0, K\right), \quad K_{ij} = k(t_i, t_j)

Where k is an exponentiated quadratic kernel, k(t_i, t_j) = \alpha_{\text{GP}}^2 \exp\left(-\frac{(t_i - t_j)^2}{2\rho^2}\right), with \alpha_{\text{GP}} and \rho corresponding to gp_alpha and gp_rho in the code below.

When I fix the GP hyperparameters at their true values and infer \beta_t, \alpha, and \sigma, the posterior closely recovers the true values. However, when I also try to infer one of the GP hyperparameters, even with an informative prior, sampling never seems to progress past the first iteration / warmup. Specifically, it gets stuck at: “Iteration: 1 / 1000 [ 0%] (Warmup)”.

Ultimately, of course, I would like to infer all parameters and build up to a more complicated model. I understand that modelling \beta as a random walk is another option; I have already done this, and would like to troubleshoot the GP prior for now.

# Simulation parameters
gp_alpha <- 0.25
gp_rho <- 20
alpha <- 0
sigma <- 0.05

# Set up GP
eps <- sqrt(.Machine$double.eps)
t <- 1:500 
D <- as.matrix(dist(t))^2
K <- gp_alpha^2 * exp(-1/(2 * gp_rho^2) * D) + diag(eps, length(t))

# Simulate data
beta <- mvtnorm::rmvnorm(1, mean=rep(0, length(t)), sigma=K)[1, ] # unobserved
x <- runif(length(t)) # observed
y <- alpha + (beta * x) + rnorm(length(t), sd=sigma) # observed

# plot(t, beta, type = 'l', lwd=2, xlab="time", ylab="Beta")
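As an optional sanity check (a sketch assuming the simulation setup above), the non-centered construction the Stan model relies on — beta = L * Z with Z ~ normal(0, 1), so that beta has covariance L L' = K — can be verified directly in R:

```r
# Verify the non-centered GP construction: beta = L %*% Z, with L the
# lower-triangular Cholesky factor of K, gives a draw with covariance K.
gp_alpha <- 0.25
gp_rho <- 20
t <- 1:500
D <- as.matrix(dist(t))^2
K <- gp_alpha^2 * exp(-D / (2 * gp_rho^2)) + diag(1e-8, length(t))
Lc <- t(chol(K))               # chol() returns upper-triangular; transpose it
Z <- rnorm(length(t))
beta_draw <- Lc %*% Z          # same distribution as rmvnorm(1, sigma = K)
```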
data {
  int<lower=1> N;
  vector[N] y;
  real x[N];
  real t[N];
  real<lower=0> gp_rho;
}
parameters {
  real alpha;
  real<lower=0> sigma;
  real<lower=0> gp_alpha;
  vector[N] Z;
}
model {
  // Partially-fixed GP prior (non-centered parameterization)
  matrix[N, N] cov = cov_exp_quad(t, gp_alpha, gp_rho)
                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
  vector[N] beta = L_cov * Z;
  // Mean model
  for (i in 1:N) {
    y[i] ~ normal(alpha + beta[i] * x[i], sigma);
  }
  Z ~ normal(0, 1);
  // Other priors
  gp_alpha ~ normal(0, 0.2);
  sigma ~ exponential(1);
  alpha ~ normal(0, 0.05);
}
# Fit model (the Stan program above is stored as a string in `mod`)
dat <- list(N = length(y), y = y, x = x, t = 1:length(y), gp_rho = gp_rho)
s <- rstan::stan_model(model_code = mod, verbose = TRUE)
fit <- rstan::sampling(s, data = dat, cores = 4, iter = 1000, chains = 4)

Any help or advice is appreciated. I am using R version 4.1.0 with rstan version 2.21.2.

In your example, you have t <- 1:500. Stan does a Cholesky decomposition on every leapfrog step, and the Cholesky decomposition of a 500x500 matrix is slow enough that, if there are many leapfrog steps, the computation can be very slow. sigma also seems to be small enough that the non-centered parameterization you are using may be worse than the centered parameterization (centered can be better when the likelihood is strong compared to the prior). For a 1D GP with suitable covariance functions (such as exponentiated quadratic, Matérn, or periodic), it is computationally more efficient to represent the GP with basis functions, as demonstrated in the case studies Gaussian process demonstration with Stan and Birthdays.
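To make the basis-function idea concrete, here is a rough R sketch of the Hilbert-space approximation used in those case studies (after Solin & Särkkä); the number of basis functions M and the boundary factor 1.25 are arbitrary choices for illustration, not values from this thread:

```r
# Hilbert-space basis-function approximation to an exponentiated-quadratic GP.
# beta ~ GP(0, k) is approximated as phi %*% (sqrt(S) * z) with z ~ N(0, I).
gp_alpha <- 0.25
gp_rho <- 20
tt <- 1:500
tc <- tt - mean(tt)                        # center the inputs
L <- 1.25 * max(abs(tc))                   # boundary of the approximation domain
M <- 40                                    # number of basis functions
j <- 1:M
w <- pi * j / (2 * L)                      # frequencies sqrt(lambda_j)
phi <- sqrt(1 / L) * sin(outer(tc + L, w)) # N x M eigenfunction matrix
# Spectral density of the exponentiated quadratic kernel at each frequency
S <- gp_alpha^2 * sqrt(2 * pi) * gp_rho * exp(-0.5 * gp_rho^2 * w^2)
z <- rnorm(M)
beta_approx <- phi %*% (sqrt(S) * z)       # one approximate GP draw, O(N * M)
```

In a Stan version of this, phi can be precomputed and passed as data, so each leapfrog step costs O(N * M) matrix-vector work instead of an O(N^3) Cholesky decomposition.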

Hope this helps


Hi Aki,

Thank you very much for your response, that is all really helpful information and gives me plenty to work with.
