I’m having trouble with estimating a simple AR(1) + noise model with Stan:
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 500)
y <- 1 + rnorm(500, x, 0.5)
model <- "
data {
int<lower=0> T;
vector[T] y;
}
parameters {
real alpha;
real<lower=0, upper=1> rho;
real<lower=0> sigma_y;
real<lower=0> sigma_x;
vector[T] x_raw;
}
transformed parameters {
vector[T] x;
x[1] = sigma_x * inv(sqrt(1 - rho^2)) * x_raw[1];
for(t in 2:T) {
x[t] = rho * x[t - 1] + sigma_x * x_raw[t];
}
}
model {
x_raw ~ std_normal();
alpha ~ normal(0, 2);
rho ~ beta(2, 2);
sigma_y ~ normal(0, 2);
sigma_x ~ normal(0, 2);
y ~ normal(alpha + x, sigma_y);
}
"
fit <- rstan::stan(
model_code = model,
data = list(y = y, T = length(y)),
cores = 4, iter = 1e4)
I get low BFMI warnings and large Rhats, and some divergences:
Warning messages:
1: There were 2221 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low
3: Examine the pairs() plot to diagnose sampling problems
4: The largest R-hat is 1.25, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat
5: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
6: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
> print(fit, pars = c("x", "x_raw"), include = FALSE, use_cache = FALSE)
Inference for Stan model: anon_model.
4 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=20000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha 1.00 0.01 0.09 0.85 0.94 1.00 1.07 1.17 45 1.08
rho 0.44 0.01 0.07 0.32 0.39 0.43 0.47 0.58 101 1.02
sigma_y 0.41 0.04 0.21 0.10 0.23 0.44 0.58 0.77 27 1.18
sigma_x 1.03 0.01 0.10 0.80 0.96 1.05 1.12 1.17 49 1.08
lp__ 22.54 90.67 315.59 -374.18 -225.73 -87.70 241.85 631.77 12 1.31
Switching to centered parameterization does not help.
I know that as both the latent AR process and observations are Gaussian, I could easily marginalize x
out using Kalman filter(*), and then sampling should work much better, but I’m still surprised that Stan struggles with such a simple model? Any tips on how get this working?
edit: Here is also the pairs plot from the noncentered model above:
And same from the centered version:
(*) Estimating the same model with bssm
which in this case uses Kalman filter and random walk Metropolis works OK:
library(bssm)
bssm_model <- ar1_lg(
y,
rho = uniform(init = 0.4, min = 0, max = 0.999),
sd_y = halfnormal(init = 1, sd = 2),
sigma = halfnormal(init = 1, sd = 2),
mu = 0,
xreg = matrix(1, nrow = length(y)),
beta = normal(init = 0, mean = 0, sd = 2)
)
out <- replicate(4,
run_mcmc(bssm_model, iter = 1e5, burnin = 1e4),
simplify = FALSE)
draws <- posterior::bind_draws(
lapply(out, function(x) as_draws(x, states = 0)),
along = "chain")
draws |> posterior::summarise_draws()
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 rho 0.423 0.415 0.0689 0.0663 0.324 0.548 1.00 5085. 7031.
2 sigma 1.04 1.07 0.103 0.0978 0.849 1.18 1.00 3166. 4841.
3 sd_y 0.383 0.391 0.211 0.250 0.0411 0.717 1.00 1454. 873.
4 coef_1 1.01 1.01 0.0842 0.0835 0.876 1.15 1.00 22587. 31678.
edit: multiple chains with bssm, tidied the codes.