I’m working with time series models with latent parameters (e.g. stochastic volatility). These latent parameters are assumed to follow an AR process. Some of these may be highly persistent, but not random walks (as we don’t think they can grow unbounded when forecasting). Unfortunately, the length of these time series is also limited, although in the example I discuss, the issue remains even with longer time series (which I don’t show but you can check).
When trying to diagnose issues with the models, I went back to basics on the following. We can write an AR process as y_{t} = constant + \alpha y_{t-1} + \epsilon_{t}. However, it seems more intuitive to me to come up with priors for the unconditional mean than for the constant, as the unconditional mean is more of a scientific quantity, and the constant has very different meanings depending on \alpha. Thus, one can also write the AR process as y_{t} = (1-\alpha) * \bar{y} + \alpha y_{t-1} + \epsilon_{t} and provide priors.
In the following code, I implement both approaches. When using the first, with the constant, I add the Jacobian term (hopefully correctly) so that I can still put my prior on \bar{y} rather than on the constant. However, the number of divergences is much higher in the constant specification rather than in the unconditional mean specifications. This seems to be relatively robust in these toy examples I have played with. Do you have a good intuition for why that is the case? The issue improves with less persistent processes, and longer samples, but I would still like to get a grasp on why there is this difference.
Example:
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
set.seed(20231016)
model_unc_mean <- stan_model(model_code = "
data {
int<lower=1> N;
vector[N] y;
}
parameters {
real<lower=-1,upper=1> alpha;
real<lower=0> sigma;
real unc_mean;
}
model {
alpha ~ normal(0, 1);
unc_mean ~ normal(0, 1);
sigma ~ gamma(1,1);
y[2:N] ~ normal((1-alpha) * unc_mean + alpha * y[1:(N-1)], sigma);
}
generated quantities {
real constant = (1-alpha) * unc_mean;
}
")
model_constant <- rstan::stan_model(model_code = "
data {
int<lower=1> N;
vector[N] y;
}
parameters {
real constant;
real<lower=-1,upper=1> alpha;
real<lower=0> sigma;
}
transformed parameters {
real unc_mean = constant / (1-alpha);
}
model {
// jacobian adjustment
target += log(1 / (1-alpha));
alpha ~ normal(0, 1);
unc_mean ~ normal(0, 1);
sigma ~ gamma(1,1);
y[2:N] ~ normal((1-alpha) * unc_mean + alpha * y[1:(N-1)], sigma);
}
")
generate_ar_series <- function(length, unc_mean, alpha, sigma) {
# generate a few observations to burn in / stationarize
N_burn_in <- 1000
output <- vector(length = length + N_burn_in)
output[1] <- unc_mean
# add some burn in
for (x in seq(2, length + N_burn_in)) {
output[x] <- (1-alpha) * unc_mean + alpha * output[x-1] + rnorm(1, sd = sigma)
}
output <- output[(N_burn_in+1):(length+N_burn_in)]
return(output)
}
example_series <- generate_ar_series(100, .2, .97, .3)
plot(seq_len(length(example_series)), example_series, type = "l")
estimation_unc_mean <- rstan::sampling(model_unc_mean, data = list(y = example_series, N = length(example_series)))
estimation_constant <- rstan::sampling(model_constant, data = list(y = example_series, N = length(example_series)))
#> Warning: There were 150 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
estimation_unc_mean
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> alpha 0.97 0.00 0.02 0.91 0.95 0.97 0.99 1.00 1526 1
#> sigma 0.32 0.00 0.02 0.28 0.30 0.32 0.33 0.37 2242 1
#> unc_mean -0.62 0.02 0.92 -2.11 -1.31 -0.73 -0.05 1.40 1575 1
#> constant -0.03 0.00 0.04 -0.14 -0.05 -0.02 0.00 0.02 1627 1
#> lp__ 57.57 0.04 1.36 54.06 56.93 57.93 58.58 59.15 1163 1
#>
#> Samples were drawn using NUTS(diag_e) at Wed Oct 25 10:32:18 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
estimation_constant
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> constant -0.03 0.00 0.04 -0.15 -0.06 -0.02 0.00 0.02 543 1.01
#> alpha 0.97 0.00 0.02 0.91 0.95 0.97 0.98 1.00 533 1.01
#> sigma 0.32 0.00 0.02 0.28 0.30 0.32 0.34 0.37 1301 1.00
#> unc_mean -0.71 0.03 0.86 -2.11 -1.32 -0.82 -0.19 1.22 1190 1.00
#> lp__ 61.37 0.04 1.14 58.36 60.94 61.68 62.17 62.59 1045 1.00
#>
#> Samples were drawn using NUTS(diag_e) at Wed Oct 25 10:32:19 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
check_hmc_diagnostics(estimation_unc_mean)
#>
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
check_hmc_diagnostics(estimation_constant)
#>
#> Divergences:
#> 150 of 4000 iterations ended with a divergence (3.75%).
#> Try increasing 'adapt_delta' to remove the divergences.
#>
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#>
#> Energy:
#> E-BFMI indicated no pathological behavior.
pairs(estimation_unc_mean)
pairs(estimation_constant)