Divergences vs. no divergences with different parametrizations of an AR process

I’m working with time series models with latent parameters (e.g. stochastic volatility). These latent parameters are assumed to follow an AR process. Some of them may be highly persistent, but they are not random walks (as we don’t think they can grow without bound when forecasting). Unfortunately, the length of these time series is also limited, although in the example I discuss, the issue remains even with longer time series (which I don’t show, but you can check).

When trying to diagnose issues with the models, I went back to basics on the following. We can write an AR(1) process as y_t = \text{constant} + \alpha y_{t-1} + \epsilon_t. However, it seems more intuitive to me to come up with priors for the unconditional mean \bar{y} than for the constant: the unconditional mean is more of a scientific quantity, while the constant has very different meanings depending on \alpha. Since the constant equals (1-\alpha)\bar{y}, one can equivalently write the AR process as y_t = (1-\alpha)\bar{y} + \alpha y_{t-1} + \epsilon_t and place the prior on \bar{y} directly.
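As a quick numerical illustration (the numbers here are arbitrary, just to make the point): the same unconditional mean corresponds to very different constants depending on \alpha.

# same unconditional mean, very different implied constants
unc_mean <- 0.2
for (alpha in c(0.5, 0.9, 0.97)) {
  cat(sprintf("alpha = %.2f -> constant = %.3f\n", alpha, (1 - alpha) * unc_mean))
}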

In the following code, I implement both approaches. When using the first, with the constant, I add a Jacobian term (hopefully correctly) so that I can still put my prior on \bar{y} rather than on the constant. However, the number of divergences is much higher in the constant specification than in the unconditional-mean specification. This seems to be relatively robust across the toy examples I have played with. Do you have a good intuition for why that is the case? The issue improves with less persistent processes and longer samples, but I would still like to get a grasp on why there is this difference.

Example:

library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

set.seed(20231016)

model_unc_mean <- stan_model(model_code = "
data {
  int<lower=1> N;
  vector[N] y;
}

parameters {
  real<lower=-1,upper=1> alpha;  // AR coefficient, constrained to the stationary region
  real<lower=0> sigma;           // innovation scale
  real unc_mean;                 // unconditional mean of the process
}

model {
  alpha ~ normal(0, 1);
  unc_mean ~ normal(0, 1);
  sigma ~ gamma(1, 1);
  // conditional likelihood, treating y[1] as given
  y[2:N] ~ normal((1 - alpha) * unc_mean + alpha * y[1:(N - 1)], sigma);
}

generated quantities {
  real constant = (1 - alpha) * unc_mean;
}
")

model_constant <- stan_model(model_code = "
data {
  int<lower=1> N;
  vector[N] y;
}

parameters {
  real constant;                 // intercept of the AR recursion
  real<lower=-1,upper=1> alpha;
  real<lower=0> sigma;
}

transformed parameters {
  real unc_mean = constant / (1 - alpha);
}

model {
  // Jacobian adjustment: the prior below is on unc_mean,
  // a transform of the actual parameter constant
  target += log(1 / (1 - alpha));
  alpha ~ normal(0, 1);
  unc_mean ~ normal(0, 1);
  sigma ~ gamma(1, 1);
  y[2:N] ~ normal((1 - alpha) * unc_mean + alpha * y[1:(N - 1)], sigma);
}
")

generate_ar_series <- function(n, unc_mean, alpha, sigma) {
  # simulate extra observations as burn-in so the kept series is
  # (approximately) a draw from the stationary distribution
  N_burn_in <- 1000
  output <- numeric(n + N_burn_in)
  output[1] <- unc_mean
  for (t in seq(2, n + N_burn_in)) {
    output[t] <- (1 - alpha) * unc_mean + alpha * output[t - 1] + rnorm(1, sd = sigma)
  }
  # drop the burn-in period
  output[(N_burn_in + 1):(n + N_burn_in)]
}
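(For what it’s worth, a vectorized alternative would be to lean on stats::arima.sim, which simulates a stationary zero-mean AR process and handles the burn-in internally; this is just a sketch, not what I actually used above:)

generate_ar_series_alt <- function(n, unc_mean, alpha, sigma) {
  # arima.sim draws from the stationary distribution of a zero-mean AR process,
  # so we shift by the unconditional mean afterwards
  unc_mean + as.numeric(arima.sim(model = list(ar = alpha), n = n, sd = sigma))
}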

example_series <- generate_ar_series(100, .2, .97, .3)

plot(seq_len(length(example_series)), example_series, type = "l")


estimation_unc_mean <- rstan::sampling(model_unc_mean, data = list(y = example_series, N = length(example_series)))
estimation_constant <- rstan::sampling(model_constant, data = list(y = example_series, N = length(example_series)))
#> Warning: There were 150 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems

estimation_unc_mean
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1; 
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#> 
#>           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
#> alpha     0.97    0.00 0.02  0.91  0.95  0.97  0.99  1.00  1526    1
#> sigma     0.32    0.00 0.02  0.28  0.30  0.32  0.33  0.37  2242    1
#> unc_mean -0.62    0.02 0.92 -2.11 -1.31 -0.73 -0.05  1.40  1575    1
#> constant -0.03    0.00 0.04 -0.14 -0.05 -0.02  0.00  0.02  1627    1
#> lp__     57.57    0.04 1.36 54.06 56.93 57.93 58.58 59.15  1163    1
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Oct 25 10:32:18 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at 
#> convergence, Rhat=1).
estimation_constant
#> Inference for Stan model: anon_model.
#> 4 chains, each with iter=2000; warmup=1000; thin=1; 
#> post-warmup draws per chain=1000, total post-warmup draws=4000.
#> 
#>           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
#> constant -0.03    0.00 0.04 -0.15 -0.06 -0.02  0.00  0.02   543 1.01
#> alpha     0.97    0.00 0.02  0.91  0.95  0.97  0.98  1.00   533 1.01
#> sigma     0.32    0.00 0.02  0.28  0.30  0.32  0.34  0.37  1301 1.00
#> unc_mean -0.71    0.03 0.86 -2.11 -1.32 -0.82 -0.19  1.22  1190 1.00
#> lp__     61.37    0.04 1.14 58.36 60.94 61.68 62.17 62.59  1045 1.00
#> 
#> Samples were drawn using NUTS(diag_e) at Wed Oct 25 10:32:19 2023.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at 
#> convergence, Rhat=1).

check_hmc_diagnostics(estimation_unc_mean)
#> 
#> Divergences:
#> 0 of 4000 iterations ended with a divergence.
#> 
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.
check_hmc_diagnostics(estimation_constant)
#> 
#> Divergences:
#> 150 of 4000 iterations ended with a divergence (3.75%).
#> Try increasing 'adapt_delta' to remove the divergences.
#> 
#> Tree depth:
#> 0 of 4000 iterations saturated the maximum tree depth of 10.
#> 
#> Energy:
#> E-BFMI indicated no pathological behavior.

pairs(estimation_unc_mean)

pairs(estimation_constant)
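(Zooming the pairs plot in on the parameters involved in the transform makes the geometry easier to inspect; pars is an argument of rstan’s pairs method for stanfit objects:)

pairs(estimation_constant, pars = c("alpha", "constant", "lp__"))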

I am not sure, but is it OK in the second model to put the prior on the transformed parameter “unc_mean” rather than on the actual parameter “constant”?

Thanks for the response.

As far as I understand, it is not a problem if one is careful. If the prior is on a transformed parameter, we may need to make a Jacobian adjustment (I found this to be an instructive example: Jacobian adjustments explained simply).

That’s what I’m doing in this line in the second model: target += log(1 / (1-alpha));

because unc_mean = constant / (1-alpha), so we need to adjust the target by log |d(unc_mean)/d(constant)|, which is log(1/(1-alpha)), since I took the derivative of unc_mean with respect to constant.
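Spelling that step out (this is just the scalar change-of-variables formula):

\frac{\partial \, \text{unc\_mean}}{\partial \, \text{constant}} = \frac{1}{1-\alpha} \quad\Rightarrow\quad \text{target} \mathrel{+}= \log\left|\frac{1}{1-\alpha}\right| = -\log(1-\alpha)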

As I write this, I am wondering if there is any issue because alpha is also a parameter.
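Sketching this out to check myself: the full map is (\alpha, \text{constant}) \mapsto (\alpha, \text{unc\_mean}), with Jacobian

\frac{\partial(\alpha, \text{unc\_mean})}{\partial(\alpha, \text{constant})} = \begin{pmatrix} 1 & 0 \\ \text{constant}/(1-\alpha)^2 & 1/(1-\alpha) \end{pmatrix}

which is lower triangular, so |\det| = 1/(1-\alpha), and the one-dimensional adjustment coincides with the full two-dimensional one.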

(Also, I now think I should have written log(abs(1 / (1-alpha))), but since \alpha \in (-1,1) here, 1-\alpha is always positive and it doesn’t make a difference.)
