Hi everyone,
I am trying to do nowcasting by replicating the khakieconomics code. However, I get poor performance (at least I am not satisfied) in retrieving the model parameters (low ess_bulk
, high \hat{R}).
Moreover, since these are the first steps for a larger project (multivariate response, gap-filling, and so on…) I would ask your opinion if I should use Gaussian Dynamic Linear Model. If it’s the case, I am struggling to understand the notation used in the manual, and maybe someone could point me to some stan code to start with because I can’t acces directly Durbin&Koopman’s book.
However, back to the original point, I reproduced the original code to simulate the data:
remove(list = ls())
library(dplyr)
set.seed(2)
N = 300
z1 = rnorm(n = N, mean = 0, sd = 3)
z2 = rnorm(n = N, mean = 0, sd = 3)
x = rep(NA_real_, N)
x[1] = 1
for(i in 2:N) {
x[i] = x[i-1] + 0.4 * z1[i] + -0.3 * z2[i] + rnorm(n = 1, mean = 0, sd = 0.2)
}
time_step = seq(from = 0, length.out = N, by = 2)
freq = 28
y = x + rnorm(n = N, mean = 0, sd= 1)
y[!(1:N %% freq == 0)] = NA_real_
data = data.frame(TIME = time_step, Z1 = z1, Z2 = z2, X = x, Y = y)
y = y[!is.na(y)]
input = list(N1 = length(y), N2 = N, freq = freq, y = y, z1 = z1, z2 = z2)
library(cmdstanr)
library(posterior)
model = cmdstan_model("nowcasting.stan")
fit = model$sample(data = input, seed = 123, chains = 4, parallel_chains = 4, refresh = 100, max_treedepth = 10)
and revamped the Stan model syntax
data {
int N1; // length of low frequency series
int N2; // length of high frequency series
int freq; // every freq-th observation of the high frequency series we get an observation of the low frequency one
vector[N1] y;
vector[N2] z1;
vector[N2] z2;
}
parameters{
vector[2] gamma;
real<lower = 0> sigma_eta;
real<lower = 0> sigma_epsilon;
vector[N2] x;
}
model {
int count = 0;
x[1] ~ normal(1,1);
for (i in 2:N2) {
x[i] ~ normal(x[i-1] +z1[i] * gamma[1] + z2[i] * gamma[2], sigma_eta);
if(i%freq==0){
count += 1;
y[count] ~ normal(x[i], sigma_epsilon);
}
}
gamma ~ std_normal();
sigma_eta ~ cauchy(0, 1);
sigma_epsilon ~ cauchy(0, 1);
}
with these results:
Warning: 11 of 4000 (0.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include:
* Increasing adapt_delta closer to 1 (default is 0.8)
* Reparameterizing the model (e.g. using a non-centered parameterization)
* Using informative or weakly informative prior distributions
356 of 4000 (9.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
Increasing the max_treedepth limit can avoid this at the expense of more computation.
If increasing max_treedepth does not remove warnings, try to reparameterize the model.
> fit
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 219.80 205.31 138.65 111.75 45.27 499.16 1.19 16 NA
gamma[1] 0.46 0.46 0.06 0.06 0.36 0.56 1.00 1460 1248
gamma[2] -0.30 -0.30 0.03 0.03 -0.35 -0.26 1.02 1555 1080
sigma_eta 0.32 0.31 0.12 0.12 0.11 0.52 1.18 16 NA
sigma_epsilon 0.85 0.78 0.56 0.57 0.10 1.83 1.03 103 485
x[1] 0.01 0.03 0.92 0.92 -1.52 1.48 1.00 2953 3410
x[2] 0.55 0.58 0.96 0.97 -1.05 2.07 1.00 2537 3248
x[3] 1.95 1.96 0.97 0.96 0.34 3.48 1.00 2311 2732
x[4] 2.09 2.10 1.05 1.05 0.36 3.71 1.01 2304 2822
x[5] 1.31 1.31 1.05 1.03 -0.45 2.94 1.01 2758 2806
# showing 10 of 305 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
Warning message:
NAs introduced by coercion
As ever, any help is appreciated!