Hi all, I’m working with count time series models (specifically, INGARCH). I’ve tried a lot of things to no avail, but maybe you all can help!
Mathematically, the very simplest version of INGARCH is a time-dependent Poisson distribution with a mean that depends on the previous mean and previous observation.
\lambda_t = \beta_0 + \alpha y_{t-1} + \delta \lambda_{t-1}
y_t| \lambda_t \sim Poi(\lambda_t)
Just as some background: as an observation-driven state space model, INGARCH is almost like an ARMA, except that the noise is not additive Gaussian: it is Poisson (or Negative Binomial) distributed. If it were normally distributed additive noise, it would be an ARMA! It's not, and since the variance is a function of the mean for both the Poisson and the Negative Binomial, the recursion on the mean is also a recursion on the conditional variance, which is what makes it a GARCH analogue. The general form, with exogenous covariates x, looks like this:
\lambda_t = \beta_0 + \sum_{k=1}^{p}\alpha_k y_{t-k} + \sum_{l=1}^{q}\delta_l \lambda_{t-l} + \sum_{j=1}^{J} \beta_j x_{tj}
y_t \mid \lambda_t \sim \mathrm{Poi}(\lambda_t) \text{ or } \mathrm{NegBin}(\lambda_t, \phi)
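For concreteness, my simulation is along these lines (a minimal p = q = 1 negative-binomial sketch with one covariate; the parameter values are just illustrative, and, as in the Stan code below, the covariate term enters through exp() rather than additively):

set.seed(1)
N <- 500
x <- rnorm(N)                       # one exogenous covariate
beta0 <- 0.5; beta1 <- 0.3          # regression coefficients (illustrative)
alpha <- 0.3; delta <- 0.4          # alpha + delta < 1 for stationarity
phi <- 10                           # NB dispersion

y <- numeric(N); lambda <- numeric(N)
lambda[1] <- exp(beta0 + beta1 * x[1]) / (1 - alpha - delta)  # stationary start
y[1] <- rnbinom(1, size = phi, mu = lambda[1])                # variance = mu + mu^2/phi
for (t in 2:N) {
  lambda[t] <- exp(beta0 + beta1 * x[t]) + alpha * y[t - 1] + delta * lambda[t - 1]
  y[t] <- rnbinom(1, size = phi, mu = lambda[t])
}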
When p = 1 and q = 1, the model converges and fits correctly, predicting out-of-sample data very well. As soon as we allow second-order autoregression, the Stan model falls apart and I can't figure out why (it compiles, but does not converge). I've attached the simulation code ingarch(1,1)_analysis.R (3.2 KB), and the Stan for the working p = 1, q = 1 version is here:
data {
  int<lower=1> K;                 // Number of covariates
  int<lower=1> N_train;           // Number of time points
  matrix[N_train, K] X_train;     // Covariate matrix
  array[N_train] int y_train;     // Observed counts
  int<lower=1> N_test;            // Number of test time points
  matrix[N_test, K] X_test;       // Covariate matrix for test data
  array[N_test] int y_test;
}
parameters {
  vector[K] beta;                 // Regression coefficients for covariates
  real<lower=0, upper=1> alpha;   // Weight on previous count
  real<lower=0, upper=1> delta;   // Weight on previous intensity
  real<lower=0> phi;              // NB dispersion parameter
}
transformed parameters {
  vector[N_train] lambda;         // Time-varying intensity
  // Initial intensity: stationary mean given the covariates at t = 1
  lambda[1] = exp(dot_product(X_train[1], beta)) / (1 - alpha - delta);
  // Recursive definition of lambda
  for (t in 2:N_train) {
    lambda[t] = exp(dot_product(X_train[t], beta)) + alpha * y_train[t - 1] + delta * lambda[t - 1];
  }
}
model {
  // Enforce stationarity
  if (alpha + delta >= 1)
    target += negative_infinity();
  // Priors
  beta ~ normal(0, 5);
  alpha ~ beta(2, 2);
  delta ~ beta(2, 2);
  phi ~ gamma(2, 0.1);
  // Likelihood: negative binomial likelihood using the computed intensities
  for (t in 1:N_train)
    // variance = mu + mu^2 / phi
    y_train[t] ~ neg_binomial_2(lambda[t], phi);
}
generated quantities {
  // In-sample posterior predictive checks:
  array[N_train] int y_rep;       // Replicate for training data
  vector[N_train] log_lik;        // Pointwise log-likelihood for training data
  for (t in 1:N_train) {
    y_rep[t] = neg_binomial_2_rng(lambda[t], phi);                 // Draw a replicate count
    log_lik[t] = neg_binomial_2_lpmf(y_train[t] | lambda[t], phi); // Log-likelihood for y_train[t]
  }
  // Out-of-sample predictions for test data:
  array[N_test] int y_test_rep;   // Predicted counts for test data
  vector[N_test] lambda_test;     // Predicted intensities for test data
  // For test data, use the last training observations as initial conditions:
  lambda_test[1] = exp(dot_product(X_test[1], beta)) + alpha * y_train[N_train] + delta * lambda[N_train];
  y_test_rep[1] = neg_binomial_2_rng(lambda_test[1], phi);
  for (t in 2:N_test) {
    lambda_test[t] = exp(dot_product(X_test[t], beta)) + alpha * y_test[t - 1] + delta * lambda_test[t - 1];
    y_test_rep[t] = neg_binomial_2_rng(lambda_test[t], phi);
  }
}
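In case it's useful, the fitting code is essentially this (a cmdstanr sketch; the file name is a placeholder, and the data objects come from my simulation script):

library(cmdstanr)

mod <- cmdstan_model("ingarch_1_1.stan")   # placeholder file name
stan_data <- list(
  K = ncol(X_train), N_train = nrow(X_train), X_train = X_train, y_train = y_train,
  N_test = nrow(X_test), X_test = X_test, y_test = y_test
)
fit <- mod$sample(data = stan_data, chains = 4, parallel_chains = 4)
fit$summary(c("beta", "alpha", "delta", "phi"))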
I’ve written the general form by looping over p and q, but for debugging I'm using a simplified version with p = 3 and q = 3 written out explicitly. The simulation works for this as well (given small enough parameters to prevent blow-up)!
ingarch(3,3)_analysis.R (5.4 KB)
But the Stan model does not.
The four main things I’ve tried are:
- Log parametrization to prevent blow-up
- Extremely strong priors to force convergence
- Explicit initialization (in R; see the sketch after this list)
- Unit-root safeguards (the sum_ar variable)
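By "explicit initialization" I mean passing an init function through cmdstanr, roughly like this (a sketch; mod3 stands for the compiled p = 3, q = 3 model, and the values are chosen so the lag weights sum well below 1):

init_fun <- function() list(
  beta   = rep(0, stan_data$K),
  alpha1 = 0.05, alpha2 = 0.05, alpha3 = 0.05,
  delta1 = 0.05, delta2 = 0.05, delta3 = 0.05,  # sum_ar = 0.3 < 1 at initialization
  phi    = 10
)
fit3 <- mod3$sample(data = stan_data, init = init_fun, chains = 4, parallel_chains = 4)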
The errors I get are:
Chain 4 Rejecting initial value:
Chain 4 Error evaluating the log probability at the initial value.
Chain 4 Exception: neg_binomial_2_lpmf: Location parameter is -0.499742, but must be positive finite! (in 'C:/Users/<redacted>', line 76, column 4 to column 48)
Chain 4 Exception: neg_binomial_2_lpmf: Location parameter is -0.499742, but must be positive finite! (in 'C:/Users/<redacted>', line 76, column 4 to column 48)
Chain 4 Initialization between (-2, 2) failed after 100 attempts.
Chain 4 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 4 Initialization failed.
Warning: Chain 1 finished unexpectedly!
Warning: Chain 2 finished unexpectedly!
Warning: Chain 3 finished unexpectedly!
Warning: Chain 4 finished unexpectedly!
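If I'm reading the error right, the negative location parameter seems consistent with the stationary initialization in transformed parameters: each lag weight is constrained to (0, 1) individually, so a random init can easily have the six weights summing above 1, which flips the sign of lambda[1]'s denominator; the sum_ar guard adds negative_infinity() to the target but doesn't halt execution, so the likelihood still gets evaluated with a negative lambda. A quick check with made-up numbers:

# hypothetical random init: each of the six lag weights is 0.2, valid individually
sum_ar  <- 6 * 0.2                      # = 1.2 > 1
lambda1 <- exp(0.5) / (1 - sum_ar)      # stationary initialization from the model
lambda1                                 # -8.24: a negative NB location parameter

Here is the full p = 3, q = 3 Stan model: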
data {
  int<lower=1> K;                 // Number of covariates
  int<lower=1> N_train;           // Number of training time points
  matrix[N_train, K] X_train;     // Covariate matrix for training data
  array[N_train] int y_train;     // Observed training counts
  int<lower=1> N_test;            // Number of test time points
  matrix[N_test, K] X_test;       // Covariate matrix for test data
  array[N_test] int y_test;       // (True test counts; used for rolling forecasts)
  vector<lower=0>[3] alpha_rate;
  vector<lower=0>[3] delta_rate;
}
parameters {
  vector[K] beta;                 // Regression coefficients
  // Fixed to three lags:
  real<lower=0, upper=1> alpha1;  // Lag 1 weight for counts
  real<lower=0, upper=1> alpha2;  // Lag 2 weight for counts
  real<lower=0, upper=1> alpha3;  // Lag 3 weight for counts
  real<lower=0, upper=1> delta1;  // Lag 1 weight for intensity
  real<lower=0, upper=1> delta2;  // Lag 2 weight for intensity
  real<lower=0, upper=1> delta3;  // Lag 3 weight for intensity
  real<lower=0> phi;              // NB dispersion parameter
}
transformed parameters {
  vector[N_train] lambda;         // Time-varying intensity for training data
  real sum_ar = alpha1 + alpha2 + alpha3 + delta1 + delta2 + delta3;
  // For t = 1, use stationary initialization:
  lambda[1] = exp(dot_product(X_train[1], beta))
              / (1 - (alpha1 + alpha2 + alpha3) - (delta1 + delta2 + delta3));
  // For t = 2, use a one-step recursion (only lag 1 available):
  if (N_train >= 2)
    lambda[2] = exp(dot_product(X_train[2], beta))
                + alpha1 * y_train[1]
                + delta1 * lambda[1];
  // For t = 3, use lags 1 and 2:
  if (N_train >= 3)
    lambda[3] = exp(dot_product(X_train[3], beta))
                + alpha1 * y_train[2] + alpha2 * y_train[1]
                + delta1 * lambda[2] + delta2 * lambda[1];
  // For t >= 4, unroll the three lags explicitly:
  for (t in 4:N_train) {
    lambda[t] = exp(dot_product(X_train[t], beta))
                + alpha1 * y_train[t - 1]
                + alpha2 * y_train[t - 2]
                + alpha3 * y_train[t - 3]
                + delta1 * lambda[t - 1]
                + delta2 * lambda[t - 2]
                + delta3 * lambda[t - 3];
  }
}
model {
  // Enforce stationarity: sum of all AR weights must be less than 1.
  if (sum_ar >= 1)
    target += negative_infinity();
  // Priors:
  beta ~ normal(0, 5);
  alpha1 ~ exponential(alpha_rate[1]);
  alpha2 ~ exponential(alpha_rate[2]);
  alpha3 ~ exponential(alpha_rate[3]);
  delta1 ~ exponential(delta_rate[1]);
  delta2 ~ exponential(delta_rate[2]);
  delta3 ~ exponential(delta_rate[3]);
  phi ~ gamma(2, 0.1);
  // phi ~ normal(10, 10000);
  // Likelihood: negative binomial likelihood for training data
  for (t in 1:N_train)
    y_train[t] ~ neg_binomial_2(lambda[t], phi);
}
generated quantities {
  // In-sample posterior predictive checks:
  array[N_train] int y_rep;       // Replicates for training data
  vector[N_train] log_lik;       // Pointwise log-likelihood for training data
  for (t in 1:N_train) {
    y_rep[t] = neg_binomial_2_rng(lambda[t], phi);
    log_lik[t] = neg_binomial_2_lpmf(y_train[t] | lambda[t], phi);
  }
  // Out-of-sample predictions for test data using a rolling forecast
  array[N_test] int y_test_rep;   // Predicted counts for test data
  vector[N_test] lambda_test;     // Predicted intensities for test data
  // Initialization for test period (t = 1): use last three training observations
  lambda_test[1] = exp(dot_product(X_test[1], beta))
                   + alpha1 * y_train[N_train]
                   + alpha2 * y_train[N_train - 1]
                   + alpha3 * y_train[N_train - 2]
                   + delta1 * lambda[N_train]
                   + delta2 * lambda[N_train - 1]
                   + delta3 * lambda[N_train - 2];
  y_test_rep[1] = neg_binomial_2_rng(lambda_test[1], phi);
  // For t = 2, use the true test observation from t = 1 (rolling forecast)
  if (N_test >= 2) {
    lambda_test[2] = exp(dot_product(X_test[2], beta))
                     + alpha1 * y_test[1]   // using true y_test for rolling forecast
                     + alpha2 * y_train[N_train]
                     + alpha3 * y_train[N_train - 1]
                     + delta1 * lambda_test[1]
                     + delta2 * lambda[N_train]
                     + delta3 * lambda[N_train - 1];
    y_test_rep[2] = neg_binomial_2_rng(lambda_test[2], phi);
  }
  // For t = 3, use true test observations for available lags:
  if (N_test >= 3) {
    lambda_test[3] = exp(dot_product(X_test[3], beta))
                     + alpha1 * y_test[2]
                     + alpha2 * y_test[1]
                     + alpha3 * y_train[N_train]
                     + delta1 * lambda_test[2]
                     + delta2 * lambda_test[1]
                     + delta3 * lambda[N_train];
    y_test_rep[3] = neg_binomial_2_rng(lambda_test[3], phi);
  }
  // For t >= 4, unroll the three lags explicitly using true test data (rolling forecast)
  for (t in 4:N_test) {
    lambda_test[t] = exp(dot_product(X_test[t], beta))
                     + alpha1 * y_test[t - 1]
                     + alpha2 * y_test[t - 2]
                     + alpha3 * y_test[t - 3]
                     + delta1 * lambda_test[t - 1]
                     + delta2 * lambda_test[t - 2]
                     + delta3 * lambda_test[t - 3];
    y_test_rep[t] = neg_binomial_2_rng(lambda_test[t], phi);
  }
}
Thanks for reading! Any and all help is appreciated.