Count Time Series | INGARCH(1,1) Fits | INGARCH(2,2) Does Not

Hi all, I’m working with count time series models (specifically, INGARCH). I’ve tried a lot of things to no avail, but maybe you all can help!


Mathematically, the simplest version of INGARCH is a Poisson observation model with a time-varying mean that depends on the previous mean and the previous observation.

\lambda_t = \beta_0 + \alpha y_{t-1} + \delta \lambda_{t-1}

y_t| \lambda_t \sim Poi(\lambda_t)

Just as some background: as an observation-driven state-space model, INGARCH is almost like an ARMA, except that the noise is nonlinear because it is distributed as Poisson (or Negative Binomial) rather than as additive Gaussian noise. If the noise were Gaussian and additive, it would be an ARMA! It isn’t, and because the variance is a function of the mean for the Poisson and Negative Binomial, the model ends up behaving like a GARCH. The general form, with exogenous covariates x, looks like this:

\lambda_t = \beta_0 + \sum^p_{k=1}\alpha_k y_{t-k} + \sum^q_{l=1}\delta_l \lambda_{t-l} + \sum^J_{j=1} \beta_j x_{tj}

y_t| \lambda_t \sim Poi(\lambda_t) or NegBin(\lambda_t,\phi)
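
For concreteness, here is a minimal R sketch of that simple recursion (toy parameter values picked only for illustration; the attached scripts do the full versions with covariates and the Negative Binomial):

```r
# Toy simulation of the simplest Poisson INGARCH(1,1) recursion
set.seed(1)
N     <- 200
beta0 <- 1.0   # intercept
alpha <- 0.3   # weight on previous count
delta <- 0.2   # weight on previous intensity

lambda <- numeric(N)
y      <- integer(N)
lambda[1] <- beta0 / (1 - alpha - delta)   # stationary mean as the starting value
y[1]      <- rpois(1, lambda[1])
for (t in 2:N) {
  lambda[t] <- beta0 + alpha * y[t - 1] + delta * lambda[t - 1]
  y[t]      <- rpois(1, lambda[t])
}
```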

When p = 1 and q = 1, the model converges and fits correctly, predicting out-of-sample data very well. As soon as I allow second-order autoregression, the Stan model falls apart and I can’t figure out why (it compiles, but does not converge). I’ve attached the simulation code ingarch(1,1)_analysis.R (3.2 KB), and the Stan code for the working p = 1, q = 1 version is here:

data {
  int<lower=1> K;           // Number of covariates
  int<lower=1> N_train;           // Number of time points
  matrix[N_train, K] X_train;           // Covariate matrix
  array[N_train] int y_train;           // Observed counts
  int<lower=1> N_test;      // Number of test time points
  matrix[N_test, K] X_test; // Covariate matrix for test data
  array[N_test] int y_test;
  
}

parameters {
  vector[K] beta;           // Regression coefficients for covariates
  real<lower=0,upper=1> alpha;  // Weight on previous count
  real<lower=0,upper=1> delta;  // Weight on previous intensity
  real<lower=0> phi;    // NB dispersion parameter
}

transformed parameters {
  vector[N_train] lambda;         // Time-varying intensity
  // Initial intensity: using stationary mean given the covariates at t=1
  lambda[1] = exp(dot_product(X_train[1], beta)) / (1 - alpha - delta);
  
  // Recursive definition of lambda
  for (t in 2:N_train) {
    lambda[t] = exp(dot_product(X_train[t], beta)) + alpha * y_train[t - 1] + delta * lambda[t - 1];
  }
}

model {
  // Enforce stationarity
  if (alpha + delta >= 1)
    target += negative_infinity();
  
  // Priors
  beta ~ normal(0, 5);
  alpha ~ beta(2, 2);
  delta ~ beta(2, 2);
  phi ~ gamma(2, 0.1);
  
  // Likelihood: negative binomial using the computed intensities
  // (variance = lambda + lambda^2 / phi)
  for (t in 1:N_train)
    y_train[t] ~ neg_binomial_2(lambda[t], phi);
}

generated quantities {
  // In-sample posterior predictive checks:
  array[N_train] int y_rep;   // Replicate for training data
  vector[N_train] log_lik;    // Pointwise log-likelihood for training data
  
  for (t in 1:N_train) {
    y_rep[t] = neg_binomial_2_rng(lambda[t], phi);       // Draw a replicate count
    log_lik[t] = neg_binomial_2_lpmf(y_train[t] | lambda[t], phi); // Compute log likelihood for y_train[t]
  }
  
  // Out-of-sample predictions for test data:
  array[N_test] int y_test_rep;  // Predicted counts for test data
  vector[N_test] lambda_test;    // Predicted intensities for test data
  
  // For test data, use the last training observations as initial conditions:
  lambda_test[1] = exp(dot_product(X_test[1], beta)) + alpha * y_train[N_train] + delta * lambda[N_train];
  y_test_rep[1] = neg_binomial_2_rng(lambda_test[1], phi);
  
  for (t in 2:N_test) {
    lambda_test[t] = exp(dot_product(X_test[t], beta)) + alpha * y_test[t - 1] + delta * lambda_test[t - 1];
    y_test_rep[t] = neg_binomial_2_rng(lambda_test[t], phi);
  }
}

I’ve written the general form by looping over p and q, but a simplified version with explicit p = 3 and q = 3 lags is easier to debug. Simulations for this work as well (given small parameter values to prevent blow-up)!
ingarch(3,3)_analysis.R (5.4 KB)
But the Stan model does not.

The four main things I’ve tried are:

  • Log parametrization to prevent blow-up
  • Extremely strong priors to get convergence
  • Explicit initialization (in R; see the sketch after this list)
  • Unit root safeguards (sum_ar variable)
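
For the explicit initialization, this is roughly what I mean (a sketch with made-up starting values; `mod` is the compiled CmdStanR model and `stan_data` is the data list):

```r
# Sketch of explicit initialization via cmdstanr (hypothetical starting values)
init_fun <- function() {
  list(
    beta   = rep(0, stan_data$K),
    alpha1 = 0.05, alpha2 = 0.05, alpha3 = 0.05,
    delta1 = 0.05, delta2 = 0.05, delta3 = 0.05,
    phi    = 10
  )
}

fit <- mod$sample(
  data = stan_data,
  init = init_fun,        # keeps sum_ar well below 1 at the start
  chains = 4,
  parallel_chains = 4
)
```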

The errors I get are:

Chain 4 Rejecting initial value:
Chain 4   Error evaluating the log probability at the initial value.
Chain 4 Exception: neg_binomial_2_lpmf: Location parameter is -0.499742, but must be positive finite! (in 'C:/Users/<redacted>', line 76, column 4 to column 48)
Chain 4 Exception: neg_binomial_2_lpmf: Location parameter is -0.499742, but must be positive finite! (in 'C:/Users/<redacted>', line 76, column 4 to column 48)
Chain 4 Initialization between (-2, 2) failed after 100 attempts. 
Chain 4  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 4 Initialization failed.
Warning: Chain 1 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 3 finished unexpectedly!

Warning: Chain 4 finished unexpectedly!

> read_cmdstan_csv()
Error in assert_file_exists(files, access = "r", extension = "csv") : 
  argument "files" is missing, with no default
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
  invalid graphics state
Here is the full Stan code for the p = 3, q = 3 model that produces these errors:

data {
  int<lower=1> K;                 // Number of covariates
  int<lower=1> N_train;           // Number of training time points
  matrix[N_train, K] X_train;     // Covariate matrix for training data
  array[N_train] int y_train;     // Observed training counts
  int<lower=1> N_test;            // Number of test time points
  matrix[N_test, K] X_test;       // Covariate matrix for test data
  array[N_test] int y_test;       // (True test counts; used for rolling forecasts)

  vector<lower=0>[3] alpha_rate;
  vector<lower=0>[3] delta_rate;
}

parameters {
  vector[K] beta;                         // Regression coefficients
  // Fixed to three lags:
  real<lower=0, upper=1> alpha1;            // Lag 1 weight for counts
  real<lower=0, upper=1> alpha2;            // Lag 2 weight for counts
  real<lower=0, upper=1> alpha3;            // Lag 3 weight for counts
  real<lower=0, upper=1> delta1;            // Lag 1 weight for intensity
  real<lower=0, upper=1> delta2;            // Lag 2 weight for intensity
  real<lower=0, upper=1> delta3;            // Lag 3 weight for intensity
  real<lower=0> phi;                        // NB dispersion parameter
}

transformed parameters {
  vector[N_train] lambda;  // Time-varying intensity for training data
  real sum_ar = alpha1 + alpha2 + alpha3 + delta1 + delta2 + delta3;
  
  
  // For t = 1, use stationary initialization:
  lambda[1] = exp(dot_product(X_train[1], beta)) / (1 - (alpha1 + alpha2 + alpha3) - (delta1 + delta2 + delta3));
  
  // For t = 2, use a one-step recursion (only lag 1 available):
  if (N_train >= 2)
    lambda[2] = exp(dot_product(X_train[2], beta))
                + alpha1 * y_train[1]
                + delta1 * lambda[1];
  
  // For t = 3, use lags 1 and 2:
  if (N_train >= 3)
    lambda[3] = exp(dot_product(X_train[3], beta))
                + alpha1 * y_train[2] + alpha2 * y_train[1]
                + delta1 * lambda[2] + delta2 * lambda[1];
  
  // For t >= 4, unroll the three lags explicitly:
  for (t in 4:N_train) {
    lambda[t] = exp(dot_product(X_train[t], beta))
                + alpha1 * y_train[t-1]
                + alpha2 * y_train[t-2]
                + alpha3 * y_train[t-3]
                + delta1 * lambda[t-1]
                + delta2 * lambda[t-2]
                + delta3 * lambda[t-3];
  }
}

model {
  // Enforce stationarity: sum of all AR weights must be less than 1.
  if (sum_ar >= 1)
    target += negative_infinity();
  
  // Priors:
  beta ~ normal(0, 5);
  alpha1 ~ exponential(alpha_rate[1]);
  alpha2 ~ exponential(alpha_rate[2]);
  alpha3 ~ exponential(alpha_rate[3]);
  delta1 ~ exponential(delta_rate[1]);
  delta2 ~ exponential(delta_rate[2]);
  delta3 ~ exponential(delta_rate[3]);
  phi ~ gamma(2, 0.1);
  //phi ~ normal(10,10000);
  
  // Likelihood: Negative binomial likelihood for training data
  for (t in 1:N_train) 
    y_train[t] ~ neg_binomial_2(lambda[t], phi);
}

generated quantities {
  // In-sample posterior predictive checks:
  array[N_train] int y_rep;    // Replicates for training data
  vector[N_train] log_lik;     // Pointwise log-likelihood for training data
  for (t in 1:N_train) {
    y_rep[t] = neg_binomial_2_rng(lambda[t], phi);
    log_lik[t] = neg_binomial_2_lpmf(y_train[t] | lambda[t], phi);
  }
  
  // Out-of-sample predictions for test data using a rolling forecast
  array[N_test] int y_test_rep;   // Predicted counts for test data
  vector[N_test] lambda_test;     // Predicted intensities for test data
  
  // Initialization for test period (t = 1): use last three training observations
  {
    lambda_test[1] = exp(dot_product(X_test[1], beta))
                     + alpha1 * y_train[N_train]
                     + alpha2 * y_train[N_train - 1]
                     + alpha3 * y_train[N_train - 2]
                     + delta1 * lambda[N_train]
                     + delta2 * lambda[N_train - 1]
                     + delta3 * lambda[N_train - 2];
    y_test_rep[1] = neg_binomial_2_rng(lambda_test[1], phi);
  }
  
  // For t = 2, use true test observation from t = 1 (rolling forecast)
  if (N_test >= 2) {
    lambda_test[2] = exp(dot_product(X_test[2], beta))
                     + alpha1 * y_test[1]      // using true y_test for rolling forecast
                     + alpha2 * y_train[N_train]
                     + alpha3 * y_train[N_train - 1]
                     + delta1 * lambda_test[1]
                     + delta2 * lambda[N_train]
                     + delta3 * lambda[N_train - 1];
    y_test_rep[2] = neg_binomial_2_rng(lambda_test[2], phi);
  }
  
  // For t = 3, use true test observations for available lags:
  if (N_test >= 3) {
    lambda_test[3] = exp(dot_product(X_test[3], beta))
                     + alpha1 * y_test[2]
                     + alpha2 * y_test[1]
                     + alpha3 * y_train[N_train]
                     + delta1 * lambda_test[2]
                     + delta2 * lambda_test[1]
                     + delta3 * lambda[N_train];
    y_test_rep[3] = neg_binomial_2_rng(lambda_test[3], phi);
  }
  
  // For t >= 4, unroll the three lags explicitly using true test data (rolling forecast)
  for (t in 4:N_test) {
    lambda_test[t] = exp(dot_product(X_test[t], beta))
                     + alpha1 * y_test[t-1]
                     + alpha2 * y_test[t-2]
                     + alpha3 * y_test[t-3]
                     + delta1 * lambda_test[t-1]
                     + delta2 * lambda_test[t-2]
                     + delta3 * lambda_test[t-3];
    y_test_rep[t] = neg_binomial_2_rng(lambda_test[t], phi);
  }
}

Thanks for reading! Any and all help is appreciated.


Hi, @JaredWinslow, and welcome to the Stan forums.

Is this with real data or simulated data from the model? If a model’s badly misspecified and complex, it can have trouble fitting (a simple linear regression might be misspecified, but it’s usually easy to fit).

Chain 4 Initialization failed.

This basically means you’re giving it parameter values that are not in support, so it can’t even get off the ground. I was going to give you some general advice until I found this, which is almost certainly the root of your problem:

```stan
  real sum_ar = alpha1 + alpha2 + alpha3 + delta1 + delta2 + delta3;
  ...
  if (sum_ar >= 1)
    target += negative_infinity();
```

The six values are initialized roughly uniformly within (0, 1), so their sum is very likely to exceed 1 on initialization, which gives you an initial -infinity density, which fails.
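
Just to put a number on it, here is a rough back-of-the-envelope check (a sketch, assuming Stan’s default uniform(-2, 2) inits on the unconstrained scale, mapped back to (0, 1) by inverse logit):

```r
# Rough Monte Carlo: how often does a random default init satisfy sum_ar < 1?
set.seed(1)
n_sims      <- 1e5
raw         <- matrix(runif(6 * n_sims, -2, 2), ncol = 6)   # unconstrained inits
constrained <- plogis(raw)                                   # inverse logit to (0, 1)
mean(rowSums(constrained) < 1)
# a small fraction of a percent, so 100 attempts per chain will usually all fail
```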

If you want to keep that sum under one, you can do this:

```stan
parameters {
  simplex[7] theta;
  ...
}
transformed parameters {
  vector[3] alpha = theta[1:3];
  vector[3] delta = theta[4:6];
  ...
}
```
I took the liberty of converting to vectors, with theta[7] acting as the slack that keeps the sum below 1. This way, 0 < sum(alpha) + sum(delta) < 1 by construction, and they all get an implicit uniform prior over the simplex (which is not marginally uniform). So it does what you were trying to do, I think. But is that right? Is that either necessary or sufficient for making the whole process stationary?

When you fit the model, how close is lambda[t] to y_train[t]? It looks like you’re fitting a lambda for each observation, which means they should get very, very close and wind up trying to drive phi down to zero. When you fix the above constraint problem, this may cause another failure.

I would also start by removing all the generated quantities—if those fail, it takes down the whole iteration and you want to get everything else working first.
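
If it helps while debugging, the in-sample replicates can be produced in R from the posterior draws instead (a rough sketch, assuming lambda and phi are kept in the output and `fit` is the CmdStanR fit object):

```r
library(posterior)

# Posterior draws of the intensities and the dispersion parameter
lambda_draws <- as_draws_matrix(fit$draws("lambda"))   # draws x N_train
phi_draws    <- as.vector(as_draws_matrix(fit$draws("phi")))

# One replicated data set per posterior draw
set.seed(1)
y_rep <- matrix(
  rnbinom(
    n    = length(lambda_draws),
    mu   = as.vector(lambda_draws),                    # column-major flattening
    size = rep(phi_draws, times = ncol(lambda_draws))
  ),
  nrow = nrow(lambda_draws)
)
```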


I wrote the {mvgam} R package because I was also struggling with this kind of problem. I tried many different parameterizations but ultimately found that forecasts will often blow up (even after logging) in these autoregressive count time series models. There are the added challenges of how to deal with missing observations and how to deal properly with zeros. What {mvgam} does is to formulate State-Space models so that the temporal dynamics occur within a hidden “state” process that is unconstrained and real-valued. This allows for many types of observation processes through appropriate link functions (i.e. using a Negative Binomial with log link in this case) and makes it easy to enforce stationarity of the state process. It can also readily deal with missing values in the observations, zeros, overdispersion etc…

Below I show a brief reprex using your simulation code, but I only fit an AR(1) for the temporal dynamics rather than trying the moving average ARMA process. You will see in the outputs how challenging it can be to simultaneously estimate autoregressive coefficients, dispersion parameters and latent state process error parameters in the same model (these can all “compete” to capture the extra dispersion in the data). More careful priors would be needed in this example. Obviously you don’t need to use the package for your analysis if it can’t handle the exact model(s) that you are interested in, but it might be useful to autogenerate the relevant Stan code and data objects that you could then modify to suit your specific needs. This code is fairly optimized for performance, but that unfortunately does make it harder to read and digest. But hopefully it can help in some way.

library(cmdstanr)
library(posterior)
library(tidyverse)
library(mvgam)

set.seed(123)
# Number of days to simulate (changed to 500 for training and
# 50 for testing just so it is easier to visualise model predictions)
N_train <- 500
N_test <- 50
K <- 4

# Simulate 3 external covariates
X_train <- cbind(
  1, 
  matrix(rnorm(N_train * (K - 1)), 
         nrow = N_train, ncol = K - 1)
)
X_test <- cbind(
  1, 
  matrix(rnorm(N_test * (K - 1)), 
         nrow = N_test, ncol = K - 1)
)

# True parameters for the covariate effects
beta <- c(1, 0.5, -0.3, 0.7)

# Autoregressive parameters (ensure alpha + delta < 1)
alpha <- 0.3 # influence of previous day's count
delta <- 0.2 # influence of previous day's intensity
phi <- 0.5 # nb dispersion parameter

# Initialize vectors for intensity and counts and covariates
lambda_train <- numeric(N_train)
y_train <- numeric(N_train)
lin_pred_train <- X_train %*% beta

# Initialize the first day's intensity using the stationary mean:
# Here, we set lambda[1] = exp(lin_pred[1]) / (1 - alpha - delta)
lambda_train[1] <- exp(lin_pred_train[1]) / 
  (1 - alpha - delta)
y_train[1] <- rnbinom(1, 
                      mu = lambda_train[1], 
                      size = phi)

# Simulate the remaining days recursively
for (t in 2:N_train) {
  lambda_train[t] <- exp(lin_pred_train[t]) + 
    alpha * y_train[t - 1] + 
    delta * lambda_train[t - 1]
  y_train[t] <- rnbinom(1, 
                        mu = lambda_train[t], 
                        size = phi)
}

# Do the same for the test data
lin_pred_test <- X_test %*% beta
lambda_test <- numeric(N_test)
y_test <- numeric(N_test)
lambda_test[1] <- exp(lin_pred_test[1]) + 
  alpha * y_train[N_train] + 
  delta * lambda_train[N_train]
y_test[1] <- rnbinom(1, 
                     mu = lambda_test[1], 
                     size = phi)
for (t in 2:N_test) {
  lambda_test[t] <- exp(lin_pred_test[t]) + 
    alpha * y_test[t - 1] + 
    delta * lambda_test[t - 1]
  y_test[t] <- rnbinom(1, 
                       mu = lambda_test[t], 
                       size = phi)
}

# Configure data into data.frames (column of ones not needed)
data_train <- cbind(
  data.frame(
    y = y_train,
    time = 1:N_train
  ),
  as.data.frame(X_train[, -1])
)

data_test <- cbind(
  data.frame(
    y = y_test,
    time = (N_train + 1):(N_train + N_test)
  ),
  as.data.frame(X_test[, -1])
)
colnames(data_test) <- 
  colnames(data_train) <- 
  c("y", "time", "cov1", "cov2", "cov3")

# Plot features of the series
plot_mvgam_series(
  data = data_train,
  newdata = data_test
)

# Configure a State-Space model using mvgam()
mod <- mvgam(
  # Observation formula empty as we don't have
  # any covariates that influence observation error
  formula = y ~ -1,

  # Latent process model is where the covariate effects
  # and the temporal dynamics happen; intercept is implicit
  trend_formula = ~
    cov1 + cov2 + cov3,

  # Noncentred AR(1) process for the state dynamics; learning both
  # phi and the moving average coefficient theta will be
  # very difficult
  trend_model = AR(p = 1),
  noncentred = TRUE,

  # Regularising prior on regression coefficients
  priors = prior(
    std_normal(),
    class = b
  ),

  # Training data, testing data and observation family
  data = data_train,
  newdata = data_test,
  family = nb(),

  # Use cmdstan for efficiency
  backend = "cmdstanr",
)
#> Compiling Stan program using cmdstanr
#> Start sampling
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 3 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 4 Iteration:   1 / 1000 [  0%]  (Warmup) 
#> Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 3 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 4 Iteration: 100 / 1000 [ 10%]  (Warmup) 
#> Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 3 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 4 Iteration: 200 / 1000 [ 20%]  (Warmup) 
#> Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 3 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 4 Iteration: 300 / 1000 [ 30%]  (Warmup) 
#> Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 4 Iteration: 400 / 1000 [ 40%]  (Warmup) 
#> Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 4 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 4 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 3 Iteration: 500 / 1000 [ 50%]  (Warmup) 
#> Chain 3 Iteration: 501 / 1000 [ 50%]  (Sampling) 
#> Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 4 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 3 Iteration: 600 / 1000 [ 60%]  (Sampling) 
#> Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 3 Iteration: 700 / 1000 [ 70%]  (Sampling) 
#> Chain 4 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 4 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 3 Iteration: 800 / 1000 [ 80%]  (Sampling) 
#> Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 4 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 1 finished in 3.8 seconds.
#> Chain 4 finished in 3.5 seconds.
#> Chain 3 Iteration: 900 / 1000 [ 90%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 2 finished in 4.0 seconds.
#> Chain 3 Iteration: 1000 / 1000 [100%]  (Sampling) 
#> Chain 3 finished in 4.4 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 3.9 seconds.
#> Total execution time: 4.7 seconds.

# Autoconfigured Stan code
stancode(mod)
#> // Stan model code generated by package mvgam
#> functions {
#>   vector rep_each(vector x, int K) {
#>     int N = rows(x);
#>     vector[N * K] y;
#>     int pos = 1;
#>     for (n in 1 : N) {
#>       for (k in 1 : K) {
#>         y[pos] = x[n];
#>         pos += 1;
#>       }
#>     }
#>     return y;
#>   }
#> }
#> data {
#>   int<lower=0> total_obs; // total number of observations
#>   int<lower=0> n; // number of timepoints per series
#>   int<lower=0> n_lv; // number of dynamic factors
#>   int<lower=0> n_series; // number of series
#>   matrix[n_series, n_lv] Z; // matrix mapping series to latent states
#>   int<lower=0> num_basis; // total number of basis coefficients
#>   int<lower=0> num_basis_trend; // number of trend basis coefficients
#>   matrix[total_obs, num_basis] X; // mgcv GAM design matrix
#>   matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
#>   array[n, n_series] int<lower=0> ytimes; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
#>   array[n, n_lv] int ytimes_trend;
#>   int<lower=0> n_nonmissing; // number of nonmissing observations
#>   array[n_nonmissing] int<lower=0> flat_ys; // flattened nonmissing observations
#>   matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
#>   array[n_nonmissing] int<lower=0> obs_ind; // indices of nonmissing observations
#> }
#> transformed data {
#>   
#> }
#> parameters {
#>   // raw basis coefficients
#>   vector[num_basis] b_raw;
#>   vector[num_basis_trend] b_raw_trend;
#>   
#>   // latent state SD terms
#>   vector<lower=0>[n_lv] sigma;
#>   
#>   // negative binomial overdispersion
#>   vector<lower=0>[n_series] phi_inv;
#>   
#>   // latent state AR1 terms
#>   vector<lower=-1, upper=1>[n_lv] ar1;
#>   
#>   // dynamic factors
#>   matrix[n, n_lv] LV_raw;
#> }
#> transformed parameters {
#>   // raw latent states
#>   vector[n * n_lv] trend_mus;
#>   matrix[n, n_series] trend;
#>   
#>   // basis coefficients
#>   vector[num_basis] b;
#>   
#>   // latent states
#>   matrix[n, n_lv] LV;
#>   vector[num_basis_trend] b_trend;
#>   
#>   // observation model basis coefficients
#>   b[1 : num_basis] = b_raw[1 : num_basis];
#>   
#>   // (Intercept) fixed at zero
#>   b[1] = 0;
#>   
#>   // process model basis coefficients
#>   b_trend[1 : num_basis_trend] = b_raw_trend[1 : num_basis_trend];
#>   
#>   // latent process linear predictors
#>   trend_mus = X_trend * b_trend;
#>   LV = LV_raw .* rep_matrix(sigma', rows(LV_raw));
#>   for (j in 1 : n_lv) {
#>     LV[1, j] += trend_mus[ytimes_trend[1, j]];
#>     for (i in 2 : n) {
#>       LV[i, j] += trend_mus[ytimes_trend[i, j]]
#>                   + ar1[j] * (LV[i - 1, j] - trend_mus[ytimes_trend[i - 1, j]]);
#>     }
#>   }
#>   
#>   // derived latent states
#>   for (i in 1 : n) {
#>     for (s in 1 : n_series) {
#>       trend[i, s] = dot_product(Z[s,  : ], LV[i,  : ]);
#>     }
#>   }
#> }
#> model {
#>   // prior for (Intercept)...
#>   b_raw[1] ~ student_t(3, 0.7, 2.5);
#>   
#>   // priors for AR parameters
#>   ar1 ~ std_normal();
#>   
#>   // priors for overdispersion parameters
#>   phi_inv ~ student_t(3, 0, 0.1);
#>   
#>   // priors for latent state SD parameters
#>   sigma ~ inv_gamma(1.418, 0.452);
#>   to_vector(LV_raw) ~ std_normal();
#>   
#>   // dynamic process models
#>   
#>   // prior for (Intercept)_trend...
#>   b_raw_trend[1] ~ student_t(3, 0, 2);
#>   
#>   // prior for cov1_trend...
#>   b_raw_trend[2] ~ std_normal();
#>   
#>   // prior for cov2_trend...
#>   b_raw_trend[3] ~ std_normal();
#>   
#>   // prior for cov3_trend...
#>   b_raw_trend[4] ~ std_normal();
#>   {
#>     // likelihood functions
#>     vector[n_nonmissing] flat_trends;
#>     array[n_nonmissing] real flat_phis;
#>     flat_trends = to_vector(trend)[obs_ind];
#>     flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
#>     flat_ys ~ neg_binomial_2(exp(append_col(flat_xs, flat_trends)
#>                                  * append_row(b, 1.0)),
#>                              inv(flat_phis));
#>   }
#> }
#> generated quantities {
#>   vector[total_obs] eta;
#>   matrix[n, n_series] mus;
#>   vector[n_lv] penalty;
#>   array[n, n_series] int ypred;
#>   matrix[n, n_series] phi_vec;
#>   vector[n_series] phi;
#>   phi = inv(phi_inv);
#>   for (s in 1 : n_series) {
#>     phi_vec[1 : n, s] = rep_vector(phi[s], n);
#>   }
#>   penalty = 1.0 / (sigma .* sigma);
#>   
#>   matrix[n_series, n_lv] lv_coefs = Z;
#>   // posterior predictions
#>   eta = X * b;
#>   for (s in 1 : n_series) {
#>     mus[1 : n, s] = eta[ytimes[1 : n, s]] + trend[1 : n, s];
#>     ypred[1 : n, s] = neg_binomial_2_rng(exp(mus[1 : n, s]), phi_vec[1 : n, s]);
#>   }
#> }

# Model summary
summary(mod)
#> GAM observation formula:
#> y ~ 1
#> 
#> GAM process formula:
#> ~cov1 + cov2 + cov3
#> 
#> Family:
#> negative binomial
#> 
#> Link function:
#> log
#> 
#> Trend model:
#> AR(p = 1)
#> 
#> 
#> N process models:
#> 1 
#> 
#> N series:
#> 1 
#> 
#> N timepoints:
#> 550 
#> 
#> Status:
#> Fitted using Stan 
#> 4 chains, each with iter = 1000; warmup = 500; thin = 1 
#> Total post-warmup draws = 2000
#> 
#> 
#> Observation dispersion parameter estimates:
#>        2.5% 50% 97.5% Rhat n_eff
#> phi[1] 0.46 0.6   1.3 1.09    43
#> 
#> GAM observation model coefficient (beta) estimates:
#>             2.5% 50% 97.5% Rhat n_eff
#> (Intercept)    0   0     0  NaN   NaN
#> 
#> Process model AR parameter estimates:
#>        2.5%  50% 97.5% Rhat n_eff
#> ar1[1] 0.34 0.63  0.86 1.02   128
#> 
#> Process error parameter estimates:
#>          2.5%  50% 97.5% Rhat n_eff
#> sigma[1] 0.25 0.56   1.2 1.08    44
#> 
#> GAM process model coefficient (beta) estimates:
#>                    2.5%   50%  97.5% Rhat n_eff
#> (Intercept)_trend  1.00  1.40  1.700 1.05    61
#> cov1_trend         0.13  0.28  0.440 1.00  1910
#> cov2_trend        -0.34 -0.20 -0.059 1.00  1995
#> cov3_trend         0.14  0.29  0.440 1.00  1874
#> 
#> Stan MCMC diagnostics:
#> ✔ No issues with effective samples per iteration
#> ✖ Rhats above 1.05 found for some parameters
#>     Use pairs() and mcmc_plot() to investigate
#> ✔ No issues with divergences
#> ✔ No issues with maximum tree depth
#> 
#> Samples were drawn using sampling(hmc). For each parameter, n_eff is a
#>   crude measure of effective sample size, and Rhat is the potential scale
#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)
#> 
#> Use how_to_cite() to get started describing this model

# Regression coefficients
mcmc_plot(mod,
  variable = "trend_betas",
  type = "combo"
)

beta
#> [1]  1.0  0.5 -0.3  0.7

# AR(1) coefficient, process error (sigma) and phi
mcmc_plot(mod,
  variable = c(
    "ar1",
    "sigma",
    "phi"
  ),
  regex = TRUE,
  type = "combo"
)

alpha; phi
#> [1] 0.3
#> [1] 0.5

# Challenges of identifying phi, sigma and alpha (ar1 coefficient)
pairs(
  mod,
  variable = c('ar1',
               'sigma',
               'phi'),
  regex = TRUE
)

# Innovations
plot(
  mod, 
  type = 'residuals'
)

# Posterior predictive checks
pp_check(
  mod,
  type = 'rootogram'
)
#> Using all posterior draws for ppc type 'rootogram' by default.

pp_check(
  mod,
  type = 'pit_ecdf'
)
#> Using all posterior draws for ppc type 'pit_ecdf' by default.

# Covariate effects
conditional_effects(
  mod,
  type = 'link'
)

# Hindcast and forecast distributions
fc <- forecast(mod)
plot(fc)
#> Out of sample DRPS:
#> 493.189362

Created on 2025-03-13 with reprex v2.1.1


Thanks for the welcome Bob! The simplex idea is helpful.

I tried the log parametrization again, but with different constraints after rethinking stationarity, and that actually worked! At least for p = 3 and q = 3, it converges. (And removing the generated quantities first was important; thanks.)

I’ve only tried using simulations but will eventually get to empirical data.

Thanks for the in-depth examples, Nicholas. It seems we’re thinking about very similar models, since I want zero-inflation as well! (I just hadn’t gotten there yet.)

At least for small parameters, the INGARCH is currently working. I’ll try out your package when I start on the zero-inflation and larger lags.
