Fitting AR parameters using complex conjugate zeros?

Hi all,

I was able to write a Stan program for an ARMA(p,q) model (for p, q > 0) that sometimes works (my current Stan code is below). It is based on the ARMA page in the Stan reference manual. From what I understand, the ARMA model is:

\Phi(L)X_t = \Theta(L)\epsilon_t,


\Phi(L) = 1 - \displaystyle\sum_{i=1}^p\phi_iL^i, \qquad \text{and}\qquad \Theta(L) = 1 + \displaystyle\sum_{i=1}^q\theta_iL^i,

and L is the lag operator defined by L^iX_t = X_{t-i}.

I also understand that in order for the process to be stable and invertible, all zeros of \Phi(L) and of \Theta(L) must lie outside the unit circle (equivalently, all inverse roots must lie inside it). For real-valued ARMA processes, all coefficients \phi_i and \theta_i must be real-valued, and therefore any complex zeros of \Phi(L) and \Theta(L) must come in conjugate pairs.
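To make the root condition concrete: with \Phi(z) = 1 - \phi_1z - \phi_2z^2, stability means both roots of \Phi lie strictly outside the unit circle. A small pure-Python sketch for the AR(2) case (the helper names ar2_roots and is_stationary_ar2 are my own, not from the thread):

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of Phi(z) = 1 - phi1*z - phi2*z^2 (assumes phi2 != 0)."""
    # Solve -phi2*z^2 - phi1*z + 1 = 0 with the quadratic formula.
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def is_stationary_ar2(phi1, phi2):
    """Stable iff both roots of Phi(z) lie strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(phi1, phi2))

# A complex-conjugate pair of roots: Phi(z) = 1 - z + 0.5*z^2 has
# roots 1 +/- i, both with modulus sqrt(2) > 1.
print(is_stationary_ar2(1.0, -0.5))  # True
# Both coefficients inside (-1, 1), yet unstable:
print(is_stationary_ar2(0.9, 0.9))   # False
```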

My current implementation of ARMA(p,q) encodes each \phi_i and each \theta_i as a real value contained in (-1,1). But this is not enough to guarantee that \Phi(L) and \Theta(L) have all of their zeros outside the unit circle on each iteration. I would like to make the parameters the inverse roots of the polynomials and use the transformed parameters block to calculate the coefficients for the model block. However, that would require allowing for complex conjugate pairs of inverse roots, and I don't know in advance how many will be real-valued and how many will come in conjugate pairs. Even if I did know how many conjugate pairs I had, I would still have a model identification problem because the factors of the polynomials commute (i.e. (1 - r_iL)(1 - r_jL) = (1 - r_jL)(1 - r_iL) for inverse roots r_i, r_j).
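The roots-to-coefficients direction is mechanical: expanding \prod_k(1 - r_kL), where the r_k are the inverse roots (each with modulus below 1, complex ones supplied in conjugate pairs), recovers real coefficients. A sketch in pure Python (coeffs_from_inverse_roots is a hypothetical helper name):

```python
def coeffs_from_inverse_roots(inv_roots):
    """Expand prod_k (1 - r_k * L) and return [phi_1, ..., phi_p] such that
    the product equals 1 - sum_i phi_i * L^i.  Complex inverse roots must be
    supplied in conjugate pairs so that the phi_i come out real."""
    poly = [1.0 + 0j]                 # coefficients of the constant polynomial 1
    for r in inv_roots:
        new = poly + [0j]             # multiply current polynomial by (1 - r*L)
        for i in range(len(poly)):
            new[i + 1] -= r * poly[i]
        poly = new
    # poly encodes 1 - phi_1*L - ... - phi_p*L^p, so phi_i = -poly[i]
    return [-c.real for c in poly[1:]]

# Conjugate pair 0.5 +/- 0.5i:
# (1 - (0.5+0.5i)L)(1 - (0.5-0.5i)L) = 1 - L + 0.5*L^2
print(coeffs_from_inverse_roots([0.5 + 0.5j, 0.5 - 0.5j]))  # [1.0, -0.5]
```

The inverse roots 0.5 \pm 0.5i have modulus \sqrt{0.5} < 1, and the resulting \phi_1 = 1, \phi_2 = -0.5 is a stable AR(2).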

My current code is below. Does anyone have any suggestions for me? Thank you so much for your time!

data {
  int<lower=1> T;            // num observations
  vector[T] y;               // observed outputs
  int<lower=1> p;            // AR degree
  int<lower=1> q;            // MA degree
}
transformed data {
  int<lower=1> max_p_q = max(p, q);
}
parameters {
  vector<lower=-1, upper=1>[p] phi;       // AR coeffs
  vector<lower=-1, upper=1>[q] theta;     // MA coeffs
  real<lower=0> sigma;                    // noise scale
}
model {
  vector[T] nu;               // prediction for time t
  vector[T] err;              // error for time t
  for (t in 1:max_p_q) {
    nu[t] = 0;
    err[t] = y[t] - nu[t];
  }
  for (t in (max_p_q + 1):T) {
    // reverse() so that phi[i] multiplies y[t - i] and theta[i] multiplies err[t - i]
    real phi_y = dot_product(phi, reverse(y[(t - p):(t - 1)]));
    real theta_err = dot_product(theta, reverse(err[(t - q):(t - 1)]));
    nu[t] = phi_y + theta_err;
    err[t] = y[t] - nu[t];
  }
  phi ~ normal(0, 2);
  theta ~ normal(0, 2);
  sigma ~ cauchy(0, 5);
  err[(max_p_q + 1):T] ~ normal(0, sigma);    // likelihood
}

EDIT: it turns out that constraining each \phi_i to |\phi_i| < 1 is not correct. Even in the degree-2 case \Phi(L) = 1 - \phi_1L - \phi_2L^2, the conditions for stability are

\begin{cases} |\phi_2| < 1 \\ |\phi_1| < 1 - \phi_2 \end{cases}

(calculated using the Bistritz stability criterion). The box constraint is both too loose and too tight: \phi_1 = \phi_2 = 0.9 satisfies |\phi_i| < 1 but is unstable, while \phi_1 = \pm2(1 - \epsilon), \phi_2 = \epsilon - 1 (for small \epsilon > 0) is stable but violates |\phi_1| < 1. I do not believe that calculating the exact Bistritz stability conditions for a general \Phi(L) = 1 - \displaystyle\sum_{i=1}^p\phi_iL^i is feasible, so my question here still stands. Thanks again for your time.
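Both failure modes of the box constraint can be checked numerically against the AR(2) conditions above (ar2_stationary is a hypothetical helper, not code from the thread):

```python
def ar2_stationary(phi1, phi2):
    """AR(2) stationarity region (the 'stability triangle'):
    |phi2| < 1 and |phi1| < 1 - phi2."""
    return abs(phi2) < 1 and abs(phi1) < 1 - phi2

eps = 0.01
# Stable, yet violates the naive box constraint since phi1 = 1.98:
print(ar2_stationary(2 * (1 - eps), eps - 1))   # True
# Inside the box (-1, 1)^2, yet unstable:
print(ar2_stationary(0.9, 0.9))                 # False
```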

I think I've answered my own question: the best way forward here is to leave all \phi_i and \theta_i unconstrained, but initialize them all at 0. This guarantees that the initialized values of \phi_i make \Phi(L) stable, and (given a stable time series) the sampler will likely never move these parameters into a regime that makes \Phi(L) unstable.

Thanks for posting an answer, @samuelf. As you’ve found, we can run into stability issues if initializations are too far into the tails. Constraining parameters can make the embedded differential equation solver for the Hamiltonian trajectory stiff and lead to sampling issues. We usually see this with nested differential equations.

Also, I’m really happy to see people finding uses for complex numbers!

There was a presentation at StanCon 2020 by Sarah Heaps on this subject as it relates to vector autoregressions. The paper was published as "Enforcing Stationarity through the Prior in Vector Autoregressions".
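In the univariate AR case, a classical way to enforce stationarity without tracking which roots are real and which form conjugate pairs is to reparameterize through partial autocorrelations r_k \in (-1, 1), which a Levinson-Durbin-type recursion maps onto the stationary region; this is the flavor of transform that line of work builds on. A hedged pure-Python sketch (pacf_to_ar is my own name for it):

```python
def pacf_to_ar(r):
    """Map partial autocorrelations r_1..r_p, each in (-1, 1), to AR
    coefficients phi_1..phi_p of a stationary model via the classical
    Levinson-Durbin-type recursion."""
    phi = []
    for k, rk in enumerate(r, start=1):
        prev = phi
        # phi_i^{(k)} = phi_i^{(k-1)} - r_k * phi_{k-i}^{(k-1)}, then phi_k^{(k)} = r_k
        phi = [prev[i] - rk * prev[k - 2 - i] for i in range(k - 1)] + [rk]
    return phi

# For p = 2 this gives phi_2 = r_2 and phi_1 = r_1 * (1 - r_2),
# which always satisfies |phi_2| < 1 and |phi_1| < 1 - phi_2.
print(pacf_to_ar([0.5, -0.8]))  # [0.9, -0.8]
```

Each r_k can be declared in Stan with lower=-1, upper=1 and the recursion run in the transformed parameters block, so every draw is stationary by construction.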
