Hi all,
I was able to write Stan code for an ARMA(p,q) model (for p,q > 0) that sometimes works (see my current Stan code below). It is based on the ARMA page in the Stan reference manual. From what I understand, the ARMA model is

\Phi(L)X_t = \Theta(L)\epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma),

where

\Phi(L) = 1 - \displaystyle\sum_{i=1}^p\phi_iL^i, \qquad \Theta(L) = 1 + \displaystyle\sum_{i=1}^q\theta_iL^i,

and L is the lag operator defined by L^iX_t = X_{t-i}.
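For concreteness, here is the same model written out as a simulation recursion (a minimal Python/NumPy sketch rather than Stan; simulate_arma and the particular coefficient values are just illustrative and not part of my actual setup):

import numpy as np

def simulate_arma(phi, theta, sigma, T, seed=0):
    """Simulate X_t = sum_i phi_i X_{t-i} + sum_j theta_j eps_{t-j} + eps_t."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    x = np.zeros(T)
    eps = rng.normal(0.0, sigma, size=T)
    for t in range(T):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(min(p, t)))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(min(q, t)))
        x[t] = ar + ma + eps[t]
    return x

y = simulate_arma([0.5, -0.2], [0.3], sigma=1.0, T=500)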
I also understand that in order for the process to be stable (stationary) and invertible, all zeros of \Phi(L) and of \Theta(L) must lie outside the unit circle, or equivalently all of the inverse roots must lie inside it. For a real-valued ARMA process, all coefficients \phi_i and \theta_i must be real-valued, and therefore any complex zeros of \Phi(L) and \Theta(L) must come in conjugate pairs.
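This condition is easy to check numerically outside of Stan; a minimal Python/NumPy sketch (the helper name is just illustrative), assuming the sign conventions above:

import numpy as np

def is_stable_and_invertible(phi, theta):
    """Check that all zeros of Phi(L) = 1 - sum_i phi_i L^i and
    Theta(L) = 1 + sum_i theta_i L^i lie outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # np.roots wants coefficients from highest degree to lowest
    phi_zeros = np.roots(np.concatenate((-phi[::-1], [1.0])))
    theta_zeros = np.roots(np.concatenate((theta[::-1], [1.0])))
    return bool(np.all(np.abs(phi_zeros) > 1.0) and np.all(np.abs(theta_zeros) > 1.0))

# e.g. phi_1 = 0.9, phi_2 = 0.9 is inside the (-1, 1) box but not stable:
print(is_stable_and_invertible([0.9, 0.9], [0.3]))  # False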
My current implementation of ARMA(p,q) encodes each \phi_i and each \theta_i as real-valued and contained in (-1,1). But this is not enough to guarantee that \Phi(L) and \Theta(L) have all of their zeros outside the unit circle on every iteration. I would like to just make the parameters the roots of the polynomials and use the transformed parameters block to calculate the coefficients for the model block. However, that would require allowing for complex roots, and I don't know in advance how many roots will be real-valued and how many will come in complex conjugate pairs. Even if I did know how many complex conjugate pairs I had, I would still have a model identification problem, because the factors of the polynomials commute, i.e. (1 - r_iL)(1 - r_jL) = (1 - r_jL)(1 - r_iL) for inverse roots r_i, r_j, so permuting the roots gives exactly the same coefficients.
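To make that identification problem concrete, here is a small Python/NumPy sketch (not Stan; the r values stand for hypothetical inverse-root parameters). Expanding the factored polynomial back into coefficients is invariant to permuting the roots, and complex roots only produce real coefficients when they appear as a conjugate pair:

import numpy as np

def coeffs_from_inverse_roots(r):
    """Expand Phi(L) = prod_i (1 - r_i L) back into coefficients, returning
    [phi_1, ..., phi_p] under the convention Phi(L) = 1 - sum_i phi_i L^i."""
    poly = np.array([1.0 + 0.0j])
    for ri in r:
        poly = np.convolve(poly, np.array([1.0 + 0.0j, -ri]))  # multiply in the factor (1 - r_i L)
    return -poly[1:]  # poly = [1, c_1, ..., c_p] with Phi(L) = 1 + sum_i c_i L^i

# Any permutation of the roots gives the same coefficients (label switching):
print(coeffs_from_inverse_roots([0.5, -0.3]))
print(coeffs_from_inverse_roots([-0.3, 0.5]))

# A conjugate pair yields real coefficients; an unpaired complex root does not:
print(coeffs_from_inverse_roots([0.4 + 0.2j, 0.4 - 0.2j]))  # zero imaginary parts
print(coeffs_from_inverse_roots([0.4 + 0.2j, -0.3]))        # nonzero imaginary parts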
My current code is below. Does anyone have any suggestions for me? Thank you so much for your time!
data {
  int<lower=1> T; // num observations
  vector[T] y;    // observed outputs
  int<lower=1> p; // ar degree
  int<lower=1> q; // ma degree
}
transformed data {
  int<lower=1> max_p_q = max(p, q);
}
parameters {
  vector<lower=-1, upper=1>[p] phi;   // ar coeffs
  vector<lower=-1, upper=1>[q] theta; // ma coeffs
  real<lower=0> sigma;                // noise scale
}
model {
  vector[T] nu;  // prediction for time t
  vector[T] err; // error for time t
  real phi_y;
  real theta_err;
  for (t in 1:max_p_q) {
    nu[t] = 0;
    err[t] = y[t] - nu[t];
  }
  for (t in (max_p_q + 1):T) {
    phi_y = dot_product(phi, y[(t - p):(t - 1)]);
    theta_err = dot_product(theta, err[(t - q):(t - 1)]);
    nu[t] = phi_y + theta_err;
    err[t] = y[t] - nu[t];
  }
  phi ~ normal(0, 2);
  theta ~ normal(0, 2);
  sigma ~ cauchy(0, 5);
  err[(max_p_q + 1):T] ~ normal(0, sigma); // likelihood
}
EDIT: it turns out that constraining all \phi_i to satisfy |\phi_i| < 1 is not the right condition. Even in the degree-2 case \Phi(L) = 1 - \phi_1L - \phi_2L^2, the conditions for stability are

|\phi_2| < 1, \qquad \phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1

(calculated using the Bistritz stability criterion). So a counterexample to my initial guess is \phi_1 = \pm2(1 - \epsilon) and \phi_2 = \epsilon - 1 for some small \epsilon > 0: it satisfies all three conditions above, yet |\phi_1| > 1. (Conversely, the box constraint does not guarantee stability either, e.g. \phi_1 = \phi_2 = 0.9 violates \phi_1 + \phi_2 < 1.) I do not believe that calculating the exact Bistritz stability conditions for a general \Phi(L) = 1 - \displaystyle\sum_{i=1}^p\phi_iL^i is feasible, so my question here still stands. Thanks again for your time.
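Double-checking that counterexample numerically (a throwaway Python/NumPy sketch; eps is chosen arbitrarily):

import numpy as np

eps = 0.05
phi1, phi2 = 2 * (1 - eps), eps - 1

# Zeros of Phi(z) = 1 - phi1*z - phi2*z^2, coefficients highest degree first
zeros = np.roots([-phi2, -phi1, 1.0])
print(np.abs(zeros))   # both moduli > 1, so this AR(2) is stable
print(abs(phi1) < 1)   # False: the stable point lies outside the (-1, 1) box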