ARMA(1,1) as part of hierarchical model term

Hi all, I am trying to build a model for my research, which is
y_{i,t} = x_{i,t}\beta + v_i + \eta_t + e_{i,t}
where:
e_{i,t}|\sigma \sim N(0,\sigma);
y_{i,t} | \beta, v_i,\eta_t,\sigma \sim N(x_{i,t}\beta+v_i+\eta_t, \sigma);
v_i | \sigma_v \sim N(0, \sigma_v);
\eta_t | \phi,\ theta, \sigma_\eta \sim ARMA(1,1).

And following is my code for r stan:


data{
  int<lower=0> T;
  int<lower=0> N;
  matrix[N,T]  x;
  matrix[N,T]  y;
}

parameters{
  real                    beta;
  real<lower=-1, upper=1> phi;
  real                    theta;
  real<lower=0>           sig2;
  real<lower=0>           sig2eta;
  real<lower=0>           sig2v;
  vector[N]               v;
  vector[T]               eta;
}

model{
  vector[T] a;      // error for time t
  for(i in 1:N){
    for(t in 1:T){
      y[i,t] ~ normal(x[i,t]*beta+v[i]+eta[t], sqrt(sig2));
    }
  }
  sig2 ~ uniform(0,5);
  for(i in 1:N){    //v[i] ~ normal(0,sig2v), so for each state i, v changes
    v[i] ~ normal(0,sqrt(sig2v));
  }
  sig2v ~ uniform(0,5);  
//ARMA(1,1) part
  a[1] = eta[1]; //this is ARMA(1,1) part and assume eta[0]=0, a[0]=0, mean coeff=0.??? what is eta[1]?
  for (t in 2:T) {
    a[t] = eta[t] - phi * eta[t-1] - theta * a[t-1]; //a[t]=eta[i,t]-eta.hat[i,t|t-1]
  }
  phi ~ uniform(-1,1);
  sig2eta ~ uniform(0,5);
  a ~ normal(0, sqrt(sig2eta));   
}

I have a question about the ARMA(1,1) part, is a[1]=eta[1] correct? there are 116 divergent transitions and the largest R-hat is 1.6, Bulk Effective Samples Size (ESS) is too low. Any suggestions are welcome! Thanks.

Hi!
I’m following the docs as a guide here so forgive me if I’ve overstepped, if a is your error vector make sure to specify that:

  1. a[0] = 0
  2. a[1] = y[1] - eta[1]
    otherwise everything checks out to me, kindly let me know if this helps at all

Hi, thanks for the comments. I did assume a[0]=0 here, and since eta follows ARMA(1,1) rather than y, so I think a[1] should be eta[1]-eta[1|0] and I assume eta[0]=0 as well. Thus, I wrote a[1]=eta[1]. Does this make sense?

Ah this is on me - I forgot R starts indexing at 1 rather than 0, your initial conditions are correct.
My follow up question would be where does eta gets declared after the 1st index? You’ve declared it as a vector with length T, but it doesn’t look like its components get populated elsewhere per the distribution you specified - was that deliberate?

That is also my question here. I searched everywhere about ARMA(1,1) model in Stan online and all of them are assume eta to be presented by some known values. I also tried eta[i,t]=w[i,t]-x[i,t]*beta-v[I], but that just doesn’t seem right to me. I am trying to find a way to generate ARMA(1,1) like N(0,sig2). Do I state the question clear?

Yes, I think I follow now - we need to find a way to generate eta but it’s slightly complicated by the fact that it follows a hierarchal ARMA which is atypical of most guides. I’ll get back to you shortly.

1 Like

I think you can write an AR(1) model like this:

y_t = \alpha + \eta_t \\ \eta_{t + 1} \sim N(\beta y_t, \sigma)

With that assumption \eta explicitly with y_t - \mu_t, or if you write it like it is in the Stan manual the \eta variables disappear:

y_t \sim N(\alpha + \beta y_t, \sigma);

They’re equivalent but the notation is confusing.

For the v parameter here:

v[i] ~ normal(0,sqrt(sig2v));

Switch to a non-centered parameterization (Diagnosing Biased Inference with Divergences).

I think some priors are missing in this.

For a random walk model (and this transfers to the stuff you’re writing here), there are two ways to write these. You can write them in terms of distributions on the random walks themselves:

x_{t + 1} \sim N(x_t, \sigma)

Or you can write them in terms of innovations \eta_t:

x_{t + 1} = x_{t} + \eta_t \sigma \\ \eta_t \sim N(0, 1)

Doing the second thing basically corresponds to a non-centered parameterization, and is usually the thing you want to do (non-centered works better than centered with less data, and this is often the case).

The first thing translates to Stan code like:

transformed data {
  int N = 10;
}
parameters {
  vector[N] x;
  real<lower = 0.0> sigma;
}
model {
  x[1] ~ normal(0, sigma); // The 1st element needs a prior
  for(i in 2:N) {
    x[i] ~ normal(x[i - 1], sigma);
  }
}

And then in terms of innovations:

transformed data {
  int N = 10;
}
parameters {
  vector[N] eta;
  real<lower = 0.0> sigma;
}
model {
  vector[N] x;
  eta ~ normal(0, 1);
  x[1] = eta[1];
  for(i in 2:N) {
    x[i] = x[i - 1] + sigma * eta[i];
  }
}

So when you look at these time series models you might see the equations go either way and you might see the code go either way.

It looks like this is written in terms of innovations and the bit that is missing are priors on eta (Edit: or just missing priors on the innovations – I think the eqs. are written where eta is the ARMA variable but the code is written where eta are the innovations), but I am not totally sure.

Edit: Also there’s the section in the manual if you haven’t found it yet, though these models can always get confusing: 2.4 Autoregressive moving average models | Stan User’s Guide

1 Like

Thanks for the suggestions.
For the first one, which write \eta as the difference of known values, that means the model consider the regression residuals as ARMA(1,1), that would be y_{i,t} = x_{i,t}\beta+v_{i}+\eta_{t}, I think it’s different from my model.

I also tried non-centered parameterization to fix the divergence issue, however, it didn’t work.

As for the missing priors, since I assume \theta \sim ( -\infty, +\infty) and so as \beta, but I tried to put these priors as normal and even change uniform distribution to half-Cauchy (stronger priors), the issue didn’t go away either.

I didn’t quite get this part as innovations. Are you suggesting that I should initial eta[1] as some normal distribution? I will definitely try this and back to you.

And yes, thanks for sharing this, I did the ARMA(1,1) code following this page.

Yeah it does look different, but that is how the ARMA model in the Stan manual is coded (search for err[t] = y[t] - nu[t];) (Edit: I don’t mean to say the Stan manual model is right or this is wrong – just that it shows up both ways and confuses me so I was just pointing out the difference)

If there are multiple causes of divergences in a model, you might need multiple fixes.

The easiest thing to do is start small. In this case I think that means sampling \eta_t assuming \phi, \theta, \sigma_n are known.

Well basically I’m worried the code doesn’t correspond to the equations.

Could you write out in more detail what you mean by: \eta_t | \phi, \theta, \sigma_n \sim ARMA(1, 1)? I’m not familiar with these AR/MA notation things.

That is correct, but I think what they coded is different from how normal distribution generated. And in \eta_t |\phi,\theta,\sigma_n \sim ARMA(1,1), \phi is the AR coefficient and \theta is the MA coefficient and \sigma_\eta is the variance of error term in ARMA. That is \eta_t = \phi\beta+\theta a_{t-1}+a_t, and we assume a_t \sim N(0, \sigma_\eta) for all the t from 1:T. Does this make sense?

Yeah thanks, so how about:

data {
  int T;
  real phi;
  real beta;
  real theta;
}
parameters {
  real<lower=0>  sig_a;
  vector[T] a_std;
}
transformed parameters {
  vector[T] eta;
  vector[T] a = a_std * sig_a; // non-centered innovations a

  eta[1] = ?; // Not sure for boundary
  for(t in 2:N) {
    // assume phi/beta/theta are given as data
    eta[t] = phi * beta + theta * a[t - 1] + a[t];
  }
}
model {
  a_std ~ normal(0, 1);
  sig_a ~ normal(0, 1);
}

Something I noticed as well, you’ll want the constraints of your parameters to match the support or you’ll get divergences.

So if you put a [0, 5] uniform prior on sig2eta, then make matching constraints like:

real<lower = 0, upper = 5> sig2eta;

Without the constraints the parameter can wonder off into 0 probability space and log density evaluations blow up.

Edit: Updated constrain to be correct

1 Like

Hi Ben, thanks for the suggestions, in the transformed part,

it should be phi*eta[t-1]. And I tried your suggestions, if I put eta in the transformed block, it has error saying that “Location parameter is nan, but must be finite!” for the part:

So I think we have to put eta as an unknown parameter in the parameter block. And It’s great to know that we need to match constraints on priors! Thanks.

That just means that at some point x[i,t] * beta + v[i] + eta[t] is a nan. It is probably an element of eta and it’s probably just a programming bug.

eta should not be a parameter if a[t] is in the model. If you write this not in terms of innovations, then eta can be in the parameter block but then a[t] shouldn’t be in the model. Can you post the latest model you’re working with?

sure.


data{
  int<lower=0> T;
  int<lower=0> N;
  matrix[N,T]  x;
  matrix[N,T]  y;
}

parameters{
  real                    beta;
  real<lower=-1, upper=1> phi;
  real                    theta;
  real<lower=0,upper=5>   sig2;
  real<lower=0,upper=5>   sig2v;
  vector[N]               v;
  real<lower=0>           sig_a;
  vector[T]               a_std;
}

transformed parameters {
  vector[T] eta;
  vector[T] a = a_std * sig_a; // non-centered innovations a

  eta[1] = a[1]; // Not sure for boundary
  for(t in 2:N) {
    // assume phi/beta/theta are given as data
    eta[t] = phi * eta[t-1] + theta * a[t-1] + a[t];
  }
}

model{
  for(i in 1:N){
    for(t in 1:T){
      y[i,t] ~ normal(x[i,t]*beta+v[i]+eta[t], sig2);
    }
  }
  sig2 ~ uniform(0,5);
  for(i in 1:N){    
    v[i] ~ normal(0,sig2v);
  }
  sig2v ~ uniform(0,5);
  phi ~ uniform(-1,1);
  theta ~ normal(0,2);
  a_std ~ normal(0, 1);
  sig_a ~ normal(0, 1);
}

And this is my simulation data part:

N = 10
T = 120
phi = 0.3
theta = 0.5
sig2 = 0.4
sig2v = 0.1
sig2eta = 0.2
beta = 0.3
set.seed(123)
eta = arima.sim(n = T, list(ar = phi, ma = theta),sd = sig2eta)
v = rnorm(N,mean=0,sd=sig2v)
epsilon = matrix(NA,nrow=N,ncol=T)
for(i in 1:N){
  epsilon[i,] = rnorm(T, mean=0, sd=sig2)}
x = matrix(NA,nrow=N,ncol=T)
truew = matrix(NA,nrow=N,ncol=T)
for(i in 1:N){
  for(t in 1:T){
    x[i,t] = rnorm(1,mean=1,sd=1)
    truew[i,t] = x[i,t]*beta+v[i]+eta[t]+epsilon[i,t]
  }
}
state_test = list(N=N,T=T,y=truew,x=x)
test = stan(file = "try.stan",dat = state_test,  
             pars = c("phi","theta","sig2","sig2v","beta"),
             iter=2000,warmup=500,control = list(adapt_delta=0.99,max_treedepth = 10))
test_result = extract(test,permute=TRUE)  
test
1 Like

The time series thing looks good.

If that sig2v is a variance add a sqrt:

v[i] ~ normal(0, sqrt(sig2v));

If you parameterize it in terms of standard deviation (sigv or something), then you can easily switch between centered and non-centered parameterization using:

vector[N] v; // centered

And

vector<multiplier = sigv>[N] v; // non-centered

(Updated non-centered parametrization with offset and multiplier - #3 by daniel_h)

Hi Ben, thanks for the comments. I think the issue here is I don’t think Stan can allow eta ~ ARMA(1,1) be an unobservable variable such as v ~ N(0,1).

Yeah that sorta syntax isn’t possible in Stan right now.

1 Like