Syntax for time series

I’m new to Stan and not an expert on statistics, so I hope my question doesn’t seem too trivial.

I’m mostly interested in time series analysis, so I’m starting with a simple AR(K) model to learn the basics of Stan. The Stan User’s Guide shows this example of an AR(K) model:

```stan
data {
  int<lower=0> K;
  int<lower=0> N;
  real y[N];
}
parameters {
  real alpha;
  real beta[K];
  real sigma;
}
model {
  for (n in (K+1):N) {
    real mu = alpha;
    for (k in 1:K)
      mu += beta[k] * y[n-k];
    y[n] ~ normal(mu, sigma);
  }
}
```

which is slightly different from the model I originally wrote:

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  vector[N] y;
}
parameters {
  real alpha;
  row_vector[K] beta;
  real<lower=0> sigma;
}
model {
  for (n in K+1:N)
    y[n] ~ normal(alpha + beta * y[n-K:n-1], sigma);
}
```

The results are the same on my test data, but I want to make sure that I’m not developing bad habits:

  • Is there a fundamental difference between real y[N] and vector[N] y? Which one should be preferred for time series data?
  • Same question for real beta[K] and row_vector[K] beta: the latter lets me avoid having two nested for loops when defining the model. Maybe this is not relevant?
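One detail worth checking in the two versions above: in the User’s Guide loop, beta[k] multiplies y[n-k], so beta[1] is the lag-1 coefficient. In beta * y[n-K:n-1], the row vector is paired element by element with y[n-K], ..., y[n-1], so beta[1] multiplies y[n-K] instead. Both describe the same family of models, so the fits should agree up to the order in which the coefficients are labelled (for K = 1 there is no difference at all). Below is a minimal sketch that keeps the vectorized form but matches the guide’s ordering, assuming a Stan version that provides reverse() for vectors and the lower bounds on K and N chosen here:

```stan
data {
  int<lower=1> K;          // number of lags (assumed at least 1)
  int<lower=K + 1> N;      // series must be longer than K
  vector[N] y;
}
parameters {
  real alpha;
  row_vector[K] beta;
  real<lower=0> sigma;
}
model {
  for (n in (K + 1):N) {
    // reverse() turns (y[n-K], ..., y[n-1]) into (y[n-1], ..., y[n-K]),
    // so beta[k] multiplies y[n-k] exactly as in the guide's loop.
    y[n] ~ normal(alpha + beta * reverse(y[(n - K):(n - 1)]), sigma);
  }
}
```

Alternatively, keep the original code and simply read beta[K] as the lag-1 coefficient.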

Hi there and welcome to Stan!

As far as I know, the main difference between real and vector is that a vector allows matrix-vector multiplication, while an array of reals (real[]) doesn’t. So this answers your second question as well: a row_vector * vector product is possible (as long as both have the same length), but there is no such product for real arrays, so they require an additional loop.
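A minimal sketch of that difference, written with the current array syntax (array[] real, which older Stan versions wrote as real beta[K]). The function names lag_sum_array and lag_sum_vector are hypothetical, used only for illustration:

```stan
functions {
  // Arrays of reals have no array * array product, so we loop over the lags.
  real lag_sum_array(array[] real beta, array[] real y_lags) {
    real s = 0;
    for (k in 1:size(beta))
      s += beta[k] * y_lags[k];
    return s;
  }
  // row_vector * vector is a single dot product that returns a real.
  real lag_sum_vector(row_vector beta, vector y_lags) {
    return beta * y_lags;
  }
}
```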

By the way, I assume P in your model is a typo and should be K, right?

EDIT: removed incorrect statement

I’ll just add that there are functions like to_vector, to_row_vector and to_array_1d that let you convert between the types.
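For example, here is a sketch assuming the series arrives as an array of reals. The variable names y_arr, y, y_row and y_back are made up for illustration, and the program uses the current array syntax:

```stan
data {
  int<lower=0> N;
  array[N] real y_arr;   // time series stored as an array of reals
}
transformed data {
  vector[N] y = to_vector(y_arr);              // array -> column vector
  row_vector[N] y_row = to_row_vector(y_arr);  // array -> row vector
  array[N] real y_back = to_array_1d(y);       // vector -> array
}
```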

I’d just say that in most cases, it is customary to prefer vector and row_vector over real[], but it is mostly an aesthetic choice.

The dot product of vector and row_vector will likely be slightly faster to compute than the for loop, but for most use cases it should not matter much.
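If you want to go one step further than the per-time-point dot product, one option is to build a matrix of lagged values once in transformed data and write the likelihood as a single vectorized statement. This is a sketch under my own assumptions (the name Y_lag and the bounds on K and N are hypothetical choices), using the guide’s convention that beta[k] multiplies y[n-k]. Whether it is noticeably faster than the loop will depend on N and K:

```stan
data {
  int<lower=1> K;
  int<lower=K + 1> N;
  vector[N] y;
}
transformed data {
  // Row t holds y[n-1], ..., y[n-K] for time point n = K + t.
  matrix[N - K, K] Y_lag;
  for (t in 1:(N - K))
    for (k in 1:K)
      Y_lag[t, k] = y[K + t - k];
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // One vectorized statement covering y[K+1], ..., y[N].
  y[(K + 1):N] ~ normal(alpha + Y_lag * beta, sigma);
}
```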

Best of luck with your model!

(EDIT: Removed my reaction to Maurits’ statement, which he has since removed :-D)


Thanks for jumping in - my mistake! Not sure why I thought it wasn’t allowed…


Thank you for your answers! I’m from a Python background, where we don’t bother with declaring variables and everything is an array…

By the way, I assume P in your model is a typo and should be K, right?

Indeed, I corrected it.


I also happen to be experimenting with the same model. I found it quite appealing to learn how to write custom functions for the model, so that you can simply write something like y ~ ar_model(alpha, beta, sigma); in the model block. The relevant docs are here.
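For the y ~ ar_model(...) form to work, the user-defined function has to be named with an _lpdf suffix (here ar_model_lpdf) and take the outcome as its first argument. Below is a sketch along those lines. The function name and body are my own illustration, not taken from the docs, and they keep the guide’s ordering where beta[k] multiplies y[n-k]:

```stan
functions {
  // Hypothetical AR(K) log density; the _lpdf suffix is what allows
  // the model block to write y ~ ar_model(alpha, beta, sigma);
  real ar_model_lpdf(vector y, real alpha, vector beta, real sigma) {
    int N = rows(y);
    int K = rows(beta);
    real lp = 0;
    for (n in (K + 1):N) {
      // reverse() orders the lags as y[n-1], ..., y[n-K].
      lp += normal_lpdf(y[n] | alpha + dot_product(beta, reverse(y[(n - K):(n - 1)])), sigma);
    }
    return lp;
  }
}
data {
  int<lower=1> K;
  int<lower=K + 1> N;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ ar_model(alpha, beta, sigma);
}
```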
