Syntax for time series

srouchier · June 19, 2020, 6:59am

I’m new to Stan and not an expert on statistics, so I hope my question doesn’t seem too trivial.

I’m mostly interested in time series analysis, so I’m starting with a simple AR(K) model to learn the basics of Stan. The user’s guide shows this example of a model:

data {
  int<lower=0> K;
  int<lower=0> N;
  real y[N];
}
parameters {
  real alpha;
  real beta[K];
  real sigma;
}
model {
  for (n in (K+1):N) {
    real mu = alpha;
    for (k in 1:K)
      mu += beta[k] * y[n-k];
    y[n] ~ normal(mu, sigma);
  }
}

which is slightly different from the model I originally wrote:

data {
  int<lower=0> N;
  int<lower=0> K;
  vector[N] y;
}
parameters {
  real alpha;
  row_vector[K] beta;
  real<lower=0> sigma;
}
model {
  for (n in K+1:N)
    y[n] ~ normal(alpha + beta * y[n-K:n-1], sigma);
}

The results are the same on my test data, but I want to make sure that I’m not developing bad habits:

1. Is there a fundamental difference between real y[N] and vector[N] y? Which one should be preferred for time series data?
2. Same question for real beta[K] and row_vector[K] beta: the latter lets me avoid having two nested for loops when defining the model. Maybe this is not relevant?

MauritsM · June 19, 2020, 10:13am

Hi there and welcome to Stan!

As far as I know, the main difference between a real array and a vector is that a ‘vector’ allows for matrix-vector multiplication while a ‘real’ array doesn’t. So this answers your second question as well: while it’s possible to do a row_vector * vector multiplication (as long as they are the same length), this is not possible for a real array and therefore requires an additional loop.
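
A minimal sketch of the row_vector * vector form, to make the types explicit. It mirrors the second model in the question, with the product pulled out into a local variable (mu, borrowed from the user's guide example) and K assumed to be at least 1. One thing worth checking against your own data: the slice runs oldest value first, so the coefficients come out in the reverse order of the loop version.

```stan
data {
  int<lower=1> K;   // assumption: at least one lag
  int<lower=K> N;
  vector[N] y;
}
parameters {
  real alpha;
  row_vector[K] beta;
  real<lower=0> sigma;
}
model {
  for (n in (K + 1):N) {
    // y[(n - K):(n - 1)] is a vector of length K, oldest value first,
    // so beta[1] multiplies y[n - K] and beta[K] multiplies y[n - 1].
    // In the loop version, beta[k] multiplies y[n - k] instead.
    real mu = alpha + beta * y[(n - K):(n - 1)];  // row_vector * vector gives a real
    y[n] ~ normal(mu, sigma);
  }
}
```

Both forms describe the same AR(K) likelihood; only the labelling of the beta entries differs, which matters when comparing estimates between the two models.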

By the way, I assume P in your model is a typo and should be K, right?

EDIT: removed incorrect statement

martinmodrak · June 19, 2020, 11:45am

I’ll just add that there are functions like to_vector, to_row_vector and to_array_1d that let you convert between the types.
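
As a rough sketch of how those conversions could be used here: the data and coefficients are declared as real arrays, as in the user's guide example, and converted so that the product can still be written without an inner loop. The names y_vec and beta_row are made up for illustration, and K is assumed to be at least 1.

```stan
data {
  int<lower=1> K;   // assumption: at least one lag
  int<lower=K> N;
  real y[N];        // array of reals (newer Stan versions write: array[N] real y)
}
transformed data {
  vector[N] y_vec = to_vector(y);   // real[] -> vector, done once
}
parameters {
  real alpha;
  real beta[K];     // newer Stan versions write: array[K] real beta
  real<lower=0> sigma;
}
model {
  row_vector[K] beta_row = to_row_vector(beta);   // real[] -> row_vector
  for (n in (K + 1):N)
    // beta_row[1] multiplies y[n - K]: the slice is ordered oldest value first
    y[n] ~ normal(alpha + beta_row * y_vec[(n - K):(n - 1)], sigma);
}
```

In practice it is usually simpler to declare y and beta as vector and row_vector from the start; the conversions are mainly useful when the array types are imposed from elsewhere.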

I’d just say that in most cases, it is customary to prefer vector and row_vector over real[], but it is mostly an aesthetic choice.

The dot product of vector and row_vector will likely be slightly faster to compute than the for loop, but for most use cases it should not matter much.

Best of luck with your model!

(EDIT: Removed reaction to Maurits’ statement that he removed :-D )

MauritsM · June 19, 2020, 11:51am

Thanks for jumping in - my mistake! Not sure why I thought it wasn’t allowed…

srouchier · June 19, 2020, 12:47pm

Thank you for your answers! I’m from a Python background where we don’t bother about declaring variables and everything is an array…

> By the way, I assume P in your model is a typo and should be K, right?

Indeed, I corrected it.

RJTK · June 19, 2020, 8:06pm

I also happen to be experimenting with the same model – I found it quite appealing to learn to write custom functions for the model so that you can just write something like y ~ ar_model(alpha, beta, sigma);. The relevant docs are here.
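
For illustration, a sketch of what such a custom function might look like. This is only one possible way to write it, not the code from the linked docs: the name ar_model_lpdf and its argument list are assumptions chosen so that the statement above works, and the likelihood conditions on the first K observations, as in the models earlier in the thread.

```stan
functions {
  // Hypothetical helper: log density of y[K+1], ..., y[N] under an AR(K)
  // model, conditioning on the first K observations.
  real ar_model_lpdf(vector y, real alpha, row_vector beta, real sigma) {
    int N = num_elements(y);
    int K = num_elements(beta);
    vector[N - K] mu;
    for (n in (K + 1):N)
      // beta[1] multiplies y[n - K]: the slice is ordered oldest value first
      mu[n - K] = alpha + beta * y[(n - K):(n - 1)];
    return normal_lpdf(y[(K + 1):N] | mu, sigma);
  }
}
data {
  int<lower=1> K;   // assumption: at least one lag
  int<lower=K> N;
  vector[N] y;
}
parameters {
  real alpha;
  row_vector[K] beta;
  real<lower=0> sigma;
}
model {
  // The _lpdf suffix is what allows the sampling-statement syntax
  y ~ ar_model(alpha, beta, sigma);
}
```

Collecting the means in mu and calling normal_lpdf once on the whole slice keeps the model block to a single line and evaluates the density in one vectorized call.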
