Vector definitions for data

Is there a difference or advantage between the following two ways of defining data vectors in the data block of a Stan program?

data {
  vector[N] y;
}

versus

data {
  real y[N];
}

I am not seeing a difference in a simple example. From what I’m seeing in the documentation, it seems that either is acceptable.

Mike A

It’s generally better to use vector types:

data {
  vector[N] y;
}

Vector types are more readily compatible with other matrix/vector operations, and they can be used more efficiently in the underlying C++.
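To illustrate the compatibility point, here is a minimal sketch rather than a definitive recommendation. The names y_arr, y_centered, y_arr_centered, mu, and sigma are made up for illustration, and the array is declared with the newer array[N] real syntax (equivalent to real y_arr[N] in the older syntax used in this thread). It assumes the same data is supplied in both containers purely for comparison.

```stan
data {
  int<lower=1> N;
  vector[N] y;          // vector: supports linear-algebra style arithmetic
  array[N] real y_arr;  // array of reals (older syntax: real y_arr[N];)
}
transformed data {
  // Arithmetic between a vector and a scalar works directly.
  vector[N] y_centered = y - mean(y);
  // An array has to be converted with to_vector() before the same arithmetic.
  vector[N] y_arr_centered = to_vector(y_arr) - mean(y_arr);
}
parameters {
  real mu;                // hypothetical location parameter
  real<lower=0> sigma;    // hypothetical scale parameter
}
model {
  // Vectorized sampling statements accept either container.
  y_centered ~ normal(mu, sigma);
}
```

Both declarations are accepted in the data block; the difference shows up once the variable is used in arithmetic, where the array needs an explicit conversion.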

Thanks for your answer. I was wondering the same thing.

So in this model (from Aki Vehtari's case study "Gaussian process demonstration with Stan"), should real xn[N] = to_array_1d((x - xmean)/xsd); be changed to vector[N] xn = (x - xmean)/xsd;? Or do you think there is a reason why xn and x2n are arrays but yn is a vector here?

data {
  int<lower=1> N;      // number of observations
  vector[N] x;         // univariate covariate
  vector[N] y;         // target variable
  int<lower=1> N2;     // number of test points
  vector[N2] x2;       // univariate test points
}
transformed data {
  // Normalize data
  real xmean = mean(x);
  real ymean = mean(y);
  real xsd = sd(x);
  real ysd = sd(y);
  real xn[N] = to_array_1d((x - xmean)/xsd);
  real x2n[N2] = to_array_1d((x2 - xmean)/xsd);
  vector[N] yn = (y - ymean)/ysd;
  real sigma_intercept = 0.1;
  vector[N] zeros = rep_vector(0, N);
}

In that case study, xn and x2n are specified as arrays because they are passed to the gp_exp_quad_cov function, which requires an array input for certain arguments.
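As a rough sketch of why the array is needed, the fragment below passes the normalized covariate to gp_exp_quad_cov, whose first argument is an array of input points. The parameter names (lengthscale_f, sigma_f, sigman) and the priors are illustrative assumptions and not necessarily what the case study uses. The array is declared with the newer array[N] real syntax, which is equivalent to the real xn[N] form quoted above.

```stan
data {
  int<lower=1> N;   // number of observations
  vector[N] x;      // univariate covariate
  vector[N] y;      // target variable
}
transformed data {
  // The covariate is converted to an array because gp_exp_quad_cov
  // takes its input points as an array, not as a vector.
  array[N] real xn = to_array_1d((x - mean(x)) / sd(x));
  // The target stays a vector because it is only used in vector operations.
  vector[N] yn = (y - mean(y)) / sd(y);
  vector[N] zeros = rep_vector(0, N);
}
parameters {
  real<lower=0> lengthscale_f;  // assumed name: GP length scale
  real<lower=0> sigma_f;        // assumed name: GP marginal standard deviation
  real<lower=0> sigman;         // assumed name: observation noise scale
}
model {
  // Argument order: input points, marginal standard deviation, length scale.
  matrix[N, N] K = gp_exp_quad_cov(xn, sigma_f, lengthscale_f);
  matrix[N, N] L = cholesky_decompose(add_diag(K, square(sigman)));
  // Illustrative priors only.
  lengthscale_f ~ normal(0, 1);
  sigma_f ~ normal(0, 1);
  sigman ~ normal(0, 1);
  yn ~ multi_normal_cholesky(zeros, L);
}
```

Declaring xn as vector[N] here would not match the function's signature, which is why the case study calls to_array_1d.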

Great, thanks!