Efficient/compact code for simulating data for a simple state space model

I am new and would appreciate quick feedback on writing efficient and compact code for a simple state space model I am developing (the full Stan program for the data simulation part is copied at the bottom).

Specifically, I wanted to use the vectorized statements that are commented out below, but can only make things work with a for loop:

generated quantities {
  vector[N] y; // underlying unbiased accounting figures  
  vector[N] m; // misreporting extent in the reported figures 
  vector[N] r; // reported figures 
  y[1] = normal_rng(a/(1-b), sd_y);
//  y[2:N] = normal_rng(a + b * y[1:(N - 1)], sd_y);
//  m = normal_rng(X*c, sd_m);   //  matrix[N,K] X are covariates capturing the dis/incentives of misreporting
//  r = y + m;  
  m[1] = normal_rng(X[1]*c, sd_m); 
  r[1] = y[1] + m[1];
  for (n in 2:N) {
    y[n] = normal_rng(a + b * y[n-1], sd_y);
    m[n] = normal_rng(X[n]*c, sd_m); 
    r[n] = y[n] + m[n];
  }
}

Questions:

  1. Is there a better way to write the code above?
  2. Related to the data simulation, I also have an estimation .stan file containing the following lines
...
transformed parameters {
  vector[N] m; // misreporting extent in the reported figures 
  m = r - y;
}
model {
  y[2:N] ~ normal(a + b * y[1:(N - 1)], sd_y);
  m ~ normal(X*c, sd_m);
}

with the last line triggering a non-fatal warning:

DIAGNOSTIC(S) FROM PARSER:
Info (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    m ~ normal(...)

Which example in the documentation gives a quick, basic, not-too-technical explanation of when and how to use the target += statement?

Look forward to your advice. Many thanks in advance.

data {
  int<lower=0> N; // number of observations (for different years)
  int<lower=0> K; // number of coefficients for the column vectors in covariate matrix X
  matrix[N,K] X; // covariates (capturing the dis/incentives of misreporting)

  real a; // intercept coefficient (drift) of the AR(1) process of the underlying unbiased accounting figure y[n]
  real b; // slope coefficient of the AR(1) process of y[n]
  vector[K] c; // coefficients of the K covariates in matrix X
  real<lower=0> sd_y; // sd of the underlying unbiased figure (vector y)
  real<lower=0> sd_m; // sd of the misreporting extent (vector m = r - y)
}
generated quantities {
  vector[N] y; // underlying unbiased figures  
  vector[N] m; // misreporting extent in the reported figures 
  vector[N] r; // reported figures (scaled by Rev, total revenue)

  y[1] = normal_rng(a/(1-b), sd_y);
//  y[2:N] = normal_rng(a + b * y[1:(N - 1)], sd_y);
//  m = normal_rng(X*c, sd_m);
//  r = y + m;  
  m[1] = normal_rng(X[1]*c, sd_m); 
  r[1] = y[1] + m[1];
  for (n in 2:N) {
    y[n] = normal_rng(a + b * y[n-1], sd_y);
    m[n] = normal_rng(X[n]*c, sd_m); 
    r[n] = y[n] + m[n];
  }
}
  1. I don’t think you can make the generated quantities much more efficient. It’s not possible yet (I think) to vectorize the _rng functions, and on the bright side, vectorizing these statements would not lead to faster sampling anyway. The only thing I see is that you don’t strictly need to store m (you still need y, because the AR(1) recursion uses y[n-1]): you can directly calculate r[n] = normal_rng(a + b * y[n-1], sd_y) + normal_rng(X[n]*c, sd_m); see the sketch after this list.
  2. The warning is about the Jacobian adjustment. You can read more about it in the Stan documentation.
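
For reference, a minimal sketch of what that compaction could look like, assuming the same data block as the program above (y is kept because the recursion needs y[n-1], but m is no longer stored):

generated quantities {
  vector[N] y; // underlying unbiased figures (needed for the AR(1) recursion)
  vector[N] r; // reported figures
  y[1] = normal_rng(a / (1 - b), sd_y);
  r[1] = y[1] + normal_rng(X[1] * c, sd_m); // misreporting folded directly into r
  for (n in 2:N) {
    y[n] = normal_rng(a + b * y[n - 1], sd_y);
    r[n] = y[n] + normal_rng(X[n] * c, sd_m);
  }
}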

The warning is given because m is a function of r and y, and I guess at least one of them is a parameter. Because m is a linear function of r and y, the log absolute determinant of the Jacobian is log(1) = 0, so you don’t need to worry about it here. The sketch below shows, for contrast, a case where the adjustment would be needed.
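
A hypothetical illustration (sigma and log_sigma are just example names, not part of the model above) of when a target += adjustment is needed, namely when the left-hand side of a sampling statement is a nonlinear transform of a parameter:

parameters {
  real<lower=0> sigma;
}
transformed parameters {
  real log_sigma = log(sigma); // nonlinear transform of a parameter
}
model {
  log_sigma ~ normal(0, 1);
  // Jacobian adjustment: |d(log sigma)/d sigma| = 1/sigma, so add log(1/sigma)
  target += -log(sigma);
}

If I remember correctly, the chapter on reparameterization and change of variables in the Stan User’s Guide walks through examples like this. Because m = r - y above is linear, the corresponding term is a constant and can be dropped.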


Thanks for the explanation and pointer. Appreciated.