# Efficient/compact code for simulating data for a simply state space model

I am new and would appreciate a quick feedback on writing efficient and compact code for a simple state space model being developed (full stan program for the data simulation part copied at the bottom).

Specifically, I wanted to use the code below that are commented out but can only make things work using a for loop:

``````generated quantities {
vector[N] y; // underlying unbiased accounting figures
vector[N] m; // misreporting extent in the reported figures
vector[N] r; // reported figures
y[1] = normal_rng(a/(1-b), sd_y);
//  y[2:N] = normal_rng(a + b * y[1:(N - 1)], sd_y);
//  m = normal_rng(X*c, sd_m);   //  matrix[N,K] X are covariates capturing the dis/incentives of misreporting
//  r = y + m;
m[1] = normal_rng(X[1]*c, sd_m);
r[1] = y[1] + m[1];
for (n in 2:N) {
y[n] = normal_rng(a + b * y[n-1], sd_y);
m[n] = normal_rng(X[n]*c, sd_m);
r[n] = y[n] + m[n];
}
``````

Questions:

1. Is there a better way to write the code above?
2. Relating to the data simulation part, I also have an estimation .stan file containing the following lines
``````...
transformed parameters {
vector[N] m; // misreporting extent in the reported figures
m = r - y;
}
model {
y[2:N] ~ normal(a + b * y[1:(N - 1)], sd_y);
m ~ normal(X*c, sd_m);
}
``````

with the last line triggering a non-fatal warning:

``````DIAGNOSTIC(S) FROM PARSER:
Info (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
m ~ normal(...)
``````

Which example in the documentation will give a quick, basic, not so technical explanation of when and how to use the target += statement?

``````data {
int<lower=0> N; // number of observations (for different years)
int<lower=0> K; // number of coefficents for the column vectors in covariate matrix X
matrix[N,K] X; // covariates (capturing the dis/incentives of misreporting)

real a; // intercept coefficient (drift) of the AR(1) process of the underlying unbiased accounting figure y[n]
real b; // slope coefficient of the AR(1) process of y[n]
vector[K] c; // coefficients of the K covariates in matrix X
real<lower=0> sd_y; // sd of the underlying unbiased figure (vector y)
real<lower=0> sd_m; // sd of the misreporting extent (vector m = r - y)
}
generated quantities {
vector[N] y; // underlying unbiased figures
vector[N] m; // misreporting extent in the reported figures
vector[N] r; // reported figures (scaled by Rev, total revenue)

y[1] = normal_rng(a/(1-b), sd_y);
//  y[2:N] = normal_rng(a + b * y[1:(N - 1)], sd_y);
//  m = normal_rng(X*c, sd_m);
//  r = y + m;
m[1] = normal_rng(X[1]*c, sd_m);
r[1] = y[1] + m[1];
for (n in 2:N) {
y[n] = normal_rng(a + b * y[n-1], sd_y);
m[n] = normal_rng(X[n]*c, sd_m);
r[n] = y[n] + m[n];
}
}
``````
1. I donâ€™t think you can make the generated quantities much more efficient. Itâ€™s not possible yet (I think) to vectorize the `_rng` functions. On the bright side, vectorizing these statements would not lead to faster sampling anyway. The only thing I see is that you strictly speaking donâ€™t need `y` and `m`. You can directly calculate `r[n] = normal_rng(a + b * y[n-1], sd_y) + normal_rng(X[n]*c, sd_m)`
The warning is given because `m` is a function of `r` and `y` and I guess at least one of them is parameter. Because `m` is a linear function of the `r` and `y`, the Jacobian adjustment is log(1) and you donâ€™t need to worry about it.