Inspecting data fit of stan model/ problems with generated quantities code block

Dear Stan community,

I would like to (visually) compare the fit of my Stan model to the input data. To do so, I have added a generated quantities block to my code, but there seems to be an issue with my syntax in this code block that I don’t understand:

data {
  vector[1860] X_changedet;
  vector[1860] y;
}

parameters {
   vector[1] beta;
   real<lower = 0, upper = (2*10*(0.04/1.96))> sigma;
}
 
model {
   // priors:
   target += normal_lpdf(beta | 0, (0.04/1.96));
   target += uniform_lpdf(sigma | 0, (2*10*(0.04/1.96)));

   // likelihood:
   target += normal_id_glm_lpdf(y | to_matrix(X_changedet), 0, beta, sigma);
}

generated quantities {
  vector[1860] y_sim;
  
  for (n in 1:1860) {
      y_sim[n] = normal_rng(beta * X_changedet[n], sigma);
  }
}

I get the following error:

Dimension mismatch in assignment; variable name = y_sim, type = real; right-hand side type = real[ ].
Illegal statement beginning with non-void expression parsed as
  y_sim[n]
Not a legal assignment, sampling, or function statement.  Note that
  * Assignment statements only allow variables (with optional indexes) on the left;
  * Sampling statements allow arbitrary value-denoting expressions on the left.
  * Functions used as statements must be declared to have void returns

 error in 'model74f67b0b86d9_checkfit_changedet' at line 142, column 6
  -------------------------------------------------
   140:   
   141:   for (n in 1:1860) {
   142:       y_sim[n] = normal_rng(beta * X[n], sigma);
             ^
   143:   }
  -------------------------------------------------

PARSER EXPECTED: "}"

and don’t understand how to fix it. (I also tried using multi_normal_rng instead of the for-loop, but I got syntax errors with that as well).

Any help would be greatly appreciated!

As a follow up, I got the above to run by changing it as follows:

generated quantities {
   vector[1860] y_sim;
   real mu;
  
   for (n in 1:1860) {
        mu = beta[1] .* X_changedet[n];
        y_sim[n] = normal_rng(mu, sigma);
   }
}

But I would actually prefer to do this:

generated quantities {
   vector[1860] y_sim;
   vector[1860] mu = beta .* X_changedet;

   y_sim = normal_rng(mu, sigma);
}

Based on my understanding of the documentation this should be possible (with y_sim and mu as vectors and sigma as a real), but I still get:

Dimension mismatch in assignment; variable name = y_sim, type = vector; right-hand side type = real[ ].
Illegal statement beginning with non-void expression parsed as
  y_sim

Can someone please point out to me what the mistake in my understanding might be?

On another note, is there a way to ensure that the same seed is used by the rng in generated quantities as that used in the sampler? I would like to inspect the actual fit of the model to the data, as opposed to the posterior predictive.

When using the vectorised rng the return type is a real array, not a vector:

generated quantities {
   real y_sim[1860];
   vector[1860] mu = beta .* X_changedet;

   y_sim = normal_rng(mu, sigma);
}

Thank you, that solved it! :)

I have one more question regarding generated quantities. As I have already run my sampler, I would like to run the standalone version. (The original code for sampling did not contain the generated quantities block.) When I run

f2 <- gqs(m2, data = datalist, draws = as.matrix(fit))

(where m2 is the new model code containing generated quantities, datalist contains all data required for the original model and fit is the output of my original model) I get the error message:

Exception: Variable beta missing  (in 'model1283b4cd54baf_7cba5efb7ebeb121394361987c454d55' at line 119)

but beta is one of the parameters that is estimated and is contained in the stanfit object. What needs to be done differently? It’s probably another lack of understanding on my part (I am a new Stan user).