Recently I am modifying some existing Stan code by adding some additional quantities in ‘generated quantities’ section. The partial orginal code is as following: (mu and sigma are paramters they specified in the ‘parameter’ chunk)

generated quantities{
vector[n] alpha;
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
}

I would like to add a new quantity as following: (I would like to do posterior predictive simulation and you could think ‘x’ and ‘indicator’ are two vectors of length m in the input data, also b is a vector of length two definied in the ‘parameter’ chunk’)

generated quantities{
vector[n] alpha;
vector[m] p_pred = b[1] + b[2]*x + alpha[indicator];
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
}

However the above code will return an error stating that there is no sample in the result. My guess was that I used alpha[indicator] before I define alpha and after some messy modification I finally made my code run. But I do believe that there is more efficient way to overcome this problem. My questions are as following:

Will order of code have effect in Stan as shown in my situation?

is trying to make two lines of code into one line of code (instead of defining p_pred in one line and write another line for the explict expression). But the constaint in Stan is you need to define all the paramters used in the section, so you cannot write something like:

generated quantities{
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
vector[m] p_pred = b[1] + b[2]*x + alpha[indicator];
}
}

The order here will definitely have an effect (see comments).

generated quantities{
vector[n] alpha;
vector[m] p_pred = b[1] + b[2]*x + alpha[indicator]; // alpha here is undefined meaning that p_pred is also undefined
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
}

You can write this as:

generated quantities{
vector[n] alpha;
vector[m] p_pred;
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
p_pred = b[1] + b[2]*x + alpha[indicator];
}

or

generated quantities{
vector[n] alpha;
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
vector[m] p_pred = b[1] + b[2]*x + alpha[indicator];
}

generated quantities{
vector[n] alpha;
for (i in 1:n){
alpha[i] = normal_rng(mu, sigma);
}
vector[m] p_pred = b[1] + b[2]*x + alpha[indicator];
}

is not valid since you need to first define everything before you start any calculation. But in this case you calculate alpha first and then define another vector p_pred, will this be valid? (I believed that I have tried similar tricks but it failed)

Yeah, I think this is still true in the version of Stan used by RStan, but in more recent versions this requirement has been relaxed. So if you try a more recent version of Stan (e.g., via the new CmdStanR or CmdStanPy interfaces) then this should work.