 # Problem Simulating Data with Generated Quantities (Dimension mismatch in assignment; type = real; right-hand side type = real[ ])

I am trying to start working directly with Stan code by simulating some data from priors using the `generated quantities` block. I’m struggling to match vectors and reals

``````# Fake starting data frame
n <- 20
data <- data.frame(
SOGF = sample(c(0,1), n, replace = TRUE),
STGF = sample(c(0,1), n, replace = TRUE),
PK = rnorm(n, mean = 50, sd = 2),
RTN = rnorm(n, mean = 0, sd = 1))
X <- model.matrix(data = data,
RTN ~ SOGF + STGF + PK)
``````
``````#Stan model
model_code = "
data {
int n; // number of observations
int k; // number of predictors
matrix[n,k] X; // predictor matrix
vector[n] Y; // outcome vector
int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
}
parameters {
real alpha; // intercept
vector[k] beta; // coefficients for predictors
real<lower=0> sigma; // error scale
}
model {
alpha ~ normal(0, 10);
beta ~ normal(0, 10); // prior for SOGF
beta ~ normal(0, 10); // prior for STGF
beta ~ normal(0, 10); // prior for PK
sigma ~ std_normal();
// conditionally run likelihood
if(run_estimation==1){
Y ~ normal(alpha + X * beta, sigma);
}
}
generated quantities {
vector[n] Y_sim;
for(i in 1:n) {
Y_sim[i] = normal_rng(alpha + X * beta, sigma);
}
}
"
``````
``````# Sample from model
sim_out <- stan(model_code = model_code,
data = list(n = n,
k = ncol(X),
X = X,
Y = data\$RTN,
run_estimation = 0),
iter = 1000,
chains = 1)
``````

This gives me the following error:

``````SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Dimension mismatch in assignment; variable name = Y_sim, type = real; right-hand side type = real[ ].
Illegal statement beginning with non-void expression parsed as
Y_sim[i]
Not a legal assignment, sampling, or function statement.  Note that
* Assignment statements only allow variables (with optional indexes) on the left;
* Sampling statements allow arbitrary value-denoting expressions on the left.
* Functions used as statements must be declared to have void returns

error in 'modelb69803c68a26d_6e8a02c9f71e8fbd8d0f5be02af0006c' at line 28, column 4
-------------------------------------------------
26:  vector[n] Y_sim;
27: for(i in 1:n) {
28:     Y_sim[i] = normal_rng(alpha + X * beta, sigma);
^
29:   }
-------------------------------------------------

PARSER EXPECTED: "}"
``````

If I comment out the `generated quantities` block I can sample from the fake data.

What am I doing incorrectly in that `generated quantities` block?

The error here is:

``````variable name = Y_sim, type = real; right-hand side type = real[ ]
``````

Because the result of `alpha + X * beta` is a vector of length `n`, the vectorised version of `normal_rng` is being called, and attempting to return an array of reals (i.e., `real[]`), except you’re trying to assign the result to a single value (`Y_sim[i]`).

You can instead just allocate the vectorised return directly:

``````generated quantities {
real Y_sim[n] = normal_rng(alpha + X * beta, sigma);
}
``````

Also, stan has a built-in distribution for linear regression which will sample more efficiently (and is GPU accelerated if you’re using `cmdstanr`).

You would specify it like so:

``````  if(run_estimation==1){
Y ~ normal_id_glm(X, alpha, beta, sigma);
}
``````

Thank you once again, @andrjohns !!