Simulating fake data for regression in Stan

Hello Stan users,
I am trying to simulate some fake data for regression and I have come up with two codes:

The first code is:

data {
int<lower=0> N; // Number of obs

int<lower=0> P; // Number of expla
matrix[N,P] X; // matrix for  fake explanatory variables
}

generated quantities {
vector[N] yhat;
vector[P] beta;
real alpha;
real<lower = 0> sigma;
alpha = normal_rng(0.05,0.02);

beta[1] = normal_rng(-0.40,0.15); 
beta[2] = normal_rng(-0.60,0.15); 
beta[3] = normal_rng(0.15,0.05); 
beta[4] = normal_rng(0.10,0.03); 
beta[5] = normal_rng(0.05,0.02); 
beta[6] = normal_rng(0.06,0.02); 
beta[7] = normal_rng(-0.01,0.01); 
beta[8] = normal_rng(-0.001,0.02); 
beta[9] = normal_rng(0.02,0.01); 
beta[10] = normal_rng(0.01,0.02); 
beta[11] = normal_rng(-0.02,0.001);



sigma = gamma_rng(1,1); 

for (i in 1:N) {
yhat[i] = normal_rng(alpha + X[i]*beta , sigma);
}

}
****

**The second code is** 


data {
int<lower=0> N; // Number of obs

int<lower=0> P; // Number of expla
matrix[N,P] X; // matrix for  fake explanatory variables
}

parameters {
vector[P] beta;
real alpha;
real<lower = 0> sigma;
}

  model {

beta[1] ~ normal(-0.40,0.15); 
beta[2] ~ normal(-0.60,0.15); 
beta[3] ~ normal(0.15,0.05); 
beta[4] ~ normal(0.10,0.03); 
beta[5] ~ normal(0.05,0.02); 
beta[6] ~ normal(0.06,0.02); 
beta[7] ~ normal(-0.01,0.01); 
beta[8] ~ normal(-0.001,0.02); 
beta[9] ~ normal(0.02,0.01); 
beta[10] ~ normal(0.01,0.02); 
beta[11] ~ normal(-0.02,0.001); 


sigma ~ gamma(1, 1); 


}

generated quantities {
vector[N] yhat;
for (i in 1:N) {
yhat[i] = normal_rng(alpha + X[i]*beta - sigma);
}
}
****
I am wondering which of these two simulation code should I use if I intend to recover the model parameters in the next step.

Any suggestion will be greatly appreciated.

Cheers
AA

The first, using rng functions in the GQ block, is the best for simulating data.

The rng functions generate proper samples from their respective distributions directly, just like the rnorm()/rweibull()/rexp()/etc in R.

Your second approach, using the model block and ~ statements, may yield proper samples from the desired distributions, but it employs the full HMC sampling machinery to do so, and this takes both much more compute time and checks afterwards to verify no pathologies cropped up in the Monte Carlo sampling.

2 Likes

Thanks Mike. You are truly devoted helping others.
Would you also kindly comment on my other post Sampling from truncated normal distribution - #2 by ants007

Cheers.
AA

1 Like