Simulating fake data for regression in Stan

ants007 · October 7, 2021, 3:01pm

Hello Stan users,
I am trying to simulate some fake data for a regression, and I have come up with two programs:

The first code is:

data {
  int<lower=0> N; // number of observations
  int<lower=0> P; // number of explanatory variables (the code below assumes P = 11)
  matrix[N, P] X; // matrix of fake explanatory variables
}

generated quantities {
  vector[N] yhat;
  vector[P] beta;
  real alpha;
  real<lower=0> sigma;

  // draw the parameters directly from their priors
  alpha = normal_rng(0.05, 0.02);

  beta[1] = normal_rng(-0.40, 0.15);
  beta[2] = normal_rng(-0.60, 0.15);
  beta[3] = normal_rng(0.15, 0.05);
  beta[4] = normal_rng(0.10, 0.03);
  beta[5] = normal_rng(0.05, 0.02);
  beta[6] = normal_rng(0.06, 0.02);
  beta[7] = normal_rng(-0.01, 0.01);
  beta[8] = normal_rng(-0.001, 0.02);
  beta[9] = normal_rng(0.02, 0.01);
  beta[10] = normal_rng(0.01, 0.02);
  beta[11] = normal_rng(-0.02, 0.001);

  sigma = gamma_rng(1, 1);

  // draw one fake outcome per observation
  for (i in 1:N) {
    yhat[i] = normal_rng(alpha + X[i] * beta, sigma);
  }
}

The second code is:


data {
  int<lower=0> N; // number of observations
  int<lower=0> P; // number of explanatory variables (the code below assumes P = 11)
  matrix[N, P] X; // matrix of fake explanatory variables
}

parameters {
  vector[P] beta;
  real alpha;
  real<lower=0> sigma;
}

model {
  // priors only; with no likelihood, sampling draws from the prior
  alpha ~ normal(0.05, 0.02); // same prior as in the first code; without it alpha has an improper flat prior

  beta[1] ~ normal(-0.40, 0.15);
  beta[2] ~ normal(-0.60, 0.15);
  beta[3] ~ normal(0.15, 0.05);
  beta[4] ~ normal(0.10, 0.03);
  beta[5] ~ normal(0.05, 0.02);
  beta[6] ~ normal(0.06, 0.02);
  beta[7] ~ normal(-0.01, 0.01);
  beta[8] ~ normal(-0.001, 0.02);
  beta[9] ~ normal(0.02, 0.01);
  beta[10] ~ normal(0.01, 0.02);
  beta[11] ~ normal(-0.02, 0.001);

  sigma ~ gamma(1, 1);
}

generated quantities {
  vector[N] yhat;
  for (i in 1:N) {
    // sigma is the scale (second) argument of normal_rng
    yhat[i] = normal_rng(alpha + X[i] * beta, sigma);
  }
}
I am wondering which of these two simulation programs I should use if I intend to recover the model parameters in the next step.

Any suggestion will be greatly appreciated.

Cheers
AA

mike-lawrence · October 9, 2021, 1:36pm

The first, which uses the _rng functions in the generated quantities (GQ) block, is the better choice for simulating data.

The _rng functions draw proper samples directly from their respective distributions, just like rnorm(), rweibull(), rexp(), etc. in R.
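As a rough sketch of the same idea in a more compact form, the first program could take the prior means and standard deviations as data and use the vectorized `normal_rng` signatures. The names `beta_mean` and `beta_sd` are hypothetical inputs, not part of the original post; they would hold the eleven means and standard deviations listed above.

```stan
data {
  int<lower=0> N;               // number of observations
  int<lower=0> P;               // number of explanatory variables
  matrix[N, P] X;               // fake explanatory variables
  vector[P] beta_mean;          // hypothetical input: prior means for beta
  vector<lower=0>[P] beta_sd;   // hypothetical input: prior sds for beta
}

generated quantities {
  // each _rng call returns an independent draw from the named distribution
  real alpha = normal_rng(0.05, 0.02);
  vector[P] beta = to_vector(normal_rng(beta_mean, beta_sd));
  real<lower=0> sigma = gamma_rng(1, 1);

  // one fake outcome per row of X, drawn in a single vectorized call
  vector[N] yhat = to_vector(normal_rng(alpha + X * beta, sigma));
}
```

Because a program like this has no parameters block, it is run with Stan's fixed-parameter sampler (in RStan, `algorithm = "Fixed_param"`), and every iteration is then one complete simulated data set.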

Your second approach, using the model block and ~ statements, may yield proper samples from the desired distributions, but it employs the full HMC sampling machinery to do so. That takes much more compute time, and it requires checks afterwards to verify that no pathologies cropped up in the Monte Carlo sampling.
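For the recovery step mentioned in the question, one possible sketch is a separate fitting program that takes a single simulated draw of `yhat` as data. Here `y`, `beta_mean`, and `beta_sd` are hypothetical names, and the priors are assumed to be the same ones used to simulate; this is an illustration under those assumptions rather than the only way to set it up.

```stan
data {
  int<lower=0> N;               // number of observations
  int<lower=0> P;               // number of explanatory variables
  matrix[N, P] X;               // same design matrix used for simulation
  vector[N] y;                  // hypothetical input: one simulated draw of yhat
  vector[P] beta_mean;          // hypothetical input: prior means for beta
  vector<lower=0>[P] beta_sd;   // hypothetical input: prior sds for beta
}

parameters {
  real alpha;
  vector[P] beta;
  real<lower=0> sigma;
}

model {
  // priors, assumed to match the simulation
  alpha ~ normal(0.05, 0.02);
  beta ~ normal(beta_mean, beta_sd);
  sigma ~ gamma(1, 1);

  // likelihood: this is where HMC is actually needed
  y ~ normal(alpha + X * beta, sigma);
}
```

Comparing the posterior for `alpha`, `beta`, and `sigma` against the values that were drawn in the same simulation iteration as `y` then shows whether the parameters are recovered.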

ants007 · October 9, 2021, 1:40pm

Thanks, Mike. You are truly devoted to helping others.
Would you also kindly comment on my other post, "Sampling from truncated normal distribution"?

Cheers.
AA
