Optimize Stan code - hierarchical model


Is there any better way to parametrize this hierarchical model?

Thank you,

data {
 int<lower=0> N;  //number of observations
 real<lower=0> CPS[N];  //Cost per service (response variable) 
 int<lower=0> Njkm;  //total number of state-service-market (168 total, 28 do not have observed CPS)
 int<lower=0> Nkm;  //number of market services (8)
 //int<lower=1> jkm_identifier[N]; //state-service-market identifier level 1
 //int<lower=1> km_identifier[N]; //service-market identifier level 1
 //int<lower=0> gender_identifier[N]; //population gender effect
 matrix[N,Njkm] design_matrix_1;
 matrix[N,Nkm] design_matrix_2;
 matrix[Njkm,Nkm] design_matrix_3;
 matrix[Njkm,24] design_matrix_4;
transformed data {
parameters {
// real beta_0;
 vector[Njkm] mu_jkm;
 vector[Nkm] gamma_km;
 vector[24] beta_km;
 vector<lower=0>[Nkm] sigma_cps_km;
 vector<lower=0>[Nkm] sigma_mu_km;
 //real theta_g; //Population gender effect
transformed parameters {
     vector[Njkm] gamma_km_index = design_matrix_3 * gamma_km + design_matrix_4 * beta_km ; 
     vector<lower=0>[Njkm] sigma_mu_index = design_matrix_3 * sigma_mu_km;
     vector[N] mu_index =design_matrix_1 * mu_jkm;
     vector<lower=0>[N] sigma_cps_index = design_matrix_2 * sigma_cps_km;

    for (n in 1:N){
model { 
//target += normal_lpdf(beta_0jkm | beta_km, sigma_u0jkm);

//for (i in 1:N){
//CPS[i] ~ normal (mu[i], sigma_cps[i]);
target += normal_lpdf(CPS |mu_index, sigma_cps_index);
target += normal_lpdf(mu_jkm |gamma_km_index, sigma_mu_index);
generated quantities{
   real yrep[N];
   for (i in 1:N)
   yrep[i]= normal_rng(mu_index[i], sigma_cps_index[i]);

So in this thread @avehtari gives OP the following advise:

if you’re looking to reparameterise because the model runs slow or doesn’t converge then @avehtari advise RE priors might apply here too since none of the parameters seem to have priors in your model unless I’ve misunderstood something.


Yes, thank you.
Another point, is this matrix notation the most efficient way to write the problem?

Just IMHO, and assuming your objective is performance, I would tend to think about it in terms of (1) Am I performing any operations that I don’t need to? (2) Have I vectorised as much as possible?

According to this, the exp function is vectorised. so you can write :


instead of:

for (n in 1:N){

Similarly, this says that normal_rng is vectorised so you can write this:

yrep=normal_rng(mu_index, sigma_cps_index);

instead of this:

 for (i in 1:N)
   yrep[i]= normal_rng(mu_index[i], sigma_cps_index[i]);

Except for that, I don’t see anything else personally, but others might.

I can’t vectorize yrep:

Variable definition base type mismatch, variable declared as base type vector variable definition has base type real[ ] error in ‘model49c4729690b_stan_CPS_11’ at line 50, column 54

48: }
49: generated quantities{
50: vector[N] yrep = normal_rng(mu_index, sigma_cps_index);
51: }

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘stan_CPS_11’ due to the above error.

Just make it an array instead ?

1 Like