Chains getting stuck - Please forgive me, I am very new to STAN

Hello, All,
I am very new to STAN, just started to use to fit stochastic frontier model.
The model itself is quite complex, as I am using truncated normal distribution to capture inefficiency, with its mean and variance both depends upon same exogenous variables. At the same time, I am also controlling for random effects. All the independent variables are normalised by their respective geometric means.
Here is my model:

data {
int<lower=0> N;
int<lower=0> p;
int<lower=0> j;
int<lower=0> v;
real LN_NEGATIVE_MED_FTE_IDF[N];
real LN_INPATIENTS[N];
real LN_OUTPATIENTS[N];
real LN_OTHER_STAFF_FTE_IDF[N];
real LN_CLINICAL_EXP_IDF[N];
real LN_OUT_EXP_IDF[N];
real LN_CAP_ASSETS_IDF[N];
real LN_OTHER_OUT_IDF[N];
real LN_OTHER_CAP_IDF[N];
real LN_OTHER_INPATIENTS_IDF[N];
real LN_OTHER_OUTPATIENTS_IDF[N];
real LN_OTHER_STAFF_FTE_IDF_SQ[N];
real LN_CLIN_OUT_IDF[N];
real LN_CLIN_CAP_IDF[N]; 
real LN_CLIN_INPATIENTS_IDF[N];
real LN_CLIN_OUTPATIENTS_IDF[N]; 
real LN_CLINICAL_EXP_IDF_SQ[N];
real LN_OUT_CAP_IDF[N];
real  LN_OUT_INPATIENTS_IDF[N];
real  LN_OUT_OUTPATIENTS_IDF[N];
real  LN_OUT_EXP_IDF_SQ[N]; 
real  LN_CAP_INPATIENTS_IDF[N];
real LN_CAP_OUTPATIENTS_IDF[N];
real  LN_CAP_ASSETS_IDF_SQ[N];
real  LN_INPATIENTS_OUTPATIENTS[N];
real  LN_INPATIENTS_SQ[N];
real  LN_OUTPATIENTS_SQ[N];
real  TREND_STANDARDISED[N];
real  TREND_SQUARE[N]; 
int<lower=0,upper=1> q1[N]; 
int<lower=0,upper=1>  q2[N];
int<lower=0,upper=1>  q3[N];
real ln_ALOS[N];
real ln_proportion_otsource_labour[N];
real ln_capital_labour_ratio[N];
int<lower=0,upper=1> rural[N];
int<lower=0,upper=1> tertiary[N];
int<lower=0> dhb_id[N];



}

parameters {
 real beta[p];
 real ceta[j];
 real teta[v];
   real<lower=0> sigma;
   real<lower=0> taur;// Error standard deviation
    
   vector<lower=0>[N] u;  // Inefficiency term
    vector [N] r;
}


// The model to be estimated. We model the output
// 'y' to be normally distributed with mean 'mu'
// and standard deviation 'sigma'.
   model {
   beta ~ normal(0, 100);
   ceta ~ normal(0, 100);
   teta ~ normal(0, 100);
   sigma ~ gamma(1, 0.001);
   taur ~ gamma(1, 0.001);


  
   for (i in 1:N) {
     
      r[dhb_id[i]] ~ normal(0,taur);
   }
 
   for (i in 1:N) {
 LN_NEGATIVE_MED_FTE_IDF[i] ~ normal(beta[1] + beta[2]*LN_INPATIENTS[i] + beta[3]*LN_OUTPATIENTS[i] + beta[4]*LN_OTHER_STAFF_FTE_IDF[i] + beta[5]*LN_CLINICAL_EXP_IDF[i] 
 + beta[6]*LN_OUT_EXP_IDF[i] + beta[7]*LN_CAP_ASSETS_IDF[i] + beta[8]*LN_OTHER_OUT_IDF[i] + beta[9]*LN_OTHER_CAP_IDF[i] + beta[10]*LN_OTHER_INPATIENTS_IDF[i] + 
beta[11]*LN_OTHER_OUTPATIENTS_IDF[i]+
beta[12]*LN_OTHER_STAFF_FTE_IDF_SQ[i]+
beta[13]*LN_CLIN_OUT_IDF[i]+
beta[14]*LN_CLIN_CAP_IDF[i]+ 
beta[15]*LN_CLIN_INPATIENTS_IDF[i]+
beta[16]*LN_CLIN_OUTPATIENTS_IDF[i]+
beta[17]*LN_CLINICAL_EXP_IDF_SQ[i]+
beta[18]*LN_OUT_CAP_IDF[i]+
beta[19]*LN_OUT_INPATIENTS_IDF[i]+
beta[20]*LN_OUT_OUTPATIENTS_IDF[i]+
beta[21]*LN_OUT_EXP_IDF_SQ[i]+
beta[22]*LN_CAP_INPATIENTS_IDF[i]+
beta[23]*LN_CAP_OUTPATIENTS_IDF[i]+
beta[24]*LN_CAP_ASSETS_IDF_SQ[i]+
beta[25]*LN_INPATIENTS_OUTPATIENTS[i]+
beta[26]*LN_INPATIENTS_SQ[i]+
beta[27]*LN_OUTPATIENTS_SQ[i]+
beta[28]*TREND_STANDARDISED[i]+
beta[29]*TREND_SQUARE[i]+
 beta[30]*q1[i]+
beta[31]*q2[i]+
beta[32]*q3[i]
 -u[i] + r[dhb_id[i]], sigma); 
   }
   
for (i in 1:N) {
u[i] ~ normal(ceta[1]+ceta[2]*ln_capital_labour_ratio[i]+  ceta[3]*ln_ALOS[i]+
                             ceta[4]*ln_proportion_otsource_labour[i]+
                             ceta[5]*rural[i] + ceta[6]*tertiary[i], exp(teta[1]+teta[2]*ln_capital_labour_ratio[i]+  teta[3]*ln_ALOS[i]+
                             ceta[4]*ln_proportion_otsource_labour[i]+
                             teta[5]*rural[i] + teta[6]*tertiary[i]))T[0,];
}
}
generated quantities {
vector[N] eff;
vector[N] re;
eff = exp(-u);
re = r;
}

Any guadiance for this new convert will be appreciated.

Antony

Edit: Stan code formatted (@Max_Mantei)

Hi Antony!

You can vectorize this

for (i in 1:N) {
  LN_NEGATIVE_MED_FTE_IDF[i] ~ normal(beta[1] + beta[2]*LN_INPATIENTS[i] + .... + r[dhb_id[i]], sigma);
} 

loop, by doing:

LN_NEGATIVE_MED_FTE_IDF ~ normal(beta[1] + beta[2]*LN_INPATIENTS + .... + r[dhb_id], sigma);

I think the first thing you can do is to put all explanatory variables into one big matrix (let say, X) then you can use matrix multiplication, i.e. X*beta. For example:

LN_NEGATIVE_MED_FTE_IDF ~ normal(beta0 + X*beta + r[dhb_id], sigma);

where beta0 is the new intercept (this was beta[1] before).

Also,

for (i in 1:N) {
      r[dhb_id[i]] ~ normal(0,taur);
   }

this is not the best way to include a random effect. And r should not have length N, but length of the group size (that is the number of unique values in dhb_id).

Also, the priors are likely too wide. beta ~ normal(0, 5);, ceta ~ normal(0, 5); and teta ~ normal(0, 1); seems more appropriate. Also, we often recommend Half-Normal priors for things like your sigma and taur. You can still go for Gamma, but I’d go for something close to gamma(1,1) (which is the same as exponential(1)).

There might be more things to sort out, but let’s start with these. :)

Cheers,
Max

2 Likes

Hello Max,

Thank you very much for your kind response. I have taken in your suggestion on matrix, random effects and prior of variance. This time I have just used simple model, so that I get the idea of creating matrix.
So this is just a simple regression, with inefficiency term u, which is half-normal distributed, which is the parameter of interest.

I am getting a small error , in the line where I declare ā€˜u’ to be half normal, the warning says ā€œoutcome in truncated distribution must be univariateā€. I am not sure what that means.
Here is the code. One again, thank you.

data {
int<lower=0> N; // Number of data items
int<lower=0> P; // number of predictors in regression
matrix[N, P] X; // matrix for regression
vector[N] Y; // dependent variable in main regression
int<lower=0> dhb_id[N];// Random effects
}

parameters {
real alpha; // Intercept of the main equation
vector[P] beta;
real<lower=0> sigma; // main regression variance
real<lower=0> taur;// random effect variance
real<lower=0> lambda; // variance of inefficiency term
vector[N] u; // to extract efficiency
vector [N] r; // to extract random effects
}

// The model to be estimated.
model {
alpha ~ normal(0, 5);
beta ~ normal(0, 5);
sigma ~ normal(0, 0.001)T[0,];
taur ~ normal(0, 0.001) T[0,];
lambda ~ normal(0, 0.001)T[0,];

Y ~ normal(alpha + X*beta -u + r[dhb_id], sigma);
r ~ normal(0, taur);
u ~ normal(0,lambda)T[0,]; / IAM GETTING A WARNING HERE SAYING THAT ā€œoutcome in truncated distribution must be univariateā€ DO YOU KNOW WHAT THAT IS?

}

generated quantities {
vector[N] eff;
eff = exp(-u);
}

Hello @Max_Mantei,

I think , I figured why STAN was giving me that error. Here is the correct code. Let me know, what you think.
cheers

data {
int<lower=0> N; // Number of data items
int<lower=0> P; // number of predictors in regression
matrix[N, P] X; // matrix for regression
vector[N] Y;
int<lower=0> dhb_id[N];// dependendent variable in main regression
}

parameters {
real alpha; // Intercept of the main equation
vector[P] beta;
real<lower=0> sigma;
real<lower=0> taur;// Error standard deviation
real<lower=0> lambda;
vector<lower=0>[N] u;
vector [N] r; // to extract random effects
}

// The model to be estimated.
model {
alpha ~ normal(0, 5);
beta ~ normal(0, 5);
sigma ~ normal(0, 0.001)T[0,];
taur ~ normal(0, 0.001) T[0,];
lambda ~ normal(0, 0.001)T[0,];

Y ~ normal(alpha + X*beta -u + r[dhb_id], sigma);
for (i in 1:N) {
u[i] ~ normal(0, lambda)T[0,];
}
r ~ normal(0, taur);

}

generated quantities {
vector[N] eff;
eff = exp(-u);
}