Hi, I am currently trying to build Heckman selection model using STAN.
This is my stan code for the model.
data {
int<lower=0> N; //Number of total data
int<lower=0,upper=1> R[N]; //Indicator for the observation
int<lower=0> nx; //Number of coefficients for the outcome model
int<lower=0> nz; //Nmber of coefficients for the selection model
vector[N] y;
matrix[N, nx] x; //Covariates for the outcome model
matrix[N, nz] z; //Covariates for the selection model
int<lower=0> n; //Number of total coefficients
vector[n] pri_mean;
matrix[n,n] pri_var;
}
parameters {
vector[nx] beta;
vector[nz] gamma;
real<lower=0> sigma;
real rho;
}
model {
append_row(beta,gamma) ~ multi_normal(pri_mean,pri_var);
for(j in 1:N){
if (R[j] == 1) {
real err = (y[j] - x[j,] * beta) / sigma;
target += normal_lcdf((z[j,] * gamma + err * rho) / sqrt(1 - square(rho)) | 0, 1) - 0.5 * square(err) - log(sqrt(2 * pi()) * sigma);
}
else target += normal_lcdf(-z[j,] * gamma | 0, 1);
}
}
I don’t see anything wrong in this model but the result is pretty much off for some reason.
Should I tune the prior distribution or something?
Can you please help me building this model?
Thank you.
[edit: added code blocks]