I'm developing a Heckman selection model using RStan. The data for a Heckman selection model include missing values, which are NAs in R, and we cannot simply delete them because they play an important role in the model. To handle the missing data in RStan, I tried to follow Chapter 7 of the Stan User Manual (Missing Data & Partially Known Parameters). However, when I did so, I got an error. Below is the code I wrote (practice code using a t-distribution to check that the missing-data handling works):
data {
  vector[1000] x;
  vector[800] y_obs;
}
parameters {
  real beta;
  real mu;
  real sig;
  real v;
  vector[200] y_mis;
}
model {
  (y_obs - (beta * x)) ~ student_t(v, mu, sig);
  (y_mis - (beta * x)) ~ student_t(v, mu, sig);
}
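For comparison, here is a minimal sketch of how the same practice model could be written so that the vector sizes agree and the scale and degrees-of-freedom parameters stay positive. In the model above, `y_obs` has 800 elements while `x` has 1000, so `y_obs - beta * x` cannot be evaluated, and `student_t` requires `v > 0` and `sig > 0`, which unconstrained parameters initialized in (-2, 2) will often violate. Moving the regression term into the location argument also avoids the Jacobian warning. The names `N_obs`, `N_mis`, `x_obs`, `x_mis`, and `stan_code` are assumptions introduced for illustration, and no priors are added, as in the original. The Stan program is held in an R string so it can be inspected or written to a file.

```r
# Sketch only: a possible rewrite of the practice model, not a verified fix.
# N_obs, N_mis, x_obs, x_mis are hypothetical names, not from the original post.
stan_code <- "
data {
  int<lower=0> N_obs;          // number of observed responses
  int<lower=0> N_mis;          // number of missing responses
  vector[N_obs] x_obs;         // predictor values where y is observed
  vector[N_mis] x_mis;         // predictor values where y is missing
  vector[N_obs] y_obs;         // observed responses
}
parameters {
  real beta;
  real mu;
  real<lower=0> sig;           // student_t scale must be positive
  real<lower=0> v;             // degrees of freedom must be positive
  vector[N_mis] y_mis;         // missing responses treated as parameters
}
model {
  // Regression term goes in the location, so the left-hand side is a plain variable.
  y_obs ~ student_t(v, mu + beta * x_obs, sig);
  y_mis ~ student_t(v, mu + beta * x_mis, sig);
}
"

# Print the program for inspection.
cat(stan_code)

# Optionally write it to disk for use with stan(file = ...):
# writeLines(stan_code, "test.stan")
```

Splitting `x` into observed and missing parts keeps each sampling statement's two sides the same length, which is what the original `vector[1000] x` declaration prevented.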
Below is the code for the R file.
x <- rnorm(1000,0,2)
eps <- rt(1000,3)
y=1.5*x+eps
mis <- sample(1:1000, 200, replace = F)
mis
y[mis] = NA
y_mis = y[is.na(y)]
y_obs = y[!is.na(y)]
y_obs[800]
length(y_mis)
#x = x[!is.na(y)]
x
length(x)
schools_dat <- list(x,y_obs)
fit <- stan(file = 'test.stan', data = schools_dat,
            iter = 1000, chains = 4)
print(fit)
plot(fit)
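One thing worth checking in the R script above is the data list: rstan matches data to the Stan `data` block by name, so `list(x, y_obs)` without names gives it nothing to match against. The following is a minimal sketch of preparing a named list in which `x` is split to line up with the observed and missing parts of `y`. It assumes a Stan program whose `data` block declares `N_obs`, `N_mis`, `x_obs`, `x_mis`, and `y_obs`. Those names, `stan_data`, and the seed are hypothetical choices for illustration, not part of the original code.

```r
# Sketch only: data preparation with a named list (names are assumptions).
set.seed(1)                       # assumed seed, only for reproducibility
N <- 1000
x <- rnorm(N, 0, 2)
y <- 1.5 * x + rt(N, 3)           # t-distributed noise with 3 degrees of freedom

mis <- sample(seq_len(N), 200)    # 200 distinct indices treated as missing
y[mis] <- NA

obs <- !is.na(y)                  # TRUE where the response is observed

stan_data <- list(
  N_obs = sum(obs),
  N_mis = sum(!obs),
  x_obs = x[obs],                 # predictors aligned with observed y
  x_mis = x[!obs],                # predictors aligned with missing y
  y_obs = y[obs]                  # no NAs are passed to Stan
)
str(stan_data)

# With rstan installed and a matching 'test.stan', the call would look like:
# fit <- rstan::stan(file = "test.stan", data = stan_data,
#                    iter = 1000, chains = 4)
```

Keeping `x_obs` and `x_mis` separate means no `NA` values reach Stan and every vector has the length its declaration expects.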
Below is the error message that I got.
DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
subtract(y_obs,multiply(beta,x)) ~ student_t(...)
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
subtract(y_mis,multiply(beta,x)) ~ student_t(...)
hash mismatch so recompiling; make sure Stan code ends with a blank line
In file included from C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/config.hpp:39:0,
from C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/math/tools/config.hpp:13,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core/var.hpp:7,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core.hpp:12,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/mat.hpp:4,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math.hpp:4,
from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/src/stan/model/model_header.hpp:4,
from file1fe47b81e23.cpp:8:
C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
# define BOOST_NO_CXX11_RVALUE_REFERENCES
^
<command-line>:0:0: note: this is the location of the previous definition
cc1plus.exe: warning: unrecognized command line option "-Wno-ignored-attributes"
SAMPLING FOR MODEL 'test' NOW (CHAIN 1).
Rejecting initial value:
Error evaluating the log probability at the initial value.
…
Rejecting initial value:
Error evaluating the log probability at the initial value.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
Is there something wrong with my code?