I’m developing a Heckman selection model using RStan. The data for a Heckman selection model include missing values (NA in R), and we cannot simply delete them because they play an important role in the model. To handle these missing data in RStan, I tried to follow Ch. 7 of the Stan User Manual (Missing Data & Partially Known Parameters), but I got an error when I ran it. Below is the Stan code I wrote (practice code that uses a t-distribution to check whether the missing-data handling works):

data {
  vector[1000] x;
  vector[800] y_obs;
}

parameters {
  real beta;
  real mu;
  real sig;
  real v;
  vector[200] y_mis;
}

model {
  (y_obs-(beta*x))~ student_t(v, mu, sig);
  (y_mis-(beta*x))~ student_t(v, mu, sig);
}

Below is the code for the R file.

x <- rnorm(1000,0,2)
eps <- rt(1000,3)
y=1.5*x+eps

mis <- sample(1:1000, 200, replace = F)
mis
y[mis] = NA

y_mis = y[is.na(y)]
y_obs = y[!is.na(y)]
y_obs[800]
length(y_mis)

#x = x[!is.na(y)]
x
length(x)

schools_dat <- list(x,y_obs)

fit <- stan(file = 'test.stan', data = schools_dat,
            iter = 1000, chains = 4)

print(fit)
plot(fit)

Below is the error message that I got.

DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
subtract(y_obs,multiply(beta,x)) ~ student_t(…)
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
subtract(y_mis,multiply(beta,x)) ~ student_t(…)

hash mismatch so recompiling; make sure Stan code ends with a blank line
In file included from C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/config.hpp:39:0,
                 from C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/math/tools/config.hpp:13,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core/var.hpp:7,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/core.hpp:12,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/stan/math.hpp:4,
                 from C:/Users/g1310/Documents/R/win-library/3.3/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file1fe47b81e23.cpp:8:
C:/Users/g1310/Documents/R/win-library/3.3/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
 # define BOOST_NO_CXX11_RVALUE_REFERENCES
 ^
:0:0: note: this is the location of the previous definition
cc1plus.exe: warning: unrecognized command line option "-Wno-ignored-attributes"

SAMPLING FOR MODEL 'test' NOW (CHAIN 1).
Rejecting initial value:
Error evaluating the log probability at the initial value.
…
Rejecting initial value:
Error evaluating the log probability at the initial value.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

Is there something wrong with my code?
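
One possible culprit, as far as I can tell: x has 1000 elements, while y_obs has 800 and y_mis has 200, so y_obs - beta*x and y_mis - beta*x subtract vectors of different lengths. In addition, sig and v are declared without lower bounds even though student_t needs a positive scale and positive degrees of freedom, and the data list is unnamed. Below is a minimal sketch of how the R side might be prepared so that the lengths line up. The names x_obs, x_mis, N_obs, N_mis and stan_data are hypothetical (they are not in my current test.stan), and the seed is only there for reproducibility.

```r
# Sketch only: split x to match the observed / missing parts of y
# and pass everything to Stan as a *named* list.
set.seed(1)                          # assumed seed, for reproducibility only

N     <- 1000
N_mis <- 200

x   <- rnorm(N, 0, 2)
eps <- rt(N, 3)
y   <- 1.5 * x + eps

# Knock out 200 responses at random
mis    <- sample(seq_len(N), N_mis, replace = FALSE)
y[mis] <- NA

is_mis <- is.na(y)

# Hypothetical names; the Stan data block would have to declare the same ones
stan_data <- list(
  N_obs = sum(!is_mis),              # 800
  N_mis = sum(is_mis),               # 200
  x_obs = x[!is_mis],                # covariate rows with observed y
  x_mis = x[is_mis],                 # covariate rows with missing y
  y_obs = y[!is_mis]                 # observed responses only (no NA)
)

str(stan_data)
```

With something like this, the Stan program would presumably declare vector[N_obs] x_obs, vector[N_mis] x_mis and vector[N_obs] y_obs in the data block, keep y_mis as a vector[N_mis] parameter, use x_obs and x_mis in the two sampling statements, and give sig and v a lower bound of 0. I have not verified that this is the whole fix.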