Stan struggles with probabilities that evaluate to exactly 0 or 1, and with uniform priors over very narrow ranges (e.g. your beta and gamma). The reason is that flat priors are harder to regularise and therefore induce posterior geometries that are more difficult to explore. The issue in your model is the choice of priors; the Stan wiki's Prior Choice Recommendations page (https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) is a good general reference on this.
You can get very similar constraints to yours using the normal distribution. Here is a rewritten version, following the advice above, that should give you what you want:
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=0,upper=1> y[N];
  int<lower=1,upper=J> ll[N];
  vector<lower=1>[N] t;
  vector<lower=0>[N] pa;
}
parameters {
  real alpha;
  vector[J] beta;
  vector[J] gamma;
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 0.5);
  gamma ~ normal(0, 0.5);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(alpha + beta[ll[n]] * t[n] + gamma[ll[n]] * pa[n]);
}
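(Note the fix from y to y[n] inside the loop; the original statement sampled the whole vector on every iteration.) As an aside, you can drop the loop entirely: indexing a vector with the integer array ll yields a vector, so the likelihood can be written as one vectorized statement, which is usually noticeably faster. A minimal sketch of the model block under that change:

model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 0.5);
  gamma ~ normal(0, 0.5);
  // .* is elementwise multiplication of vectors
  y ~ bernoulli_logit(alpha + beta[ll] .* t + gamma[ll] .* pa);
}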
The other thing that MCMC with limited iterations is generally sensitive to is the starting point. Stan also has the optimizing and vb (variational Bayes) functions. You can use these to get decent starting values for your model, which can greatly reduce time to convergence. If you're using optimizing (L-BFGS under the hood), make sure to run it at least a few dozen times on a dataset your size and start from the MAP estimate with the best log posterior, as sketched below.
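A minimal sketch of that multi-start approach, assuming the model object M and data list d defined in the end-to-end code at the bottom; rstan's optimizing() returns the mode in $par and its log posterior in $value:

library(rstan)
# Run L-BFGS from several random starting points and keep the best mode.
fits <- lapply(1:25, function(i) optimizing(M, data = d, seed = i))
best <- fits[[which.max(sapply(fits, function(f) f$value))]]
o <- best$par  # use as starting values for sampling(), as in the code below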
Here are the results with 4 chains and just 500 iterations (it looks like you need more).
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha 0.79 0.00 0.01 0.77 0.78 0.79 0.80 0.81 266 1.01
beta[1] 0.00 0.00 0.01 -0.01 0.00 0.00 0.01 0.02 808 1.00
beta[2] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 839 1.01
beta[3] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 842 1.00
beta[4] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 828 1.00
beta[5] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 805 1.01
beta[6] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1110 1.00
beta[7] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1008 1.00
beta[8] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1104 1.00
beta[9] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 871 1.00
beta[10] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 908 1.00
beta[11] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1532 1.00
beta[12] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1125 1.00
beta[13] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 839 1.01
beta[14] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 892 1.00
beta[15] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 960 1.00
beta[16] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 841 1.00
beta[17] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1075 1.00
beta[18] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 613 1.01
beta[19] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 957 1.00
beta[20] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1102 1.00
beta[21] 0.00 0.00 0.01 -0.01 0.00 0.00 0.01 0.02 755 1.01
beta[22] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 803 1.00
beta[23] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1045 1.00
beta[24] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1013 1.01
beta[25] 0.00 0.00 0.00 -0.01 0.00 0.00 0.00 0.01 1043 1.00
beta[26] 0.00 0.00 0.01 -0.01 0.00 0.00 0.01 0.02 930 1.00
beta[27] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 829 1.00
beta[28] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 917 1.00
beta[29] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 678 1.00
beta[30] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 866 1.00
gamma[1] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 967 1.00
gamma[2] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1489 1.00
gamma[3] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1010 1.00
gamma[4] 0.00 0.00 0.02 -0.03 -0.01 0.00 0.01 0.03 927 1.00
gamma[5] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1456 1.00
gamma[6] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1379 1.00
gamma[7] 0.00 0.00 0.01 -0.01 -0.01 0.00 0.00 0.01 1796 1.00
gamma[8] 0.00 0.00 0.01 -0.01 0.00 0.00 0.01 0.01 1539 1.00
gamma[9] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1066 1.00
gamma[10] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1167 1.00
gamma[11] 0.00 0.00 0.01 -0.01 0.00 0.00 0.01 0.01 2377 1.00
gamma[12] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.00 0.02 1240 1.00
gamma[13] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1032 1.00
gamma[14] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.01 981 1.00
gamma[15] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1118 1.00
gamma[16] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1062 1.00
gamma[17] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1218 1.00
gamma[18] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 872 1.00
gamma[19] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1543 1.00
gamma[20] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1390 1.00
gamma[21] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 892 1.00
gamma[22] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 944 1.00
gamma[23] 0.00 0.00 0.01 -0.01 0.00 0.00 0.00 0.01 1417 1.00
gamma[24] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1208 1.00
gamma[25] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1516 1.00
gamma[26] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 981 1.00
gamma[27] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 974 1.00
gamma[28] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 1055 1.00
gamma[29] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 823 1.00
gamma[30] 0.00 0.00 0.01 -0.02 -0.01 0.00 0.01 0.02 980 1.00
lp__ -349169.63 0.35 5.72 -349181.79 -349173.22 -349169.50 -349165.69 -349158.96 265 1.02
Samples were drawn using NUTS(diag_e) at Mon Sep 30 11:55:12 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Warning messages:
1: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
2: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
End-to-end code in R if you want to replicate the results:
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

x = read.csv("data_stan.csv", header = TRUE, stringsAsFactors = FALSE)

model_str = "
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=0,upper=1> y[N];
  int<lower=1,upper=J> ll[N];
  vector<lower=1>[N] t;
  vector<lower=0>[N] pa;
}
parameters {
  real alpha;
  vector[J] beta;
  vector[J] gamma;
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 0.5);
  gamma ~ normal(0, 0.5);
  for (n in 1:N)
    y[n] ~ bernoulli_logit(alpha + beta[ll[n]] * t[n] + gamma[ll[n]] * pa[n]);
}
"

M = stan_model(model_code = model_str)
d = list(N = NROW(x), J = 30, y = x$y, ll = x$ll, t = x$t, pa = x$pa)

# MAP estimate used as the starting point (a single run here for brevity)
o = optimizing(M, data = d, verbose = TRUE)$par
inits = function()
  list(alpha = o[1], beta = o[2:31], gamma = o[32:length(o)])

fit = sampling(M, data = d, iter = 500, init = inits)
print(fit)
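This assumes data_stan.csv has columns named y, ll, t and pa, matching the names used to build the data list d.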