Error Chain 1: Initialization between (-2, 2) failed after 100 attempts in beta regression with missing values


#1

I’m learning Stan and I’m having some issues with a beta regression model with missing values in the response variable (y_i). This is a simple example I have adapted from Stan’s user guide.

I’m getting the following error; Chain 1: Initialization between (-2, 2) failed after 100 attempts. I have tried different priors and initial values but I can’t find the issue. I would appreciate any help.

Here the reprex:

 modelBeta <- '
data {
int<lower=1> N; // sample size
int<lower=1> K; // K predictors
matrix[N,K] X;   
int<lower = 0> N_obs; // number observed values
int<lower = 0> N_mis; // number missing values
int<lower = 1, upper = N_obs + N_mis> ii_obs[N_obs]; // ii index of observed
int<lower = 1, upper = N_obs + N_mis> ii_mis[N_mis]; // ii index of missing
real y_obs[N_obs]; //observed y
}

parameters {
vector[K] theta; 
real y_mis[N_mis]; //declaring the missing y
real<lower=0> phi; // precision parameter
}

transformed parameters {
vector[K] beta;
real y[N]; // creating y from the missing and observed values
y[ii_obs] = y_obs;
y[ii_mis] = y_mis;
beta = theta * 10; 
}

model {
vector[N] Xbeta; 
vector[N] mu;    // mean
vector[N] A;     // shape 1
vector[N] B;     // shape 2
Xbeta = X * beta; // 

for (i in 1:N) { 
mu[i] = inv_logit(Xbeta[i]);  
}

// priors
theta ~ normal(0, 5);  
phi ~ normal(50, 6)T[10,]; 

A = mu * phi;
B = (1.0 - mu) * phi;

y ~ beta(A, B); // likelihood

}'


dataBeta = list(K = 2, # reg coeff
            N = nrow(df), # sample size
            N_obs = length(which(!is.na(df$y))), 
            N_mis = length(which(is.na(df$y))),
            ii_obs = which(!is.na(df$y)),
            ii_mis = which(is.na(df$y)),
            y_obs = dffinal[!is.na(df$y),]$y,
            X = cbind(1, dffinal$x)
)

inits <- function(){list(beta = c(1, -2), phi = 50)} 

modelBetaStan <- stan(model_code = modelBeta,
               model_name = "modelBeta",
               pars = c('beta','y'),
               init = inits,
               data = dataBeta,
               iter = 3000, 
               warmup=1000,
               thin=2, 
               chains = 1,
               #init_r = 0.1,
               verbose = F)

The data:

structure(list(x = c(0.315436277608387, 0.280370750720613, 0.486625703331083, 
0.139468205277808, 0.427984498743899, 0.438639514567331, 0.668681832356378, 
0.359224375989288, 0.482591016520746, 0.21918343573343, 0.537497533927672, 
0.71751586268656, 0.296247687982395, 0.378941531083547, 0.633785757608712, 
0.568315198668279, 0.243228513374925, 0.350267397402786, 0.351632580393925, 
0.583203369844705, 0.475067807757296, 0.597562691802159, 0.476844088803046, 
0.624280558805913, 0.394071015366353, 0.21999414882157, 0.639211126836017, 
0.717367511428893, 0.484367697197013, 0.294406629633158, 0.441814195853658, 
0.749953552451916, 0.34408438722603, 0.767910395259969, 0.586691897339188, 
0.722617477248423, 0.226285071624443, 0.540573595301248, 0.792694895621389, 
0.191202209051698, 0.331462367996573, 0.705584382661618, 0.644309107563458, 
0.679112414736301, 0.522327049542219, 0.443862275360152, 0.646250957716256, 
0.718958919192664, 0.245399728463963, 0.314960127789527, 0.331370893679559, 
0.239075349154882, 0.264986011409201, 0.29242066219449, 0.513924737577327, 
0.277373457630165, 0.186441061273217, 0.260934120928869, 0.518302704556845, 
0.247985989181325, 0.424590824660845, 0.552970835892484, 0.772401164472103, 
0.573478720104322, 0.411603615176864, 0.350441648042761, 0.419012019317597, 
0.411789783835411, 0.27156481242273, 0.586045498400927, 0.388565924577415, 
0.329408108070493, 0.500795336882584, 0.776899359119125, 0.563245315453969, 
0.537288401834667, 0.699657129822299, 0.642345221666619, 0.683818969107233, 
0.164057195046917, 0.421667840681039, 0.519578709150665, 0.743805337511003, 
0.78797685415484, 0.126461805542931, 0.504556180629879, 0.613319917721674, 
0.274119681934826, 0.310515567404218, 0.613426691107452, 0.734868062892929, 
0.24687173764687, 0.350696592428722, 0.413809401076287, 0.734498503105715, 
0.372607507021166, 0.462221824633889, 0.18766736141406, 0.121102022007108, 
0.640263846190646), y = c(NA, 0.747509775042333, 0.492394262387609, 
0.59194983914758, 0.652782552519119, 0.366353843121988, 0.537095538431117, 
0.636614387493952, 0.529642579270747, 0.642106922579767, NA, 
0.403107105898393, 0.655971259828925, 0.508284509717981, 0.449552435210377, 
0.460332561550309, 0.63276444274298, 0.538222513302409, 0.658812696511799, 
0.363186565434664, NA, 0.272593779003222, 0.510558814081722, 
0.350410742198868, 0.531139530533241, 0.552163524666354, 0.505317070552646, 
0.389102109448171, 0.621489170761543, 0.54349184016041, NA, 0.394499578405804, 
0.509905683827794, 0.361189594591152, 0.41231339186569, 0.516388036088181, 
0.701484181483151, 0.383815065764163, 0.317664248046283, 0.712063277676156, 
NA, 0.305784576201768, 0.333794607445367, 0.385372550600018, 
0.451175110977932, 0.45862521987426, 0.30827131420256, 0.36127365962365, 
0.684051025575653, 0.620294232201049, NA, 0.635846398765718, 
0.601872315080372, 0.582203070909791, 0.443400734170925, 0.722409679842508, 
0.676727931933188, 0.607281038282272, 0.575344587335419, 0.643035584672108, 
NA, 0.410966722377319, 0.400578600960204, 0.416598789959043, 
0.510685846856633, 0.700942943916562, 0.581792942519034, 0.566208600319954, 
0.527799799461874, 0.419690353625395, NA, 0.617187633060272, 
0.429983648341422, 0.420655112644463, 0.431658133180366, 0.517042976573732, 
0.389708282970836, 0.46595919249334, 0.405962699743444, 0.6262547621744, 
NA, 0.502739711296802, 0.433869602368414, 0.285508591967233, 
0.759553787492099, 0.462592379881512, 0.402641036977976, 0.577706293446536, 
0.583051526001764, 0.530400418017423, NA, 0.62580013882423, 0.554181296563294, 
0.569779458557199, 0.37607566092419, 0.571169742808632, 0.497735390775854, 
0.640604284626682, 0.69872166605371, 0.434997199761835)), .Names = c("x", 
"y"), row.names = c(NA, -100L), class = "data.frame")

#2

This needs to be

real<lower=0,upper=1> y_mis[N_mis]; //declaring the missing y

in your case.


#3

@bgoodri Awesome. Thanks for the quick response. It works.
It’s interesting having to restrict a missing value.
The actual Stan message is not very informative. Is there any way of knowing what variable is causing the issue?


#4

You would need to install the latest version of rstan from source since there are no binaries for it on CRAN yet.


#5

I will give it a go! Thanks again