Error Chain 1: Initialization between (-2, 2) failed after 100 attempts in beta regression with missing values

I’m learning Stan and I’m having some issues with a beta regression model with missing values in the response variable (y_i). This is a simple example I have adapted from Stan’s user guide.

I’m getting the following error; Chain 1: Initialization between (-2, 2) failed after 100 attempts. I have tried different priors and initial values but I can’t find the issue. I would appreciate any help.

Here the reprex:

 modelBeta <- '
data {
int<lower=1> N; // sample size
int<lower=1> K; // K predictors
matrix[N,K] X;   
int<lower = 0> N_obs; // number observed values
int<lower = 0> N_mis; // number missing values
int<lower = 1, upper = N_obs + N_mis> ii_obs[N_obs]; // ii index of observed
int<lower = 1, upper = N_obs + N_mis> ii_mis[N_mis]; // ii index of missing
real y_obs[N_obs]; //observed y
}

parameters {
vector[K] theta; 
real y_mis[N_mis]; //declaring the missing y
real<lower=0> phi; // precision parameter
}

transformed parameters {
vector[K] beta;
real y[N]; // creating y from the missing and observed values
y[ii_obs] = y_obs;
y[ii_mis] = y_mis;
beta = theta * 10; 
}

model {
vector[N] Xbeta; 
vector[N] mu;    // mean
vector[N] A;     // shape 1
vector[N] B;     // shape 2
Xbeta = X * beta; // 

for (i in 1:N) { 
mu[i] = inv_logit(Xbeta[i]);  
}

// priors
theta ~ normal(0, 5);  
phi ~ normal(50, 6)T[10,]; 

A = mu * phi;
B = (1.0 - mu) * phi;

y ~ beta(A, B); // likelihood

}'


dataBeta = list(K = 2, # reg coeff
            N = nrow(df), # sample size
            N_obs = length(which(!is.na(df$y))), 
            N_mis = length(which(is.na(df$y))),
            ii_obs = which(!is.na(df$y)),
            ii_mis = which(is.na(df$y)),
            y_obs = dffinal[!is.na(df$y),]$y,
            X = cbind(1, dffinal$x)
)

inits <- function(){list(beta = c(1, -2), phi = 50)} 

modelBetaStan <- stan(model_code = modelBeta,
               model_name = "modelBeta",
               pars = c('beta','y'),
               init = inits,
               data = dataBeta,
               iter = 3000, 
               warmup=1000,
               thin=2, 
               chains = 1,
               #init_r = 0.1,
               verbose = F)

The data:

structure(list(x = c(0.315436277608387, 0.280370750720613, 0.486625703331083, 
0.139468205277808, 0.427984498743899, 0.438639514567331, 0.668681832356378, 
0.359224375989288, 0.482591016520746, 0.21918343573343, 0.537497533927672, 
0.71751586268656, 0.296247687982395, 0.378941531083547, 0.633785757608712, 
0.568315198668279, 0.243228513374925, 0.350267397402786, 0.351632580393925, 
0.583203369844705, 0.475067807757296, 0.597562691802159, 0.476844088803046, 
0.624280558805913, 0.394071015366353, 0.21999414882157, 0.639211126836017, 
0.717367511428893, 0.484367697197013, 0.294406629633158, 0.441814195853658, 
0.749953552451916, 0.34408438722603, 0.767910395259969, 0.586691897339188, 
0.722617477248423, 0.226285071624443, 0.540573595301248, 0.792694895621389, 
0.191202209051698, 0.331462367996573, 0.705584382661618, 0.644309107563458, 
0.679112414736301, 0.522327049542219, 0.443862275360152, 0.646250957716256, 
0.718958919192664, 0.245399728463963, 0.314960127789527, 0.331370893679559, 
0.239075349154882, 0.264986011409201, 0.29242066219449, 0.513924737577327, 
0.277373457630165, 0.186441061273217, 0.260934120928869, 0.518302704556845, 
0.247985989181325, 0.424590824660845, 0.552970835892484, 0.772401164472103, 
0.573478720104322, 0.411603615176864, 0.350441648042761, 0.419012019317597, 
0.411789783835411, 0.27156481242273, 0.586045498400927, 0.388565924577415, 
0.329408108070493, 0.500795336882584, 0.776899359119125, 0.563245315453969, 
0.537288401834667, 0.699657129822299, 0.642345221666619, 0.683818969107233, 
0.164057195046917, 0.421667840681039, 0.519578709150665, 0.743805337511003, 
0.78797685415484, 0.126461805542931, 0.504556180629879, 0.613319917721674, 
0.274119681934826, 0.310515567404218, 0.613426691107452, 0.734868062892929, 
0.24687173764687, 0.350696592428722, 0.413809401076287, 0.734498503105715, 
0.372607507021166, 0.462221824633889, 0.18766736141406, 0.121102022007108, 
0.640263846190646), y = c(NA, 0.747509775042333, 0.492394262387609, 
0.59194983914758, 0.652782552519119, 0.366353843121988, 0.537095538431117, 
0.636614387493952, 0.529642579270747, 0.642106922579767, NA, 
0.403107105898393, 0.655971259828925, 0.508284509717981, 0.449552435210377, 
0.460332561550309, 0.63276444274298, 0.538222513302409, 0.658812696511799, 
0.363186565434664, NA, 0.272593779003222, 0.510558814081722, 
0.350410742198868, 0.531139530533241, 0.552163524666354, 0.505317070552646, 
0.389102109448171, 0.621489170761543, 0.54349184016041, NA, 0.394499578405804, 
0.509905683827794, 0.361189594591152, 0.41231339186569, 0.516388036088181, 
0.701484181483151, 0.383815065764163, 0.317664248046283, 0.712063277676156, 
NA, 0.305784576201768, 0.333794607445367, 0.385372550600018, 
0.451175110977932, 0.45862521987426, 0.30827131420256, 0.36127365962365, 
0.684051025575653, 0.620294232201049, NA, 0.635846398765718, 
0.601872315080372, 0.582203070909791, 0.443400734170925, 0.722409679842508, 
0.676727931933188, 0.607281038282272, 0.575344587335419, 0.643035584672108, 
NA, 0.410966722377319, 0.400578600960204, 0.416598789959043, 
0.510685846856633, 0.700942943916562, 0.581792942519034, 0.566208600319954, 
0.527799799461874, 0.419690353625395, NA, 0.617187633060272, 
0.429983648341422, 0.420655112644463, 0.431658133180366, 0.517042976573732, 
0.389708282970836, 0.46595919249334, 0.405962699743444, 0.6262547621744, 
NA, 0.502739711296802, 0.433869602368414, 0.285508591967233, 
0.759553787492099, 0.462592379881512, 0.402641036977976, 0.577706293446536, 
0.583051526001764, 0.530400418017423, NA, 0.62580013882423, 0.554181296563294, 
0.569779458557199, 0.37607566092419, 0.571169742808632, 0.497735390775854, 
0.640604284626682, 0.69872166605371, 0.434997199761835)), .Names = c("x", 
"y"), row.names = c(NA, -100L), class = "data.frame")

This needs to be

real<lower=0,upper=1> y_mis[N_mis]; //declaring the missing y

in your case.

1 Like

@bgoodri Awesome. Thanks for the quick response. It works.
It’s interesting having to restrict a missing value.
The actual Stan message is not very informative. Is there any way of knowing what variable is causing the issue?

You would need to install the latest version of rstan from source since there are no binaries for it on CRAN yet.

I will give it a go! Thanks again