Help with misspecified model


data {
int<lower=0> N;//Number of observations
int<lower=1> J;//Number of predictors with random slope
int<lower=1> K;//Number of predictors which are fixed effects
int<lower=1> L;//Number of customers/groups
int<lower=0,upper=1> y[N];//Binary response variable
int<lower=1,upper=L> ll[N];//Group (customer) index for each observation
matrix[N,K] x1;
matrix[N,J] x2;
}
transformed data {
vector[J] ones = rep_vector(1, J);
}
parameters {
row_vector[J] rbeta_mu; //mean of distribution of beta parameters
row_vector<lower=0>[J] rbeta_sigma; //standard deviation (scale) of distribution of beta parameters
row_vector[J] beta_raw[L]; //group-specific parameters beta
vector[K] beta;
}
transformed parameters {
matrix[L,J] rbeta;
for (l in 1:L)
rbeta[l] = rbeta_mu + rbeta_sigma .* beta_raw[l]; // coefficients on x
}
model {
vector[N] p;
rbeta_mu ~ normal(0,5);
rbeta_sigma ~ inv_gamma(1,1);
beta~normal(0,5);
for (l in 1:L)
beta_raw[l] ~ std_normal();

p = x1 * beta + (x2 .* rbeta[ll]) * ones; // Multiplication by vector of ones as a row-wise summation of matrix
y~bernoulli_logit§;
}


I am trying to fit a fairly simple multilevel model. However, I suspect the model is misspecified, because when I tried to run it with cmdstanr I got a lot of errors.

I am a newbie to Bayesian statistics, so I am looking for help from the community to work out what the mistake is.

Thanks in advance!

Welcome to the Stan forum!

The warning says that the value of the probability parameter is NaN, and the relevant line of your model reads

y~bernoulli_logit§;

This should be changed to

y~bernoulli_logit(p);

If your model already correctly uses y~bernoulli_logit(p);, the warning message is not a concern as long as it only comes up at the beginning (i.e. no further warnings later during warm-up and especially during sampling).

Also, your priors look fairly wide (assuming the values in x1 and x2 are not very small).
I would recommend that you

  • start with narrower priors (e.g. normal(0,2) throughout, even for rbeta_sigma)
  • do a prior predictive check, i.e. comment out the likelihood (// y~bernoulli_logit(p);), run the model, and look at the distribution of the values of p.
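
A prior predictive version of the model could look roughly like the sketch below. It is only an illustration under a few assumptions, not a tested fix: the normal(0,2) priors are the suggestion above, the names prob and y_sim are made up for this example, and p is moved into generated quantities because a variable declared locally in the model block is not saved in the output. The row-wise sum is written with rows_dot_product, which should give the same result as multiplying by the vector of ones. The array declarations follow the style of the model in the question; recent Stan releases expect the array[...] form instead (e.g. array[N] int ll).

```stan
data {
  int<lower=0> N;               // number of observations
  int<lower=1> J;               // number of predictors with random slope
  int<lower=1> K;               // number of fixed-effect predictors
  int<lower=1> L;               // number of customers/groups
  int<lower=1,upper=L> ll[N];   // group index for each observation
  matrix[N,K] x1;
  matrix[N,J] x2;
  // y is not declared: it is not used in a prior predictive check
}
parameters {
  row_vector[J] rbeta_mu;
  row_vector<lower=0>[J] rbeta_sigma;
  row_vector[J] beta_raw[L];
  vector[K] beta;
}
transformed parameters {
  matrix[L,J] rbeta;
  for (l in 1:L)
    rbeta[l] = rbeta_mu + rbeta_sigma .* beta_raw[l];
}
model {
  // narrower priors, as suggested above (assumed values)
  rbeta_mu ~ normal(0,2);
  rbeta_sigma ~ normal(0,2);    // half-normal because of the lower=0 bound
  beta ~ normal(0,2);
  for (l in 1:L)
    beta_raw[l] ~ std_normal();
  // likelihood intentionally left out: y ~ bernoulli_logit(p);
}
generated quantities {
  // linear predictor on the logit scale, one value per observation
  vector[N] p = x1 * beta + rows_dot_product(x2, rbeta[ll]);
  // implied probabilities and simulated responses under the priors alone
  vector[N] prob = inv_logit(p);
  int y_sim[N] = bernoulli_logit_rng(p);
}
```

If most draws of prob pile up very close to 0 or 1, the priors are probably still too wide for the scale of x1 and x2.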

Lastly, if you put a line with three backticks followed by "stan" before your Stan model and a line with three backticks after it, the model will be easier for others to read.

Hi @Guido_Biele, it actually does use y~bernoulli_logit( p ). I don’t know why it appeared that way here.

I will try prior predictive check and narrower priors. Thank you !