My first Stan model - hierarchical logistic regression

Thanks, Ben and Bob.

This correspondence is very long and I guess it is tiring for everyone. I am trying hard to make this analysis work. I have tried a number of things and I still run into problems.

I simplified the data set by keeping just one level of one binary predictor (half of the data, i.e. ~10k observations), which leaves two predictors and one interaction instead of three predictors with four interaction terms. I still get n_eff below 10% of the total draws for my group intercept, both when running four chains for 2000 iterations and for 4000. The n_eff for the group intercept was 316/4000, and the individual intercepts were around 500-700/4000. For the other predictors it was OK: 1650-4000/4000. I am wondering: why should sampling from this relatively simple model be so problematic? Could it be that this data is inappropriate for a hierarchical model?
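To make the comparison against the 10% rule of thumb explicit, here is a small base-R sketch that converts the reported n_eff values into ratios. The labels are illustrative only (they are not actual rstanarm parameter names), and for the ranges quoted above only the lower ends are used.

```r
# n_eff values reported in this post (lower ends of the quoted ranges).
# The labels are illustrative, not actual rstanarm parameter names.
n_eff <- c(group_intercept = 316,
           individual_intercepts_low = 500,
           other_predictors_low = 1650)
total_draws <- 4000

# Ratio of effective sample size to total draws
ratio <- n_eff / total_draws

# Flag anything under the 10% rule of thumb
low <- ratio < 0.10
print(round(ratio, 3))
print(names(ratio)[low])
```

Only the group intercept falls under the threshold here. For a fitted model, the same ratios can presumably be computed from the n_eff column that `summary()` reports for the fit.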

I tried the following too:

  1. increasing iter
  2. scaling and centering the predictors or not doing so
  3. using prior_intercept=normal(0,2.5), which I thought might be tighter, as you suggested. Is that appropriate? In any case, the same problem persisted.
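A minimal sketch of how these three variations might be expressed in R follows. It is not the exact code that was run: the toy data frame, the `Distance_z` column, the helper name `fit_simple`, and the assumption that Distance is numeric are all hypothetical and for illustration only. `iter` and `prior_intercept = normal(0, 2.5)` are the rstanarm arguments mentioned in the list.

```r
# Toy stand-in for the real data (hypothetical values, illustration only)
toy <- data.frame(
  ClassRef = c(0, 1, 1, 0, 1, 0),
  Distance = c(1, 2, 3, 4, 5, 6),   # assumed numeric here
  Context  = c(0, 1, 0, 1, 0, 1),
  Subject  = factor(c(1, 1, 2, 2, 3, 3))
)

# (2) Center and scale a numeric predictor: mean 0, sd 1
toy$Distance_z <- as.numeric(scale(toy$Distance))

# (1) and (3): more iterations and an explicit prior on the intercept.
# fit_simple is a hypothetical helper; it is only defined here, not called,
# and it needs the rstanarm package installed when it is called.
fit_simple <- function(d, iter = 4000) {
  rstanarm::stan_glmer(
    ClassRef ~ Distance_z * Context + (Distance_z * Context || Subject),
    data = d, family = binomial(),
    prior_intercept = rstanarm::normal(0, 2.5),
    iter = iter, chains = 4, QR = TRUE
  )
}
```

Keeping the scaled predictor in a separate column makes it easy to fit the model both with and without centering and compare the n_eff values.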

Is this the right model for my data? I simply want to estimate the group-level coefficients for my predictors, and I want to do so in a hierarchical model.

I am not sure where to take it from here. In case you have time, I am attaching my CSV data file (5 columns, ~20k rows): AmbiLet_Unconscious_new.csv (233.8 KB)

This is how I read it:

myData <- read.csv(file = "AmbiLet_Unconscious_new.csv")

Here is the rstanarm model (according to your suggestion):

Full model:

library(rstanarm)

# Full model: three predictors with all interactions, varying by Subject
post_ambi_let <- stan_glmer(ClassRef ~ (Distance * Context * Style) +
                              (Distance * Context * Style || Subject),
                            data = myData, family = binomial(), QR = TRUE)

And here is a simpler model with only two predictors:

# Keep only one level of Style, if you want to run the simple model
myData_print <- subset(myData, Style == 1)

# Simple model, fit to the subset rather than to the full data
post_ambi_let_simple <- stan_glmer(ClassRef ~ (Distance * Context) +
                                     (Distance * Context || Subject),
                                   data = myData_print, family = binomial(),
                                   QR = TRUE)

I am grateful for all the help that I received here in the past month, and would appreciate any help now.