My first Stan model - hierarchical logistic regression

Hi Ben, it took me quite a while to answer and I apologize for that. Thanks for your helpful suggestion!

I just listened to your YouTube talk about rstanarm and then understood and applied what you suggested here.

Up to now I had focused my efforts on writing the model in Stan, which I uploaded here today (below) and hope to receive some feedback on (my group-level parameters have low n_eff). I took your suggestion and wrote the following model in rstanarm, and no such problem occurred.

options(mc.cores = parallel::detectCores())
post <- stan_glmer(Resp13 ~ (Distance * Context ) + 
                     (Distance * Context || Subject), 
                   data = myData, family = binomial(), QR = TRUE)

Comparing my model and the rstanarm model, the apparent differences are:

  1. I had low n_eff for the group parameters, while the rstanarm model didn't: all parameters had high n_eff.
  2. I used one sigma for all betas, while the rstanarm model implemented a sigma for each interaction between group parameters and subject parameters (is that true? can you explain that? see the sketch after this list).
  3. Yet, the posterior predictive checks look very similar.
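
A sketch of how those per-term group-level standard deviations can be inspected in the rstanarm fit (assuming the fit object is the post from the call above; VarCorr is the lme4-style accessor that rstanarm provides for stanreg objects):

# With (Distance * Context || Subject) there is one standard deviation per
# varying term and no correlations between them; these are the "sigmas for
# each interaction" referred to in point 2.
VarCorr(post)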

Do you have any suggestions regarding these essential differences, and how should I improve my model? In parallel I will work with rstanarm, which is very efficient for me…

again, many thanks !


It'd be more efficient and simpler to write

transformed parameters {
  // non-centered parameterization: shift and scale the standardized draws
  matrix[NSubj, Nx] beta = betamu + betasigma * beta_raw;
  vector[NSubj] beta0 = beta0mu + beta0sigma * beta0_raw;
}

assuming you tweaked all the rest of the types.

For efficiency and simplicity, you can also write your sampling as

beta_raw ~ normal(0, 1);
...
y ~ bernoulli_logit(beta0[s] + x * beta[s]);

And please, please indent the code so that it's readable.

RStanArm conditions the predictor matrix, so it's even more stable than writing the model directly like this.
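
Roughly, that conditioning amounts to the thin QR reparameterization behind QR = TRUE (a sketch, not RStanArm's exact internals; the Distance and Context columns are just placeholders taken from your formula):

# Sketch: regress on an orthogonalized design matrix Q_ast instead of X,
# then map the coefficients back with the inverse of R_ast.
X <- model.matrix(~ Distance * Context, data = myData)[, -1]   # drop intercept
qrX <- qr(scale(X, center = TRUE, scale = FALSE))
Q_ast <- qr.Q(qrX) * sqrt(nrow(X) - 1)
R_ast <- qr.R(qrX) / sqrt(nrow(X) - 1)
# fit on Q_ast; if theta are those coefficients, beta = solve(R_ast, theta)
# recovers the coefficients on the original columns of X.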

Hi Bob, thanks for your reply. I will apply the changes to my code and will indent it in future posts.

Looking at the glmer model in RStanArm, it surely contains much, much more than what I have in my simple model. I'd probably prefer to use it in any case. I am now checking whether it can handle all of my experiments.

Dan

Hi again @bgoodri. I am really benefiting from working with RStanArm and ShinyStan. It's a great experience.

stan_glmer worked great for my experiments with simpler (and better) designs (i.e., continuous * binary), but for the more complex experiment (continuous * binary_1 * binary_2, resulting in four interaction terms) with 34 subjects, as you suggested, I had low n_eff (< 10%) for the group intercept and individual intercepts, and for one of my binary predictors.

What do you suggest?
My main interest is in the group parameters (that's what I intend to report and plot in this paper); the subject parameters do not interest me in themselves, except for their important role in the hierarchical model. Should I maybe run two separate analyses, one for each level of one of my binary variables? Or move to complete pooling for one of my parameters?

If n_eff / iterations is 10%, that's fine as long as it remains 10% as you double the number of iterations. As the posteriors get more complicated from correlations, it becomes harder and harder to draw nearly independent parameters each iteration. The problem cases show up when you get low n_eff and it stays low even when you increase the number of iterations.
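
One way to monitor that ratio from a stanreg fit (a sketch; it assumes the fit object is the post from the earlier call):

# n_eff as a fraction of total post-warmup draws; rerun with double the
# iterations and check whether the worst ratios stay roughly the same.
s <- summary(post)                 # has an n_eff column for every parameter
draws <- nrow(as.matrix(post))     # total post-warmup draws across chains
head(sort(round(s[, "n_eff"] / draws, 2)), 10)   # the ten worst-mixing parameters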

If you haven't converged during warmup, it can help to increase the number of iterations. I'm not sure how RStanArm configures all that.

Specify QR = TRUE if you have not already done so. Otherwise, it is unlikely that you will be able to hand-code something that mixes better, so you are basically down to getting better data, using tighter priors on the intercept(s), running lots and lots of chains / iterations, or living with less precise estimates.

Hi Ben, I really appreciate your help and support here!

  1. I already specified QR = TRUE, as you initially recommended. After reading your code I am certain I won't hand-code something like that, at least not at this stage…
  2. I won't be able to collect alternative data; these are the two experiments out of six where I encounter the mixing problem, and I want to report the data in the paper in any case (actually, they have null results, but I have to use them).
  3. Which prior_intercept would you use? I checked the priors help page but am not sure which one to choose.
  4. How can I specify the number of chains / iterations in stan_glmer?
  5. Does it matter that I pre-scaled my continuous variables as (x - mean(x)) / (2 * sd(x)) (following Gelman, 2007) and centered my binary variables as (x - 0.5) in the data frame before calling the function? (See the sketch after this list.)
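
For point 5, a sketch of that transformation, applied in the data frame before the stan_glmer call (it assumes Context is coded 0/1 and Distance is the continuous predictor):

# Gelman (2007)-style scaling of a continuous predictor and centering of a
# binary one, done before fitting.
myData$Distance <- (myData$Distance - mean(myData$Distance)) / (2 * sd(myData$Distance))
myData$Context  <- myData$Context - 0.5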

thanks!!

three: Hard to say; usually normal(), but you will have to think about what a reasonable distribution for the log odds of the average observation would be.
four: Same way as in stan()
five: I never do. It makes no difference when QR = TRUE. It makes it harder to do posterior_predict.
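
Putting three and four together, the call would look something like this (a sketch; the prior scale and the chain / iteration counts are placeholders to adapt):

post <- stan_glmer(Resp13 ~ (Distance * Context) + 
                     (Distance * Context || Subject), 
                   data = myData, family = binomial(), QR = TRUE,
                   prior_intercept = normal(0, 2.5),  # prior on the log-odds scale
                   chains = 4, iter = 4000)           # passed on as in stan()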

Thanks Ben and Bob.

This correspondence is very long and I guess it is tiring for everyone. I am trying hard to make this analysis work. I tried a bunch of stuff and I still encounter problems.

I simplified the data set and used just one level of one binary predictor (taking half of the data, i.e. ~10k observations), leaving me with two predictors (and one interaction) instead of three (with four interaction terms). I still find n_eff < 10% of the total draws for my group intercept (when running four chains for 2000 as well as for 4000 iterations). The n_eff for the group intercept was 316/4000, the individual intercepts were around 500-700/4000, and for the other predictors it was OK: 1650-4000/4000. I am wondering: why should sampling from this relatively simple model be so problematic? Can it be that this data is inappropriate for a hierarchical model?
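
A sketch of how those n_eff numbers can be pulled out of the fit (the object name matches the simpler model defined below):

s <- summary(post_ambi_let_simple)                    # n_eff column per parameter
s[order(s[, "n_eff"]), c("n_eff", "Rhat")][1:10, ]    # worst-mixing parameters first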

I tried the following too:

  1. increasing iter
  2. scaling and centering the predictors or not doing so
  3. using prior_intercept = normal(0, 2.5); I thought it might be tighter, as you suggested. Is it appropriate? In any case, the same problem persisted.

Is this the right model for my data? I simply want to assess the group level coefficients for my predictors and I want to do so in a hierarchical model.

Not sure where to take it from here. In case you have time for it, I am attaching my CSV data file (5 columns, ~20k rows): AmbiLet_Unconscious_new.csv (233.8 KB)

This is how I read it:

myData = read.csv( file="AmbiLet_Unconscious_new.csv" )

Here is the RStanArm model (following your suggestion):

Full model:

post_ambi_let <- stan_glmer(ClassRef ~ (Distance * Context * Style) + 
                              (Distance * Context * Style || Subject), 
                            data = myData, family = binomial(), QR = TRUE)

And here is a simpler model with only two predictors:

myData_print = subset(myData, myData$Style == 1) # subset used for the simpler model

post_ambi_let_simple <- stan_glmer(ClassRef ~ (Distance * Context) + 
                                     (Distance * Context || Subject), 
                                   data = myData_print, family = binomial(), 
                                   QR = TRUE)
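
And a sketch of the quick checks mentioned earlier in the thread, i.e. the posterior predictive check and ShinyStan, applied to that fit:

library(shinystan)                          # for launch_shinystan()
pp_check(post_ambi_let_simple)              # posterior predictive check
launch_shinystan(post_ambi_let_simple)      # interactive look at n_eff, traces, etc.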

I am grateful for all the help that I received here in the past month, and would appreciate any help now.