Reparameterizing and Convergence with rstanarm

Hello,

I am trying to fit the following model:

stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = dataSet,
  adapt_delta = 0.99
)

The simulated data are a log-dose/response dataset with 5 doses and binary (0/1) responses, with two samples, two blocks, and 20 replicates at each dose.
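To make the structure concrete, here is a simplified sketch of how data like this can be generated (the parameter values below are placeholders, not the ones from my actual simulation code):

# Sketch of the data structure: 5 doses x 2 samples x 2 blocks x 20 replicates.
# All coefficients and SDs below are illustrative placeholders.
set.seed(1)
d <- expand.grid(
  logDose    = log(c(0.5, 1, 2, 4, 8)),
  sampleName = factor(c("R", "T")),
  block      = factor(c("1", "2")),
  rep        = 1:20
)
u0 <- rnorm(2, 0, 0.2)  # block-level intercept deviations
u1 <- rnorm(2, 0, 0.2)  # block-level slope deviations on logDose
eta <- -1 + 1.5 * d$logDose + 0.3 * (d$sampleName == "T") +
  0.2 * d$logDose * (d$sampleName == "T") +
  u0[d$block] + u1[d$block] * d$logDose
d$y <- rbinom(nrow(d), 1, pnorm(eta))  # probit inverse link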

I am running into convergence issues:
Warning messages:
1: There were 43 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

The Rhat and effective sample size diagnostics are both consistent with convergence, whereas the divergent transitions are not. I looked at the pairs plot and the divergent transitions are all "below the diagonal".
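In case it is useful, the divergences can also be counted directly from the sampler diagnostics (a sketch, with fit being the stan_glmer result):

library(bayesplot)

# pull the NUTS sampler diagnostics out of the fitted stanreg object
np <- nuts_params(fit)

# number of post-warmup divergent transitions across all chains
sum(subset(np, Parameter == "divergent__")$Value)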

I am happy to provide the data.

I am just not clear on how to do an effective reparameterization of an rstanarm model, and would welcome any other suggestions.

Thank you in advance,

Ramiro

  • Operating System: CentOS 7
  • rstanarm Version: 2.17.3

There are not that many choices when it comes to reparameterizing a model in rstanarm because the parameterizations (which tend to work well for a wide variety of problems) are pre-compiled. One thing that you can do — and should generally be doing unless you have only one predictor or a prior with a non-zero location — is specifying the QR = TRUE argument to reparameterize the part of the linear predictor that is common to all observations in terms of orthogonal vectors. This transformation is inverted at the end to yield coefficients that pertain to the (non-orthogonal) vectors specified in your formula.
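For your model that would be something like (a sketch of the same call with the QR reparameterization turned on):

fit <- stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = dataSet,
  QR = TRUE,          # reparameterize the common coefficients
  adapt_delta = 0.99
)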

@ramiro I definitely agree with what Ben said.

There are also some other things I can recommend but some more details would be helpful. Can you share your code for simulating the data that led to these results? I understand the structure you’re describing but it would be helpful to see e.g. what hyperparameter values you’re using.

Thank you very much!! I will simplify the code a bit (it is full of extra things that would not help) and paste it here.

@jonah and @bgoodri, first of all, thank you for all your help. I have seen your names many times while reading documentation and blog posts, I appreciate your work in the Stan community, and I have benefited from your insights a lot.

I am attaching the code to simulate some data. I then generate the data and run stan_glmer, getting some divergence warnings. I hope this helps; any insight is appreciated, and I am happy to condense the code or help in any way. One important question is whether getting this to converge would mean going to regular Stan. I am happy to do that, but I must say rstanarm is very convenient. Thank you very much in advance for any help.

simulateData.R (4.4 KB)

Actually, I don't get any divergences when I run your model with QR = TRUE. If data is your simulated dataset, then I can run the following without divergences (I also made some example plots using bayesplot):

library(rstanarm)
library(bayesplot)
library(ggplot2)  # for ggtitle()

data <- simulateData(...)
fit <- stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = data,
  QR = TRUE,
  init_r = 0.1,
  adapt_delta = 0.999
)    

yrep <- posterior_predict(fit)
ppc_stat(data$y, yrep) + ggtitle("Overall predicted probability")

ppc_stat_grouped(data$y, yrep, group = data$block) + ggtitle("Predicted probability by block")

ppc_stat_grouped(data$y, yrep, binwidth = .01, group = interaction(data$block, data$sampleName)) + ggtitle("Predicted probability by block and sample")

# binned errors
pred_probs <- posterior_linpred(fit, transform = TRUE)
ppc_error_binned(data$y, pred_probs[1:6, ])

Thank you. I will find a better example shortly; I have some that can't be fixed by just increasing adapt_delta.

OK, I just ran it. Are you sure you used set.seed(1)? The graphs I get are not quite the same. I am also including the pairs plot:

set.seed(1)
simulatedData <- simulateData()
fit <- stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = simulatedData,
  QR = TRUE,
  init_r = 0.1,
  adapt_delta = 0.999
)
data <- simulatedData

yrep <- posterior_predict(fit)
ppc_stat(data$y, yrep) + ggtitle("Overall predicted probability")

[image: Rplot]

pred_probs <- posterior_linpred(fit, transform = TRUE)
ppc_error_binned(data$y, pred_probs[1:6, ])

[image: Rplot02]

pairs(fit, pars = c("(Intercept)", "logDose", "sampleNameT"))

[image: Rplot03]

Ah, I'm not sure if I skipped the set.seed line by mistake, so yes, the simulated data may have been different. Even if the data were the same, we would still get slightly different results, since set.seed doesn't affect the C++ RNG, and even specifying the seed argument when running Stan is only exactly reproducible if we have the same compiler (and more).
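In other words, something like this (a sketch) sets both RNGs, although exact reproducibility across machines is still not guaranteed:

set.seed(1)                      # R-side RNG: controls the simulated data
simulatedData <- simulateData()

fit <- stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = simulatedData,
  QR = TRUE,
  init_r = 0.1,
  adapt_delta = 0.999,
  seed = 1                       # passed through to Stan's C++ RNG
)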

It could very well be the case here that the shape of the posterior is quite sensitive to differences in the simulated data, and that a different seed could produce pathologies that don't arise when the data are slightly different.

I will try to find some time to run it with the seed a bit later.

Thank you. I am doing simulation experiments in which I run many replicates under these and other conditions and study the behavior. I am getting these divergence warnings with some frequency, and I am trying to figure out what to do about them.
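The experiment loop looks roughly like this (a simplified sketch of what I am running):

# simplified sketch of one experimental condition: fit many simulated
# replicates and record how many divergent transitions each one produces
n_reps <- 50
n_div <- integer(n_reps)
for (i in seq_len(n_reps)) {
  d <- simulateData()
  f <- stan_glmer(
    y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
    family = binomial(link = "probit"), data = d,
    QR = TRUE, init_r = 0.1, adapt_delta = 0.999, refresh = 0
  )
  np <- bayesplot::nuts_params(f)
  n_div[i] <- sum(subset(np, Parameter == "divergent__")$Value)
}
table(n_div)  # how often, and how badly, the replicates diverge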

I haven't had a chance to run it, but one thing I forgot to mention is that your simulated data only have two blocks, so it will likely be difficult to estimate the hierarchical variance parameters. It would not be at all surprising if the convergence problems are related to that.

Also, the Sigma values you're using in the simulation (e.g., 0.02) mean there will be very little variation on the logit scale (or probit in this case; I usually think in terms of logit).
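One thing that can help when the variance parameters are this weakly identified (a suggestion rather than a guaranteed fix; the values below are illustrative) is a more regularizing prior on the group-level covariance via decov():

# a more informative decov() prior pulls the group-level scales away from
# the extreme values that tend to produce divergences; shape and scale set
# the Gamma prior on the overall scale (the values here are illustrative)
fit <- stan_glmer(
  y ~ logDose * sampleName + (1 | block) + (0 + logDose | block),
  family = binomial(link = "probit"),
  data = simulatedData,
  QR = TRUE,
  adapt_delta = 0.999,
  prior_covariance = decov(regularization = 2, shape = 2, scale = 1)
)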

Dear @jonah, you are touching on a very important point with the comments about few blocks and small variation. What I am trying to do is run simulations with data generated at varying numbers of blocks and levels of variation (e.g., 2 blocks, 4 blocks, 16 blocks) and study the results. At first glance, the issues do indeed seem to appear with few blocks. I was hoping that insufficient data and tiny variation would be reflected in the uncertainty, i.e. wider credible intervals, multiple modes, etc., and that I could see that and compare with situations where there are more blocks and the level of variation is higher. What would it take to get convergence when there is insufficient information (for example, very few blocks, as you pointed out)?