ADVI to avoid divergent transitions?


#1

Is there any circumstance in which using (full-rank) ADVI can help fit models that NUTS is having divergent-transition issues with? My math is nowhere near good enough to figure it out on my own, and I can’t remember any paper on ADVI discussing that question.

I work a lot with hierarchical GLMs, but right now most of my time working with Stan is spent on re-parameterizing models to avoid divergent transitions, and I would really like to cut down on that. Or maybe I should just suck it up and become better/faster at parameterizing models =)


#2

No. If NUTS is struggling then ADVI will do even worse.


#3

I recall someone was working on running NUTS and ADVI against a couple hundred (?) models and data sets from BUGS and other sources. Did that ever get published anywhere? arXiv?


#4

This paper, https://arxiv.org/abs/1603.00788, is the only one I know of that does something like that (though on only 10 models), but IIRC they never discuss models that are problematic to fit.

It’s a shame that ADVI doesn’t help, but that is at least consistent with my experiments. Divergent transitions are my nemeses…


#5

A lot of people seem to do model tests really slowly; if it’s really sucking up your time, you might want to post that as a question. I’m sure the rest of the Stan team would have good suggestions, especially for hierarchical GLMs.


#6

Just remember that if you were using another algorithm then you’d very likely be suffering from similar problems, only without the diagnostic. Divergences don’t hurt people, pathological models hurt people.


#7

Yeah, I know, which is why I don’t feel comfortable with moving away from Stan. But ideally, I would like to specify my hierarchical models in a centered way, and then Stan (or some other library) could automagically transform and use whatever parameterization is most efficient for the data I have, though I do realise that may not be possible.

But is there any ongoing work/plan/ideas in this area?

More specifically, my problems usually look like this: I want to fit some regression and I have a bunch of hierarchical covariates. For every covariate I test which parameterization works best, or, if I sequentially add covariates, I get a bunch of models and for each I have to make sure its parameterization works. Usually I go through centered -> non-centered (Section 26.6 in the manual) -> hard sum-to-zero (Section 8.7 in the manual) to mitigate divergent transitions. Changing adapt_delta very seldom helps. And this is what takes time…
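For reference, the equivalence that the non-centered trick exploits can be sketched outside Stan entirely. This toy pure-Python example (my own illustration, not code from this thread) shows that drawing theta ~ Normal(mu, tau) directly (centered) and drawing theta_raw ~ Normal(0, 1) and setting theta = mu + tau * theta_raw (non-centered) target the same distribution; the sampler just sees very different geometry in the two cases, which is why non-centering helps when tau is weakly informed.

```python
# Sketch of the non-centered reparameterization (pure Python, not Stan code):
# theta ~ Normal(mu, tau) is distributionally the same as
# theta = mu + tau * theta_raw with theta_raw ~ Normal(0, 1).
# The sampler then explores theta_raw, whose geometry does not depend on tau.
import random
import statistics

random.seed(1)
mu, tau = 2.0, 0.5
n = 200_000

# Centered: sample theta directly from Normal(mu, tau).
centered = [random.gauss(mu, tau) for _ in range(n)]

# Non-centered: sample a standard normal, then shift and scale.
noncentered = [mu + tau * random.gauss(0.0, 1.0) for _ in range(n)]

# Both parameterizations produce draws with the same mean and spread.
print(round(statistics.mean(centered), 2), round(statistics.mean(noncentered), 2))
print(round(statistics.stdev(centered), 2), round(statistics.stdev(noncentered), 2))
```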

Is this a good workflow, or are there more efficient ways to work?
And what do I do when I still have divergent transitions after going through this process? I guess the answer is very model specific, but in general I have interpreted it to mean that the model cannot be fit with the current data, and that I must look at changing which covariates I’m trying to fit or which priors I’m using.

(Sorry if this got long and off-topic; as you probably can tell, I’m not a “real” statistician but rather come from a design/product development background.)


#8

jonsjoberg http://discourse.mc-stan.org/u/jonsjoberg
May 9

> Yeah, I know, which is why I don’t feel comfortable with moving away from Stan. But ideally, I would like to specify my hierarchical models in a centered way, and then Stan (or some other library) could automagically transform and use whatever parameterization is most efficient for the data I have, but I do realise that it may not be possible.

It’s possible, we just aren’t there yet. Know anybody who wants to contribute? :)

> But is there any ongoing work/plan/ideas in this area?

> More specifically my problems are usually: I want to fit some regression and I have a bunch of hierarchical covariates. Now for every covariate I test which parameterization works best, or if I sequentially add covariates I get a bunch of models, for each of which I have to make sure the parameterization works. Usually I go through centered -> non-centered (Section 26.6 in the manual) -> hard sum-to-zero (Section 8.7 in the manual) to mitigate divergent transitions.

You shouldn’t need to test all of these. Use non-centered unless you have plenty of observations in all groups; even then, non-centered works fine.

Think about identifiability first; it comes up in specific contexts. If you think the issue will come up, just code for it to begin with.

Also, priors in these models are critical. You won’t know the implications of your priors without simulation in a complex hierarchical model. Check that simulation from the weak prior yields reasonable values for the parameters. Seriously, check!
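That prior check can be sketched with plain Python (a hypothetical toy regression of my own, not any model from this thread): draw parameters from the priors, simulate outcomes, and verify they land in a plausible range before touching the real data.

```python
# A minimal prior-predictive check (hypothetical model, pure-Python sketch):
# draw parameters from the priors, simulate data from them, and inspect
# whether the simulated outcomes are plausible before fitting anything.
import random

random.seed(7)

def prior_predictive_draw(x):
    # Hypothetical weak priors for a simple regression.
    alpha = random.gauss(0.0, 1.0)
    beta = random.gauss(0.0, 1.0)
    sigma = abs(random.gauss(0.0, 1.0))  # half-normal prior on the scale
    return [random.gauss(alpha + beta * xi, sigma) for xi in x]

x = [0.1 * i for i in range(10)]
sims = [prior_predictive_draw(x) for _ in range(1000)]

# Eyeball the range of simulated outcomes: if your outcome is, say, a
# reaction time in seconds, prior draws in the millions signal a bad prior.
lo = min(min(s) for s in sims)
hi = max(max(s) for s in sims)
print(lo, hi)
```

The same idea carries over to Stan directly: comment out (or condition away) the likelihood and sample, then look at the draws of the parameters and of simulated data.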

> Changing adapt_delta very seldom helps. And this is what takes time…

You should be able to do relatively short runs to check this stuff: a few dozen iterations at most to see where the stepsize ends up, and if that looks good you can do a longer run to check for divergences.

> Is this a good workflow, or are there more efficient ways to work? And what do I do when I still have divergent transitions after going through this process?

If you are using the gamma CDF or incomplete gamma functions, or the beta binomial, a few math library calculations weren’t/aren’t as good as they should be. I have some improvements in that should make the beta binomial better, and a branch that makes gamma models easier to fit. It’s mostly numerical inaccuracy that messes with adaptation.

> I guess the answer to that is very model specific, but in general I have interpreted that as: with the current data it is not possible to fit the model, and I must look at changing what covariates I’m trying to fit or what priors I’m using.

Sometimes it also means we have a problem to fix, so don’t be afraid to ask questions on the list, or just make a reproducible example and file an issue. Stan should be able to fit hierarchical GLMs.

Sometimes it also means your model doesn’t fit your data in a really bad way. Check that too.

> (Sorry if this got long and off-topic, and as you probably can tell I’m not a “real” statistician, but rather come from a design/product development background)

In the age of machine learning and data science I think you’re doing fine.


#9

Got any specific tips on how to think about identifiability? I know I should think about it, but I’m not sure how to properly diagnose whether a model is identifiable.

The way I’ve been checking priors in this context is by running the model without conditioning on the data; is that the right way of doing it? And what does it mean if I get divergent transitions when doing that?


#10

I don’t think I have an answer to this that’s not workshop-length. In general if you have x=f(a,b) and you only have information about x without any independent information about a and b it’s easy to get in trouble. Then you generally want to model x itself and make a or b transformed parameters, but that’s not always straightforward.
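As a toy illustration of that trap (my own example with f(a, b) = a + b, not from the thread): if the likelihood only sees the sum, then parameter pairs with the same sum are indistinguishable, and the posterior has a ridge along a + b = const that only the priors can control.

```python
# Toy non-identifiability demo: the likelihood depends on a and b only
# through x = a + b, so (a, b) = (1, 2) and (0, 3) give identical values.
import math
import random

def log_lik(a, b, data, sigma=1.0):
    # Gaussian log-likelihood whose mean depends only on the sum a + b.
    mu = a + b
    return sum(
        -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for y in data
    )

random.seed(3)
data = [random.gauss(3.0, 1.0) for _ in range(50)]

# Two very different (a, b) pairs with the same sum: the data cannot
# tell them apart, no matter how many observations you collect.
print(log_lik(1.0, 2.0, data) == log_lik(0.0, 3.0, data))
```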

Running the model without data is a good way to figure out the priors, but only if you can check that outputs (parameters and simulated data) fall into reasonable ranges, with reasonable distributions for what your system might produce. Sometimes you have a specific system that will have parameters with interpretable meaning (reproductive rate, tensile strength, something like that) and you can easily check, otherwise if you’re making a generic model for a certain type of data it’s harder.


#11

Thanks for all the input. I find questions like these the hardest to wrap my head around, partly because for most other issues there is a lot of good information on how to solve or work around them. So I guess that’s an indication that it’s not easy to summarise into something general (or maybe it’s just me having these issues).


#12

If you’re using relatively straightforward hierarchical GLMs, you can use the rstanarm package, which has excellent parameterizations for most linear-model applications. Some of its defaults are quite sophisticated and often faster than what I can come up with, plus you can use the easy modeling syntax from the R package lme4.

In particular, rstanarm can handle any model lme4 can, which includes all random-intercept/random-slope models, plus you can fit some more exotic models if you figure out the lme4 syntax.


#13

Thanks for the tip, I didn’t even think about rstanarm; it seems to generate very efficient models for many of my problems =)