Divergent transitions - a primer

First, thanks for taking the time to respond in depth.

I’ll start with what I 100% agree with:

I agree, the topic overpromised and failed to set expectations. I also agree that background knowledge is important - here’s what I wrote in the thread on FAQ, which I believe is mostly a rephrasing of your position.

I’ve tried to rewrite the post to reflect this better. In particular, I renamed “strategies” to “hints” (because that’s what they are) and tried to better set expectations.

That’s a great point. Do you believe the wiki is now, after some rewriting, less likely to make people assume the advice is “sufficient”?

There might also be a bit of a difference in values between your approach to pedagogy and mine - from your writing, I get the impression that you place great value on people gaining a deep understanding of stats/modelling/domains, even if this could mean fewer people engage with your writing. I place great value on letting a lot of people improve their stats/modelling, even if this means those improvements are small and incremental. I don’t think it is useful to try to resolve this difference here; I believe the approaches are complementary and there is enough common ground to often find solutions that satisfy both.

With that said, I should probably describe my goals with this topic in more detail. I really don’t want this to be a checklist. It also cannot be a definitive guide. I want it to be a map. Or a tower you climb to see where you may go - even though you might not be able to resolve the details of all the destinations. Many of the points are the recommendations I most often repeat in response to inquiries here (and I do believe they help, but I admit I do not keep track of the ratio of resolved issues). So it felt useful to have this somewhere discoverable.

Speaking of specific goals/usage scenarios, there are actually several, so maybe the topic could be reorganized along those goals (although I currently don’t see exactly how):

  • Discoverability of resources - most (all?) of the resources I linked are frequently mentioned in answers to users’ questions and also frequently considered valuable by the question askers. Since this is now the first hit when searching for “divergent”, users should be more likely to find the resources themselves.
  • Better questions - We get many questions where beginner users write a multiline brms formula or 500 lines of Stan code, run it, see divergences, incrementally move to adapt_delta = 0.999, max_treedepth = 20, still see divergences and post. If some of them discover this topic and instead try some of the “hints”, e.g. find a minimal model that still has divergences and post it along with a generator of simulated data, I would consider this a major success. As I said, I don’t aim for people being completely able to resolve their issues, but hope this nudges them to take one more step (and hopefully gain some generalizable insight on the way).
  • Discoverability of tricks by advanced users - Many of the hints are things I would have wanted to know while I was learning to use Stan (e.g. the “narrow priors around true values” hint). I believe even some advanced users may not be aware of all of those “attacks” - and they might be able to use them right just from the short hints.
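To make the “minimal model plus a generator of simulated data” idea a bit more concrete, here is a rough sketch of what such a generator could look like. It is written in Python only for illustration (the same idea works in R), and everything specific in it is an assumption of mine: the varying-intercept model, the function name `simulate_data`, the default parameter values and the names of the data fields are all hypothetical, not something the wiki prescribes.

```python
import random


def simulate_data(n_groups=8, n_per_group=5, mu=1.0, tau=0.5, sigma=1.0, seed=1):
    """Simulate data from a hypothetical varying-intercept model.

    group_effect[g] ~ Normal(mu, tau)
    y[i]            ~ Normal(group_effect[group[i]], sigma)
    """
    rng = random.Random(seed)  # fixed seed so others can reproduce the data

    # Draw the "true" group-level effects from the population distribution.
    group_effect = [rng.gauss(mu, tau) for _ in range(n_groups)]

    # Group index (0-based here) for each observation.
    group = [g for g in range(n_groups) for _ in range(n_per_group)]

    # Observations scattered around the effect of their group.
    y = [rng.gauss(group_effect[g], sigma) for g in group]

    # Keep the true values so the fit can be compared against them.
    true_values = {"mu": mu, "tau": tau, "sigma": sigma, "group_effect": group_effect}

    # Data in the shape a model would consume (1-based group indices).
    data = {"N": len(y), "G": n_groups, "group": [g + 1 for g in group], "y": y}
    return data, true_values


data, true_values = simulate_data()
print(data["N"], data["G"])
```

Posting something like this together with the smallest model that still diverges lets others reproduce the problem and check whether the fit recovers `true_values`.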

I agree that the ideal situation would be to have a linked case study going into detail for each hint. We unfortunately don’t have that. I tried to address some of the particular points (many of which are good and relevant), but I still think this should be a map, not a definitive guide, and so I preferred brevity and kept hints for which we currently don’t have good “further reading”. I guess good case studies or at least forum topics for each of the points might already exist, so I hope we will be able to find them and link to them from here.

I also admit that I read some of the points you raised as a bit aggressive, especially when the linked resources provide exactly the details you ask for (e.g. the bayesplot vignette on diagnostics discusses how to use and interpret the mcmc_parcoord plot). I believe this was not your intention, but I have to say it didn’t really help. I also repeat that the goal is to let users discover resources/tricks and be able to take one more step in diagnosis, not to be a definitive guide (and I understand this was not obvious from the previous version of the wiki).

Some more discussion follows:

I agree “identifiability” is not the best term and I like your usage of “degeneracy” for this kind of issue, but I hesitated to use it as most of the content on the forums uses “identifiability”. If others are for adopting “degeneracy” as the preferred term, this would definitely be a place where it should be changed.

I tried to provide some minimal guidance: “make your parameters independent (uncorrelated), constrained by the data and close to N(0,1)”. I agree it is neither exhaustive nor straightforwardly applicable, but do you think this is bad guidance?
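For illustration, one common reparameterization in that spirit is non-centering: instead of working directly with a parameter theta ~ Normal(mu, tau), one works with a standardized z ~ N(0,1) and reconstructs theta = mu + tau * z. The sketch below is only a numerical sanity check, in plain Python, that both constructions describe the same distribution; the values of `mu` and `tau` are made up, and this is my example of the idea rather than a recipe from the wiki.

```python
import random
import statistics

rng = random.Random(0)
mu, tau = 3.0, 0.2  # hypothetical location and scale
n_draws = 20000

# Centered form: draw theta directly on its own scale.
centered = [rng.gauss(mu, tau) for _ in range(n_draws)]

# Non-centered form: draw a standard normal z, then shift and scale it.
z = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
non_centered = [mu + tau * zi for zi in z]

# Both sets of draws should have roughly the same mean and spread,
# while z itself stays on the unit scale.
print(statistics.mean(centered), statistics.stdev(centered))
print(statistics.mean(non_centered), statistics.stdev(non_centered))
print(statistics.mean(z), statistics.stdev(z))
```

In an actual model it is z, not theta, that gets sampled, so the sampler sees a quantity on the unit scale regardless of the values of mu and tau. Whether this helps depends on how strongly the data constrain theta.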

Also note that this is a wiki post and you are welcome to expand/edit it or add links or correct mistakes directly (it’s OK if you prefer discussing below, though).

Thanks for the input, I believe it made the wiki better.
