Text for warning message

If you start a new topic in the Modeling category, it pre-fills your post with info about using Stan syntax highlighting and LaTeX, so we could do something similar with info about the existence of a FAQ.

One thing that would be nice (but I have no idea if it’s possible) would be that if the word “divergence” or the phrase “divergent transition” is detected in the user’s post, it automatically points them to something about divergences. Discourse can already tell you if it thinks your question may have already been asked (“Your topic is similar to…” or whatever it says), but I don’t know if we can customize it to look for keywords. @martinmodrak any idea if that’s possible? (Maybe this should be in a separate topic, sorry.)

When you start a new post, Discourse displays “Your post is similar to…” on the right-hand side. Is it possible to also suggest “The answer to your question might be in the FAQ”, or would that be annoying?

It might be helpful to give users recommended “default” priors for certain kinds of models and domain areas. For example, in a hierarchical linear model a normal(0, 1) is usually a reasonable “default,” but this is not true in general. If I’m working with a model on vaguely simulated data where I’m trying to estimate category proportions, I totally wouldn’t choose a standard normal; something like a Dirichlet prior (e.g., a Beta(5, 5) in the two-category case) makes far more sense there. Point being, for common vanilla starter models we could be more specific about which priors users should begin with, and which they can tweak after they’ve explored the model a bit more (see the sketch below). For this set of starter models we could even include a description of how to think about each prior in the context of the likelihood. A bit of a thread hijack, but having this framework could cut back on a lot of issues with divergences, and also on confusion among new users about setting priors.
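
To make this concrete, here is a minimal sketch of what such a starter model might look like for estimating category proportions. The data names and the Dirichlet(5) prior are purely illustrative assumptions, not official recommendations:

```stan
// Hypothetical starter model for category proportions.
// The dirichlet(5) prior is illustrative, not a recommendation.
data {
  int<lower=1> K;            // number of categories
  array[K] int<lower=0> y;   // observed count in each category
}
parameters {
  simplex[K] theta;          // category proportions
}
model {
  theta ~ dirichlet(rep_vector(5, K));  // mild pull toward uniform proportions
  y ~ multinomial(theta);
}
```

A standard normal prior has no analogue here because theta lives on a simplex; a starter-model page could spell out exactly this kind of likelihood-dependent reasoning.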

Hi, yes, I agree. At some point we would like to write something comprehensive on this. In the meantime, please add your thoughts to the priors wiki: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
Thanks!

In my opinion many of the discussions in this thread are missing an important point. Statistical computation cannot be taken for granted.

Modeling is hard. Computation is hard. Yet most people in the Stan community have been told to do an analysis without the proper training. If they’re lucky, their initial model might fit okay; but if their model isn’t fitting well, they have to confront all of the technical debt they’ve ignored. There is no automated response, simple flowchart, or short sequence of recommendations that will guide someone without the proper training to an adequate resolution.

Part of the problem is that the underlying pathologies that cause computational problems are not universal – a divergence is a many-to-one diagnostic – and in order to resolve the problem a user has to be experienced in one of:
(1) All of the possible pathologies (impossible, since we don’t know them all).
(2) All of the common pathologies for the modeling techniques being employed (requires significant training and/or experience).
(3) Bespoke investigations of diagnostic problems to identify their causes and potential resolutions (research-level work – for example, https://betanalpha.github.io/assets/case_studies/identifiability.html is a short introduction).

Extensive writing can help with (2), but it still requires a tremendous investment from the user that many are not able to afford. It also requires developing far more resources than are available – hierarchical models alone are far more complicated than the monolithic centered and non-centered reparameterizations. Developing these resources also requires (3), along with people who have the experience and time to carry out those studies.
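
For readers who have not seen these reparameterizations, here is a minimal sketch of the non-centered version of a hierarchical normal model (the data names and prior scales are illustrative assumptions); the centered version would instead declare theta directly in the parameters block with theta ~ normal(mu, tau):

```stan
data {
  int<lower=1> J;                        // number of groups
  int<lower=1> N;                        // number of observations
  array[N] int<lower=1, upper=J> group;  // group membership
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta_raw;                   // standardized group offsets
  real<lower=0> sigma;
}
transformed parameters {
  // Non-centering: build theta deterministically from theta_raw,
  // which avoids the funnel geometry that often causes divergences
  // when each group is only weakly informed by the data.
  vector[J] theta = mu + tau * theta_raw;
}
model {
  mu ~ normal(0, 5);                     // illustrative weakly informative priors
  tau ~ normal(0, 5);                    // half-normal via the lower bound
  theta_raw ~ std_normal();
  sigma ~ normal(0, 5);
  y ~ normal(theta[group], sigma);
}
```

Even this familiar centered/non-centered dichotomy only scratches the surface of hierarchical model pathologies.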

The challenges only compound when the resolution requires modeling considerations in which users have not yet been trained, such as iterative domain expertise elicitation and the discipline to distinguish between refining one’s own domain expertise and prior hacking. Beyond my own writing, I have found exactly one book that discusses this at any depth – the collected essays of I.J. Good – and it’s not a common read.

Ultimately all we can offer in a warning is potential quick fixes, such as increasing adapt_delta (while also emphasizing that it might help but often won’t), followed by an explanation that investigating computational problems requires significant statistical training, along with links to the relevant pedagogical material. Any explicit recommendation will be abused by users who do not understand its context, just as users abuse adapt_delta today and would abuse pithy recommendations to change their priors were those to be offered.

This is not a warning text problem – it’s a pedagogy and user expectation problem and needs to be addressed as such.

I took the liberty of creating the “Divergent transitions - a primer” permalink, currently pointing to Divergent transitions - a primer, in case anyone wanted to follow up on this. The discussion is probably far from done, although it has mostly faded away (e.g. at Discourse - issue/question triage and need for a FAQ).

I also agree with @betanalpha about the important role of pedagogy and user expectations (though probably not about how best to address them). I still think linking to an overview plus a collection of additional resources is the best we can currently do.

I think that the only really dangerous recommendations mentioned here are those concerning priors, where there is a real risk of substantially biasing the results. All the other recommendations floating around only pose the risk of wasting the user’s time. I agree that being very careful in wording the prior advice is warranted.

I agree with Martin that a reference page is a good idea. For one thing, once we have a warning message, users will want to know what to do. If we don’t have a reference page, users are just going to google it. So we might as well give some reasonable recommendations. Such recommendations are not a replacement for pedagogy; they are part of pedagogy and they can assist further pedagogy. (I say this as an author of textbooks.)

Regarding “only pose the risk of wasting the user’s time”: let me play economist for a moment and say that wasting time is a real risk. Opportunity cost is real. Some possible results of wasting the user’s time include: (1) fitting fewer models and ending up with a model with serious problems because the user has not had the time to fully explore modeling possibilities; (2) fitting a poor version of an existing model because the user never realized the benefits of strong and informative priors; (3) giving up on Stan entirely and switching to an approximate algorithm such as lme4, which gives fast results but requires the user to work within a narrow menu of modeling possibilities; (4) giving up on Bayesian inference entirely. These are real risks; I’ve seen all of them happen.
