Restructuring The Forums - Kick-off

For a while I’ve been aware that the current structure of the forums is not ideal. So I’d like to restructure the forums to make them more usable for everybody. The main tools we have for this are the category/subcategory structure, tags and the default view for the home page.

I think the biggest problem with the current structure is that categories don’t neatly map to what people actually need, so it is hard to follow just the topics one is interested in. The relatively high volume of questions from users seeking help can also make the rarer but important discussions about development/governance/… harder to notice. From the question-asker side, it is often not clear how to categorize a given inquiry as it may e.g. involve both a specific interface and a modelling problem.

To start the process, I’d like to gather the usage scenarios we have for the forums and some basic considerations on their requirements. Discussion on changing how the forums work should then follow from those scenarios.

I’d be very glad to get your feedback on how you use the forums or how you think other people use the forums (we are unlikely to get a lot of specific feedback from novice users).

Here are the usage scenarios I’ve come up with so far:

Installation problems

I have a problem installing Stan. I want to be able to easily find if there is a solution for a similar problem already. If not, I want to share my error message/other problem and get help.

We also want to keep track of how difficult Stan is to install and try to remove the biggest recurring pain points.

Help with a specific model

I am a practitioner/statistician who wants to analyse a specific dataset and has a problem with a specific model I am trying to build/use. It might be a syntax error or convergence issues. I want to get some help.

In many cases the problem is highly specific to the model at hand and searching for solutions tends to be less efficient, although I want to be able to find posts where people discuss models of a similar broader class.

We also want such users to easily discover relevant parts of Stan documentation, tutorials, … as quite frequently the main problem is some misunderstanding of Bayesian statistics. The solution will quite frequently not be a guide on how to implement the model the user initially wanted, but rather a suggestion to change the overall approach to the problem.

Help with a general math/modelling problem

I am a researcher/developer in statistics and want to see if I can implement a broader class of models in Stan. I might also have a problem with a specific math part in a bigger model.

Often such questions are quite hard and resolving them may represent a research project of its own. Still, it is good to have those out there, even if they remain unresolved, as others may want to find people who worked on similar problems before.

I want to answer questions

I want to help. I want to be able to easily see questions that I find interesting/important/in need of attention/within my expertise/…

I’ll note that there is a tension here: people answering questions are what makes this site work, and we really want them to have the smoothest experience possible. On the other hand, we can much more easily intervene with this group than with users asking questions, because these are people who have been around for a while and we can easily contact them directly.

I want to follow the latest development in Stan

I want to be able to see what is the buzz in Stan development, what cool new stuff people have built with Stan, meetings I can join etc.

Developer discussion

I as a developer/potential contributor want to discuss a specific endeavour in making Stan better, get feedback for proposed changes, …

We want to keep continuity with older discussions of similar topics.


Did I miss something? Do you see other considerations that should feed into further discussions on forum structure?

5 Likes

How do the different interfaces (is that even the right word?) to Stan and the tools for working with results from Stan fit into this taxonomy? A couple of usage scenarios:

Information about tools

I want to keep up with the latest release of some tool and maybe ask about specific features of the tool. Or maybe I want help installing the tool, or get help making the tool do something I think it should be able to do.

Help while using tools

I’m looking for help with a specific model, but I don’t know if the problem is the model, the approach, or specific to the tools I’m using. I might tag my question with all the tools I’m using. Alternatively, I might guess what tool is most relevant and go post wherever I find posts about that tool.

Having both tags and categories with shared names probably isn’t the most useful way to organize posts, but these seem like two different usage scenarios that might need different organizing tools/structure.

3 Likes

Perhaps questions about hardware? Everything from ‘what computer should I buy next that’s good for running Stan’ to ‘will this run on an M1 Mac’ or even ‘how do I run this model on a cluster’?

3 Likes

Big +1 to @BFiles and @jroon

I have one category to add, and a couple of comments:

New to applied Bayesian modeling

I’m just starting to use Stan, coming from a frequentist background or perhaps no statistics background at all. I’m still getting the hang of what’s general to Bayesian stats, what’s specific to Stan, and what is specific to brms. Maybe something’s going wrong but I don’t have the background to understand whether my question is about my specific model or about Bayesian statistics more generally. Maybe nothing is going wrong but I don’t understand how to interpret the output (What the heck are these so-called CIs in my output?). Maybe I just want to do something simple but I can’t figure out how. Maybe I can say what I want with words (e.g. “posterior distribution for a function of parameters”) but can’t begin to approach the problem in code (whoever heard of a generated quantities block anyway, and who knew that I could just compute my function iteration-wise over an existing model posterior?). Maybe I’ve fit a model because somebody told me to, but now I don’t actually understand the model that I’ve fit or how to interpret its parameters.

Three additional comments

A comment on Help with a general math/modelling problem:

  • I think it would be a mistake to define this category in a way that implies that it’s primarily for experts at the forefront of applied statistics. I think this category would also be appropriate for askers who expect their questions to have well-known answers. Like “I have a question about change-of-variables” or “I have a question about funnel geometries”.

And a parallel comment on Help with a specific model:

  • Perhaps this should also be the place for questions about a specific Stan function or algorithm. For example, “How does reduce_sum slice multidimensional arrays?”; “How should I select the warm-up length for a computationally expensive model?”; “Should I expect speedup if I run this class of model on the GPU?”

And a comment on Developer discussion:

  • I think we need to figure out where to put questions/topics of the following flavor.

I have a question about some part of how Stan works under-the-hood. I probably need an answer from a developer or somebody who is developer-adjacent. But I don’t expect that the question/answer will guide or contribute to Stan development. I’m way too shy to even consider posting under a category that is “for developer discussion of proposed improvements to Stan”, and I’d be mortified to clutter such a category if my post would be seen as annoying.

The Stan devs are wonderfully accessible/patient/helpful on this forum for answering questions like this one. But being accessible and patient isn’t enough if users feel too intimidated to ask questions. I think that we either need to redefine “Developer discussion” as “Questions for developers” (which admittedly would clutter actual developer discussion, but maybe everybody is fine with that), or we need a separate category for “Stan under-the-hood” or something similar.

3 Likes

the fundamental problem is that the Discourse forums model is good for asking and answering questions, but not very good for aggregating and indexing questions that get answered. somehow things on StackOverflow get indexed in a very useful way - it would be nice to do this, but we’re not SO and the Discourse API doesn’t give us the access we need to do our own search and indexing. nor do we have the resources - machines and bandwidth - to do this.

so we see the same topics re-introduced, the same questions asked over and over. the ecosystem seems to be that people get into hanging out on the forums and answering questions for whatever reason (more fun than writing a dissertation, perhaps?), and so questions get answered, but it’s far from efficient.

9 Likes

Thanks @martinmodrak. Whenever I have a question about Stan, I search for keywords (and I pretty much always find something, amazingly), and I don’t use categories. The truth hurts, I should be writing a paper right now… but I probably wouldn’t use categories even if they were better formulated; I just browse the latest posts because it’s often educational.

With @mitzimorris’s comment in mind, I wonder if trying to create an exhaustive list of categories is the best way forward. I see that there may be a few categories that get hit a lot, and certain people may want/need to follow those types of questions (installation issues, developer stuff). For other posts, I’m not sure I see what the benefit would be of categorizing more stringently or carefully; that is not to say I doubt there may be benefit, it just isn’t apparent to me based on how I use the site.

2 Likes

Thanks everybody for the input so far. All of the points are very reasonable. A few follow-up thoughts below.

Generally, I want to be very light on possible changes to forum structure at this stage, and first make sure we have an agreement on the goals (usage scenarios + additional constraints), as I think discussing forum structure without a clear view of the goals might just result in a difficult discussion that leads nowhere.

Incorporating the tools dimension (as suggested by @BFiles) is definitely important. The “how Stan works under-the-hood” case is also important to have in mind (thx @jsocolar). I think hardware discussions (thx @jroon) are a bit rarer, but it is definitely a case we need to have some support for.

I think the “New to Bayes” use case (thx @jsocolar) is interesting as it raises some questions about the limits of what these forums are. To what extent do we want to support general Bayesian stats here? The biggest risk of being overly inclusive topic-wise is that we end up having many more topics than the community can give enough attention to (we are already IMHO quite on the edge of our capacity and while the number of people answering questions seems to increase a bit over time, the increase is quite slow). I think “I want to learn Bayes in Stan” is within our scope, but not sure about just “I want to learn Bayes”.

Good point. My intention was not to suggest this should be a separate category (I don’t want to jump to specific forum structures just yet) but to make it clear that there is a spectrum of questions and the extreme ends of the spectrum may have different needs, especially with regard to how relevant the discussion is for the broader community and how long the discussion can take to reach a conclusion.

This is an important point that I want to address at some length. I agree that we get a certain amount of quite clearly repetitive questions. However, I am currently not convinced that this is solvable with better indexing/forum structure. Beyond some very clear duplicates, we also get a lot of questions of the “how to analyze my data” and “why is my model broken” types. In some sense, many of those questions are very similar to previous ones, but quite often seeing the similarity requires a relatively deep knowledge of Stan/Bayesian stats. Another sense in which they are similar is that giving an answer like “You are confused about basic stuff. Read those two textbooks and then come back.” would often be somewhat justified, although probably resulting in no better understanding on the part of the asker (for the record, I very much don’t want this to be the kind of forums where we give this type of answer frequently).

So there is a difference between what “same topic” means for a novice user and what it means for experienced users and my impression (I don’t have actual counts) is that complete duplicates visible to novice users are not common while topics that are immediately similar to an experienced user are very common. And while I think Stack Overflow generally solves this problem quite well for programming and many other fields, I think statistics is a special hell as the gap between what people need and what people are taught is huuuuge. The Cross Validated site (SO for stats) IMHO suffers from a vast amount of repetitive questions despite the clearly superior indexing SO has. And IMHO a big contributor is that askers lack the knowledge/vocabulary to link their specific problem to already answered questions.

So IMHO a better way to respond to repeated questions is to build a repository of links to good introductory material on the topics that are repeatedly needed/misunderstood (e.g. “how to interpret model coefficients/fits”, “how factors are coded in brms”) and use those to quickly build short, but still helpful answers containing primarily a link and a short notice on why this is relevant to the asker. I kind of tried to move this way with the #howto posts, but agree this has not been very successful so far.

I agree there is a real pain point in the default Discourse search not being very good (if I remember correctly, the Discourse team is on the record that they don’t want to invest too much in search and that for improved search you should just use Google restricted to your domain). So a part of the solution might also be to make searching the site with Google accessible from the Discourse UI (that’s almost certainly feasible).
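To illustrate the idea of Google restricted to a domain, here is a minimal sketch of building such a search link. The function name site_search_url and the domain forum.example.org are hypothetical placeholders, and how the link would actually be wired into the Discourse UI is not covered here; the only thing assumed about Google is its site: search operator.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str) -> str:
    """Return a Google search URL limited to `domain` via the site: operator.

    `domain` is a placeholder for the forum's host name (an assumption,
    not taken from the discussion above).
    """
    # urlencode escapes the colon and turns spaces into "+"
    params = urlencode({"q": f"site:{domain} {query}"})
    return "https://www.google.com/search?" + params


if __name__ == "__main__":
    # Hypothetical usage: search a forum for posts on divergent transitions
    print(site_search_url("divergent transitions", "forum.example.org"))
```

A link like this could sit next to the built-in search box, so users fall back to it when the default search finds nothing useful.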

Thanks again for all the thoughts!

4 Likes

Discourse can potentially be used as a knowledge base. See the discussion here.

Just bumping this up in case anyone wants to add something. Hopefully my long answer didn’t completely stifle the discussion.

In particular, I would like to know if @mitzimorris is convinced by my view of the problem of repeated questions or whether there is something I am missing.

From my side, I realized that my analysis of duplicates was based primarily on questions about models and modelling and mostly ignored installation problems. However, installation problems are the source of a sizeable chunk of traffic on this site, there are indeed quite a lot of more-or-less exact duplicates, and we probably should have a better way of handling those.

EDIT: One more use case that might need some special treatment (and is related to the “Question for developers” use case) is people needing help with developing packages that use Stan under the hood.