Hello all! I am an undergraduate statistics major at my university. I have taken one statistical inference course and a probability course, and I have experience with regression modeling. As a curious undergraduate, I wanted to learn about Bayesian modeling, and through browsing Reddit I found out about Stan! As a complete beginner (literally still reading the first few chapters on Bayesian thinking in Statistical Rethinking), I had a few questions about the resources I found on the Stan page.
First off, I want to take a very applied approach to learning Bayesian statistics and Bayesian modeling, and the case studies, papers, lectures, and other resources sound like a great complement to the book I am reading. My question is: where should I start? The lectures? Papers? Courses?
If so, which papers should I read as a beginner? Should I take a full course? What would be the best approach if you were learning Bayesian modeling and Bayesian statistics from scratch? There are a lot of resources, so I want to make sure I allocate my time effectively.
The book you are using (Statistical Rethinking) is, in my opinion, the best book for starting from scratch as a beginner to Bayesian modeling. I really like his free lectures on YouTube as well.
In addition, I think a great way for a beginner to start is with the brms package, which uses Stan on the back end. It is extremely flexible, very user friendly, and well documented, with numerous examples and vignettes. It has its own tag on this forum.
In addition, Solomon Kurz has put together a wonderful resource where he has worked through the Statistical Rethinking book using brms, rather than the rethinking package.
Hello! Thanks for the response. My end goal is to really understand Bayesian modeling inside and out, and I’ve heard that programming in pure Stan is the best way to do this. I’ve also heard advice like yours, though, because brms is kind of like “training wheels” before learning Stan. Do you think that after using brms for a while I’d be able to use pure Stan at some point? Or maybe even rstanarm? I’ve heard that with Stan you can really have more customization in your models and lots of flexibility. Of course, my modeling is never gonna get as advanced as how people make it in stan, but will brms eventually lead me to using pure Stan?
Statistical Rethinking is great. Michael Betancourt (a Stan developer) has a series of writings (Writing - betanalpha.github.io) that can also serve as a good intro covering a bit more of the technical underpinnings. As you’re reading, a fun exercise is to find existing models or papers in your field that have open data and try to re-create them in Stan. If you have time, a course in real analysis (or working through a textbook, e.g. Real Analysis: Modern Techniques and Their Applications by Gerald Folland) and some higher-level linear algebra will go a long way towards understanding some of the more nuanced theory that gets glossed over in applied texts.
It depends on what you are doing. Sure, programming the models in rstan may help you better understand what is going on, and then a text like BDA3 or something is good too.
As far as starting out with applied stuff goes, though, Statistical Rethinking and brms are, in my opinion, much better. Others might disagree.
rstanarm is much like brms but is not as flexible.
brms is pretty flexible and customizable! I use it for work all the time. Look at some of the vignettes Paul has written or some of his papers, and you will be hard pressed to need anything more just starting out in modeling. In addition, you can use the make_stancode() and make_standata() functions in brms to help you with your programming if you want to do it in rstan.
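To illustrate that last point, here is a minimal sketch of how those two functions can be used (the formula and the built-in mtcars dataset are purely illustrative choices, not part of any recommendation):

```r
library(brms)

# Print the full Stan program brms would generate for a simple
# Gaussian regression, without fitting anything:
make_stancode(mpg ~ wt + hp, data = mtcars, family = gaussian())

# Inspect the corresponding data list that would be passed to Stan:
str(make_standata(mpg ~ wt + hp, data = mtcars, family = gaussian()))
```

Reading the generated Stan code side by side with the formula syntax is a nice bridge when you eventually move to writing Stan programs yourself.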
There are multiple reasons to learn and do Bayesian statistics, and the best initial approach depends on your initial goals. Ultimately you might want to do everything, but you have to start somewhere.
One reason for doing Bayesian stats is that the models you want to fit might not be available or tractable in existing frequentist software packages, or you might care about getting accurate uncertainty quantification in a way that frequentist approximations do not deliver. For this goal, brms is an extremely useful entry point and bridge from the frequentist world to the Bayesian world. It is flexible enough to do a whole lot of things that aren’t available to you in a frequentist framework, and it gives exact Bayesian inference for all of them.
Another reason for doing Bayesian stats is pedagogical. Many (most?) treatments of frequentist statistics confuse students in two ways. First, as a frequentist you have to think about the probability distributions that underlie the likelihoods in your models, but you ALSO have to think about the (asymptotic) sampling distributions of the test statistics. I have regularly interacted with students who are learning statistics under a frequentist mode of inference and who get so hung up on F-distributions that they lose track of the fact that the likelihood for their ANOVA is Gaussian. Second, frequentist treatments of introductory statistics often create confusion by emphasizing tests for checking assumption violations as opposed to the assumptions themselves. So I see students who know how to spot obvious problems when looking at a plot of model residuals, but who have lost track of the fact that the likelihood involves an iid Gaussian. It gets worse when students begin working with random effects; their heads fill up with details about numerator and denominator degrees of freedom, but they don’t emerge with a clear understanding of what the likelihood function is or what the model means.
As a Bayesian, you won’t have to think about F-distributions or Chi-squared distributions. On the other hand, writing Bayesian models directly in Stan (or BUGS/JAGS for that matter) is the best way in the world to make sure that you understand the likelihood functions and therefore the core assumptions that underpin your models. Go figure; there’s no better way to get a handle on the likelihood function than to consistently write it down using elegant syntax. As you can see from the fact that this second point is much longer than the first, my personal advice is to start with pure Stan to cultivate your understanding of the likelihood functions first, and then to branch back out towards brms for its superb practicality.
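To make that concrete, here is a minimal sketch of what “writing the likelihood down” looks like in Stan for a simple linear regression (the variable names and prior scales are just illustrative placeholders, not recommendations):

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // Priors (illustrative choices only):
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 5);   // half-normal via the lower=0 constraint

  // The likelihood, stated explicitly: y is iid normal around the line.
  // This is exactly the assumption that gets buried in frequentist courses.
  y ~ normal(alpha + beta * x, sigma);
}
```

There is nowhere for the Gaussian assumption to hide: changing it means editing that one line, which is the kind of direct contact with the likelihood I have in mind.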
There’s also a third reason why it sometimes makes sense to choose Bayesian over frequentist analysis, and that’s if you desire to use informative priors. brms is flexible enough to incorporate a decent variety of informative priors, but my caution here is that you will not be able to consistently formulate a good prior model unless and until you have a handle on the likelihood function. So again, my personal advice would be to start in pure Stan.
But of course your mileage may vary, and this is all subjective.
@v4gadkari Of course, my modeling is never gonna get as advanced as how people make it in stan
You really don’t know that!
I would recommend that you consider starting with RStan rather than brms or any interface (or maybe better, use them both side by side). It’s not that difficult to learn to use Stan if you are highly motivated to learn (which you clearly are). But I think there are advantages to working with the Stan code yourself, in terms of beginning to grasp the concepts at a higher level and learning to be creative. This would be the “slow” route. (@jsocolar just said everything I wanted to say, and much more that I could have, while I was writing this.)
As someone with a couple of liberal arts degrees, I’m still surprised by the lowly place of theory in courses, books, and discussions on modeling. There’s so little. In my experience with those fields, intro courses are all about introducing students to debates about fundamental ideas and the history of the field, even though the ideas are often over their heads. There are lots of interesting and difficult debates about probability theory that, in my view, would be worth your while to start looking at. Trying to dig into the opening chapter of Cox’s Algebra of Probable Inference, for instance, could be fun.
Thanks for your advice @cmcd and @jsocolar! I will address the two of you here.
To answer your question about my purpose and goals for doing Bayesian stats, it would be the following:
I recently came off of a semester of frequentist inference, and while the concepts made sense and it was interesting, I really had the urge to learn the other side of the coin, that being Bayesian inference. My professor himself does research in Bayesian optimization, yet he taught a class in frequentist inference. In a discussion with him after class, he said that after learning Bayesian inference in his undergrad, his perspective on data analysis completely changed, and that the Bayesian way of thinking is much more intuitive for the average person as opposed to frequentist thinking. For this reason I had the curiosity to try to learn some of it on my own rather than wait for the class itself, and to enlighten myself on that perspective of data analysis.
I want to understand the meaning of likelihood a bit better. My professor touched briefly on the concept, and we even computed likelihood functions by hand and derived maximum likelihood estimators… but while the underlying concept stuck, we never really used it or placed much emphasis on it when we talked about frequentist inference. Likelihood is a concept I get at a surface level, but through Bayesian inference I want to really understand it, because it is such an important concept in statistics.
Frequentist inference did not really make me feel like I was quantifying uncertainty as well as I could. The idea of p-values and rejecting hypotheses based on them just did not feel that intuitive to me. Sure, I can go through the process of frequentist inference, get a p-value, and tell you whether to reject or fail to reject a hypothesis. But making such a statement just felt like I was missing something. In that class, and the many times I’ve learned frequentist inference, it was always “We found some p-value which fell below or above the threshold of alpha .05, and then we can ultimately reject or fail to reject our hypothesis.” I was almost programmed to make such statements, and it didn’t feel like I was quantifying uncertainty as well as I could.
So as a final response to you: do you suggest, based on my goals, that I ultimately start with Stan first? I don’t expect to learn Bayesian statistics and Bayesian modeling quickly, and I understand it will be a lengthy process. I want to really understand why I’m doing the things I am doing, and I am almost hesitant to use any software that will do all of the heavy lifting for me.
Thanks for your response as well; I read your comments on the other thread. I am actually reading Kruschke’s book right now, but I have been considering Statistical Rethinking as well, both because I have heard the author has a unique way of explaining Bayesian statistics and because it is primarily in Stan, not JAGS like Kruschke’s is. Do you recommend I read Kruschke’s first, then Statistical Rethinking? Or do you think I could comfortably look at both side by side? A lot of what I said in response to @jsocolar applies here as well: I am not in any kind of rush to learn Bayesian statistics and Bayesian modeling. It’s gonna take time, and if learning is slow because I use Stan, so be it. I really want to make sure I understand this at a deep level. At the same time, I also do not want to worry too much about technicalities. It’s weird, because on one hand I get the advice of “don’t worry about the nuts and bolts, just apply!” and I get that, but on the other hand, if I want to know what I am doing, I need to know enough of the nuts and bolts that writing Stan code comes from a mindset of thinking about the problem in a Bayesian perspective, going through a workflow where I am using my own fundamental knowledge and not relying on the package to do all the work for me.
Sorry that was such a long response, but these were just my thoughts.
@v4gadkari Do you recommend I read Kruschke’s first, then Statistical Rethinking? Or do you think I could comfortably look at both side by side?
You can do both at once. I don’t think you’ll benefit too much from Kruschke’s JAGS code, but plenty of his discussion could be interesting and helpful (Ch. 1, 2, 4, 5, 6, 7, 9, 12), and you can work through your selections from that book while starting on Statistical Rethinking.
I read Kruschke’s first, and then McElreath’s Rethinking. Kruschke’s book is really good and covers some things that are not in Rethinking (and vice versa), but in my opinion, for getting to grips with the intuition of Bayesian reasoning and the sorts of insights you can get from Bayesian models, Rethinking is somewhat better. My impression is that Rethinking starts you off from a base level, almost as if you are learning statistics in a really intuitive way from scratch, and very quickly takes you far beyond what you learned in a typical stats class (at least the ones I took!). Kruschke sort of assumes you already know t-tests, regression, and so on, and then offers something like “here is how to implement this in a Bayesian model.”
Both books are really good though so you can’t go wrong with either. You could watch some of the Rethinking Lectures at high speed and see if the approach fits.
I want to make a few comments but keep in mind that my perspective here is somewhat in the minority, especially amongst the proponents of the popular introductory texts that have been recommended in this thread, so take it with a corresponding grain of salt.
The challenge with understanding “Bayesian modeling” is that it’s not just a single topic but rather a collection of topics, each of which is rich and subtle on its own. In particular, before one can understand Bayesian inference one first has to understand probabilistic modeling and computation, and that requires a solid conceptual foundation in probability theory. Developing an approximately self-contained understanding of these topics requires significant effort and time, and most pedagogical references attempt to reduce this burden by compromising on the material in different ways. While some compromises can, in my opinion, be incredibly useful stepping stones towards a more thorough understanding, many are not. In particular, I find that far too many pedagogical references give the reader a false sense of confidence that they know enough to build their own analyses, which then leaves them overwhelmed when they encounter the complexities of real analyses.
For example, most frequentist presentations ignore probabilistic modeling altogether, relying on implicit models assumed by common estimators and tests. Presentations that introduce Bayesian equivalents of these estimators and tests then just propagate the uncritical assumption of those implicit models. Frequentist software tools like lm, and the Bayesian equivalents built around these approaches, make these uncritical assumptions the default in practical implementations.
Computational considerations muddy the water even further: if computational tools cannot faithfully evaluate inferences from a given model, then it becomes impossible to determine which features of the estimated inferences are due to the model and which are artifacts of the computational approximation. If one takes the model for granted then this isn’t even an obvious concern, but when one is trying to understand the modeling assumptions and their consequences it becomes a ubiquitous challenge.
One of the incredible features of the Stan Modeling Language is that it makes it impossible to hide behind implicit assumptions; because there are no defaults, every assumption is clearly encoded in each Stan program. By writing your own Stan programs you are forced to acknowledge these assumptions, which is the first step towards understanding and then critiquing them. Moreover, by using Stan’s computational tools responsibly one can verify whether or not the model-based inferences are accurate, which then allows for faithful implementations of these critiques.
All of that said, while I don’t believe that most of the references mentioned in the thread are complete on their own, some of them can be useful contributions depending on your background. For example, references and software tools that translate common frequentist models into reasonably equivalent Bayesian models can be useful guides, but they won’t reveal anything about the underlying modeling assumptions unless you actively look for them. Configuring a tool like brms to run an equivalent lm analysis, which so many introductions emphasize, doesn’t actually guide you towards a more thorough understanding of Bayesian modeling. Using brms to translate an lm model into a Stan program and then interrogating that Stan program for the model assumptions, however, can be an incredibly useful first step. The challenge is that while tools like brms do provide that functionality, it’s not always emphasized, and again you are ultimately responsible for finding and using it.
My writing, Writing - betanalpha.github.io, takes a bottom-up approach that emphasizes bespoke modeling from the beginning instead of trying to bootstrap off of common frequentist models. It’s long and still very much in development, but it’s been designed to approach an “understand Bayesian modeling inside and out” level as quickly as possible. Just no more quickly.
This journey is long and challenging, but it can also be extremely fun and stimulating. Keep your eye on your final goal and try to focus on intermediate steps that are always moving you towards that goal. Good luck!
I’m a fan of your writing and have found the various case studies to be extremely useful. I’m wondering, though, for people who are more applied in practice, what would you consider a “solid conceptual foundation” of probability? Are you suggesting that practitioners have a solid grasp of measure-theoretic probability theory? Or is it sufficient to have an intuitive understanding of the conceptual foundations of probabilistic models and computation without diving too deep into the most technical treatments of the topic? Where would you draw the line?
What’s really important is understanding exactly which probabilistic operations are valid, in other words how we actually extract information from a given probability distribution. In particular, this informs what our algorithms are actually approximating and how to interpret any approximation error, which is critical to implementing a proper practical analysis. Although the ultimate rules are pretty straightforward, they’re usually extremely non-intuitive for those who haven’t worked through the probability theory basics, which makes them hard to swallow at face value.
For example, most introductions don’t even introduce probability distributions but rather probability density functions. Unfortunately, most relevant probability density functions, in particular marginal posterior density functions, can’t actually be constructed in realistic problems. Anyone expecting the output of a tool like Stan to be probability density functions is either going to be disappointed or, worse, going to reach for some tool that promises to translate Stan output into probability density functions (i.e. every visualization library out there), no matter how mathematically ill-posed that might be. So much statistical software has been written for what people think they want instead of what is actually well-defined mathematically, and that only proliferates these confusions.
The goal in my writing has been to motivate all of these subtle issues that propagate all the way to how we actually build and implement analyses in practice. Unfortunately there’s no way to motivate all of those issues in a reasonably self-contained way without starting off pretty deep! Any shortcut relies on people accepting various rules verbatim and I haven’t found that to be all that successful, especially with so many other references stating their own, less well-posed rules and outcomes that superficially seem more appealing/intuitive.