Getting Started with Stan

I realized after a lot of people asked me what to read to get started with Stan that there’s a pretty big gap between our simple hello-world examples for the interfaces and our full-blown User’s Guide which is not at all hands-on.

I’m curious what people do now when someone asks them how to code in Stan. Are there other resources along the lines of what I wrote here? (In retrospect, I should’ve hopped on the forums and asked that before spending a week writing this case study.)

Getting started

To fill the gap, I drafted the following document.

[edit: Replaced download link with proper web link. ChatGPT even gave me step-by-step instructions on how to configure GitHub pages to serve this as a web page from my GitHub repo. I also spellchecked and was a bit more careful with formatting, though I’m curious how it looks on different OSes, which have different code fonts.]

This is just a draft, so I’d be very happy to get feedback.

100% Python

The case study is indeed pure Python. I’m still learning Python, so any help there would be appreciated. plotnine is a direct call-for-call translation of ggplot based on pandas data frames.

If people like this tutorial, we could easily translate it to R and Julia.

ChatGPT knows pandas and plotnine

The real breakthrough is that I wrote this entire tutorial without looking at a single piece of Python or pandas or plotnine documentation—I just asked ChatGPT (3.5 and 4) how to render all the graphs and do all the data frame manipulations I usually struggle with. I do know the basics of the grammar of graphics (the “gg” in ggplot), but can never remember the function names or arguments. So this is a godsend for working in R, too. For example, I did things like this:

I have a pandas data frame in Python and I would like to pull out the first 10 entries of the column draws for values where the column rho is 0.05, is 0.5, and is 0.95. How do I do that?

And ChatGPT gives me working code in a convenient copy-and-paste box.

import pandas as pd

# Assuming you have a pandas data frame called 'df'

# Filter the data frame for rho == 0.05, and get the first 10 draws
draws_rho_005 = df.query('rho == 0.05').head(10)['draws']

# Filter the data frame for rho == 0.5, and get the first 10 draws
draws_rho_05 = df.query('rho == 0.5').head(10)['draws']

# Filter the data frame for rho == 0.95, and get the first 10 draws
draws_rho_095 = df.query('rho == 0.95').head(10)['draws']
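
A more compact variant of the same query, as a sketch rather than anything ChatGPT produced: it assumes a data frame with columns rho and draws as in the question, and the toy data frame below is made up purely for illustration. The helper name first_draws and its tolerance argument are hypothetical. It uses np.isclose instead of ==, which can matter when the rho values were computed rather than typed in literally.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the data frame in the question (values are made up).
rng = np.random.default_rng(1234)
rhos = [0.05, 0.5, 0.95]
df = pd.DataFrame({
    "rho": np.repeat(rhos, 20),
    "draws": rng.normal(size=60),
})

def first_draws(df, rho, n=10, tol=1e-9):
    """Return the first n entries of 'draws' where 'rho' is within tol of rho."""
    mask = np.isclose(df["rho"], rho, rtol=0.0, atol=tol)
    return df.loc[mask, "draws"].head(n)

# One series of draws per rho value, keyed by rho.
first_by_rho = {rho: first_draws(df, rho) for rho in rhos}
```

Keeping the results in a dictionary keyed by rho avoids one variable name per value, which helps if the list of rho values grows.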

It’s even useful for quarto (and I no longer worry about awkward grammar in a query).

how do I write a bibtex entry for an open source software with a url

To which ChatGPT responds with this example.

@misc{key,
  author       = {Author Name},
  title        = {Software Title},
  year         = {Publication Year},
  version      = {Software Version},
  howpublished = {Open Source Software},
  url          = {}
}

ChatGPT knows a bit about Stan, but it’s not nearly as good at Stan as it is at Python.


This is great! I have a student with a Python background who is struggling with Stan. This introduction fits him like a glove.



Hi Bob,

Indeed, I have found there is a large gap between hello-world examples and doing it in real life. As such, I’ve tried to create a YouTube channel dedicated to teaching Stan for absolute beginners (Learn Stan with Ric - YouTube).

In my opinion, teaching the math of Bayesian inference and learning Stan should not be independent exercises. Practical Bayesian inference should be learnt hand in hand with code, and vice versa. In the past, when someone asked how to learn Stan, I told them to learn from the examples in the User’s Guide. While the examples are good, those without a solid fundamental grasp of what is happening tend to get stuck quite early. So, I created this playlist (Bayesian Inference with Stan - YouTube) that goes through fundamentals and examples.

Indeed, I think it’s a good idea to focus on Python. Stan is an amazing language, but there is definitely a perception that it is more a “research” tool, than a tool that can be used in production. I know this isn’t the case (I have put Stan models into full production before using cmdstanpy), but having more tutorials out there using Python as the interface is exactly what we need.


I love seeing more Python implementations for Stan. You both have done really awesome work. So thank you for that.

@AFL_Gains, I think having more resources that might focus on the basics of making these models more “production-esque” would be awesome. I don’t know how to balance that with the practicality of a tutorial, since engineering is its own hairy beast. How could future tutorials capture part of that without bogging readers down in the details of OOP? Is using something like a @dataclass a halfway bridge between the linear tutorial workflow and bombarding the reader with “proper” classes?
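
To make the @dataclass question concrete, here is a minimal sketch of what such a halfway bridge might look like. Everything in it is hypothetical: the class name RegressionData, its fields, and the to_stan_data method are made up for illustration and assume a simple regression model whose data block declares N, x, and y. The only thing it relies on from the interface is that CmdStanPy accepts a plain dictionary as model data.

```python
from dataclasses import dataclass, asdict
from typing import List

@dataclass(frozen=True)
class RegressionData:
    """Hypothetical container mirroring the data block of a simple regression."""
    N: int
    x: List[float]
    y: List[float]

    def __post_init__(self):
        # Catch size mismatches in Python before the data ever reaches Stan.
        if len(self.x) != self.N or len(self.y) != self.N:
            raise ValueError("x and y must each have N entries")

    def to_stan_data(self) -> dict:
        # A plain dict keyed by the Stan variable names (an assumed naming scheme).
        return asdict(self)

data = RegressionData(N=3, x=[1.0, 2.0, 3.0], y=[2.1, 3.9, 6.2])
stan_data = data.to_stan_data()
```

The tutorial flow stays linear (build the data, convert it, fit), while the validation lives in one place and there are no subclasses or virtual methods to explain.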

I have been thinking about working on a few posts going through some older time series models and actually going through more diagnostics and thinking about how we might update the parameterizations with all the great tips I’ve seen from the Stan community. But I’m wondering about the pros and cons of including multiple Python packages on top of the necessary Stan interfaces. A lot of Bayesian tutorials, not necessarily ones built on .stan files, end with throwing an object into bayesplot or ArviZ and, boom, we did Bayesian inference, end of story. Would it be worthwhile to avoid packages for the sake of having more resources that don’t end that way?

Once you spend a month in a cold dark cave with Matplotlib, you start to like it.

I assume “in production” here means running automatically on a server for a problem someone cares about. I think that general perception extends to all of MCMC, not just Stan specifically, because of its relative instability compared to simple optimization or matrix algebra (at least when the latter is reasonably well conditioned).

Stan’s being used all over in production. Just the Prophet interface is used all over the place, but we’ve cited lots of in-production versions of Stan over the years. Do you think there are other PPLs that are considered more production-ready, and if so, where do you think that perception comes from?

I’m writing this document for people who like to learn this way. The more math-oriented people have taken this intro and jumped straight to reading the appendix first!

I’m writing this tutorial with very minimal dependencies (really just NumPy and plotnine, plus pandas because I need it for input to plotnine).

CmdStanPy itself is written with minimal dependencies. That’s because I wanted to go over the basics from scratch, mainly because I do all my own plotting anyway and I don’t like how heavy ArviZ is (I think they got the dependencies backward by trying to automatically convert other framework data structures into a be-all and end-all data structure and I also don’t like how they’re trying to manage workflow when all I want is some posterior analysis).
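
As a rough illustration of how little is needed for basic posterior analysis, here is a sketch that summarizes draws for one scalar parameter using only NumPy. The function name summarize_draws is hypothetical, and the draws are simulated from a normal distribution as a stand-in for real posterior draws. It deliberately ignores autocorrelation, so it does not report a Monte Carlo standard error or effective sample size.

```python
import numpy as np

def summarize_draws(draws, probs=(0.05, 0.5, 0.95)):
    """Posterior mean, standard deviation, and quantiles for one scalar parameter."""
    draws = np.asarray(draws, dtype=float)
    summary = {
        "mean": float(np.mean(draws)),
        "sd": float(np.std(draws, ddof=1)),  # sample standard deviation
    }
    # Label each quantile by its percentage, e.g. "q5", "q50", "q95".
    for p, q in zip(probs, np.quantile(draws, probs)):
        summary[f"q{int(round(100 * p))}"] = float(q)
    return summary

# Simulated stand-in for posterior draws (not output from a real Stan fit).
rng = np.random.default_rng(0)
fake_draws = rng.normal(loc=1.0, scale=2.0, size=4000)
summary = summarize_draws(fake_draws)
```

With a real fit, the array of draws for a parameter would take the place of fake_draws.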

Stan is modern C++, so it’s very light on object oriented programming (OOP). Almost everything’s done with templates. The only place we really lean heavily on OOP is for representing the nodes in the low-level automatic differentiation expression graph.

There’s a bit of lightweight OOP in the Python and R interfaces, but it’s not really OOP in the sense that there aren’t virtual functions and complicated subclass arrangements.

After getting great feedback from @mitzimorris, @WardBrian, and Barry Smith (at Flatiron Institute), I did a major rewrite and have a second draft ready. I’m pretty happy with this one, but would still love to hear any feedback from people on where things are unclear.

The doc from that link says it was created May 1. Is that the most up to date doc?

Yes. I see that Quarto converted my “May 2023” into “May 1, 2023”.

yes, to the question!

Regarding Matplotlib, I think that plotnine is a whole lot better than a month in a cold dark cave. Last summer I put together a tutorial on CmdStanPy and plotnine using the radon data from Gelman and Hill: radon


Updated again May 29, mainly in response to Barry Smith’s comments around the intro to probability theory. It’s pretty complete now. I imagine I’m going to continue the theory with MC, MCMC (including R-hat and ESS). And then with how Stan’s samplers actually work. Then it’ll be a complete theoretical and practical intro to Stan. It’ll just take me a few more months.

Hey Bob. Looks great. My only comment is quite minor, and I don’t know if you’ll agree or not:

Under “Prerequisites” I would replace both mentions of probability theory with plain old probability. I suspect there’s a fairly broad class of Python programmers who understand PDFs, CDFs, etc. plenty well enough to benefit from this tutorial, but who would assume that they don’t know much about “probability theory” and its rules.

If they want to check whether they know enough probability theory (not really knowing what that means), they might well end up in Appendix A, and again might assume that this tutorial isn’t aimed at their level if they don’t recognize terms and concepts like sigma-algebra.
