Advice on giving a two-hour intro to Stan?

I’m giving a two-hour intro to statistics in a month as part of a data science bootcamp for incoming UCSB grad students. I want to use my time to introduce the students to Bayesian inference and Stan so they’re inspired to model their data probabilistically.

I know two hours isn’t much, so I was wondering if people had thoughts on what would be the best topics to cover and in what order. Ideally I’d like topics that will stick with the students and motivate them to learn more on their own. The students are incoming computer science, engineering, and biology grad students. I doubt they will have much experience with statistics beyond maybe a basic intro stats course, though they might know some optimization, linear algebra, and basic mainstream machine learning.

To those who have given tutorials to this kind of audience: which topics did your students find most interesting, and what order have you found most efficient for covering them? If anyone has a rough outline lying around, that’d be really helpful.

I have a few courses on YouTube you can feel free to crib from; this one is where Bayesian data analysis & Stan really starts in the latest course. If I were to teach these again (I’m finishing my dissertation now and then going to foray into industry for a bit), I’d make a better effort to include posterior predictive checking from the outset (I talk about it several lectures in but then fail to keep it up).

This earlier lecture delves more into theory for the frequentist-trained, with an example starting around min 12 that highlights a concrete case (diagnosis amidst imperfect tests) where it’s obvious that you would want to use the math of Bayes’ rule to come to an accurate updated belief (I come back to that example and use Bayes’ rule explicitly at around 1:13:00; I also work through illustrative numbers at the end of this post).

It might also be useful to end with some models that you don’t actually examine in depth, but use to pique their interest in the cool stuff you can do with Stan. I’d include:

- mixture models for handling contamination/multi-process data
- shrinkage/partial pooling in hierarchical models
- Gaussian processes for handling possibly non-linear relationships (including spatial smoothing)

It also might be edifying to highlight that with Stan you’re not limited to Gaussian error models (and certainly not homogeneous Gaussian error models): you can have heavy-tailed models (including doing inference on tail heaviness via a Student-t) for “robust” modeling, skewed error models, count models, ordinal models, etc.

As an example of a cool model, I just advised a colleague on a scenario where she collected data on how children share resources. We found that the data seemed best captured by a mixture model: one subset of kids always just defaulted to sharing half of what they had, while another subset’s sharing was Poisson distributed, with experimental manipulations affecting both the proportion of kids falling into each subset and the Poisson’s mean. A rough Stan sketch of this kind of model is below.
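To be clear, the sketch that follows is my quick reconstruction for illustration, not the actual model from that project: the data names are hypothetical, and I’ve left out the regression structure on the mixing proportion and the Poisson mean that the experimental manipulations would require.

```stan
// Illustrative 'share half' vs. Poisson mixture; names are hypothetical.
data {
  int<lower=1> N;
  array[N] int<lower=0> endowment;  // what each kid started with
  array[N] int<lower=0> shared;     // how much each kid gave away
}
parameters {
  real<lower=0, upper=1> p_half;  // proportion of 'always share half' kids
  real<lower=0> lambda;           // Poisson mean for the remaining kids
}
model {
  p_half ~ beta(1, 1);
  lambda ~ exponential(0.1);
  for (n in 1:N) {
    real lp_pois = poisson_lpmf(shared[n] | lambda);
    if (2 * shared[n] == endowment[n]) {
      // sharing exactly half is consistent with both subsets: a
      // 'share half' kid (log probability 0 for this outcome) or a
      // Poisson kid who happened to land on half
      target += log_mix(p_half, 0, lp_pois);
    } else {
      // any other amount can only come from the Poisson subset
      target += log1m(p_half) + lp_pois;
    }
  }
}
```

In the real analysis, both p_half and lambda would additionally get predictors for the experimental conditions.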
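And the promised diagnostic-test arithmetic, with illustrative numbers of my choosing (not necessarily the ones in the lecture): suppose the condition has 1% prevalence and the test has 95% sensitivity and 95% specificity. Bayes’ rule gives

$$
P(\text{sick} \mid {+}) = \frac{P({+} \mid \text{sick}) \, P(\text{sick})}{P({+})} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16
$$

so even after a positive test, the probability of actually having the condition is only about 16%. Most audiences find that surprisingly low, which makes the case for doing the math rather than trusting intuition.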


There is lots of stuff on the Stan website, but two hours is very little time. I would just cover what Bayesian inference is and how it is distinct from frequentist inference and supervised learning, that Stan makes it possible to do Bayesian inference in a fairly general way, that we have a bunch of (mostly R) packages that help you get going with Stan, and then do one or two examples of a model that is relevant for that audience.
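For a concrete sense of what such a first example might look like (this is just one generic possibility, not a prescription), a bare-bones simple linear regression in Stan:

```stan
// A bare-bones first example model: simple linear regression.
data {
  int<lower=0> N;   // number of observations
  vector[N] x;      // predictor
  vector[N] y;      // outcome
}
parameters {
  real alpha;            // intercept
  real beta;             // slope
  real<lower=0> sigma;   // residual standard deviation
}
model {
  alpha ~ normal(0, 10);   // weakly informative priors
  beta ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);  // likelihood
}
```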

Also Rasmus has three introductory videos that are about two hours total, starting with

http://sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-one/

Lots of material here:

http://mc-stan.org/workshops/

and here:

http://mc-stan.org/users/documentation/index.html

including many linked videos of roughly that length.

It’s a tall order, as @bgoodri rightly points out. Going faster won’t help.

If I had two hours with that audience, I think the best thing you could do is teach them a bit about uncertainty. It seems to go missing in most people’s understanding, but it is absolutely fundamental. Even a simple estimator like the average success rate for a bunch of binomial trials can get the basic idea across; that’s where I always start with beginners. If they learn that uncertainty goes down as 1/sqrt(N) with sample size, and that even 100 or 1000 categorical examples leave you with a lot of uncertainty, that’ll be huge for them going forward.
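If you want to make that concrete, the whole model fits on one slide. A minimal sketch (the data names here are arbitrary):

```stan
// Minimal sketch: posterior uncertainty for a success probability
// estimated from N binomial trials.
data {
  int<lower=0> N;            // number of trials
  int<lower=0, upper=N> y;   // number of successes
}
parameters {
  real<lower=0, upper=1> theta;  // success probability
}
model {
  theta ~ beta(1, 1);        // uniform prior
  y ~ binomial(N, theta);    // likelihood
}
```

Fit it with y = 60 and N = 100 and the posterior standard deviation of theta is about 0.05; with y = 6000 and N = 10000 it drops to about 0.005. That’s the 1/sqrt(N) scaling made visible.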

I agree with @Bob_Carpenter and @bgoodri: two hours really only gives you time to review the basics and whet their appetites, not to go into any actual modeling or specifics. Even in my one-day courses I spend at least 2.5 hours reviewing inference foundations, computational foundations, and then Stan, and that often inflates significantly with questions.

If time is not an issue (I’m traveling at the moment and have limited internet access), then private message or email me and I can send you the slides I use, to give you a reference for my pacing and the topics covered.