Workshop on Bayesian Data Analysis at the Ecology Across Border meeting

Dear all,

Maxime Dahirel and I will be teaching a 90min-long introductory workshop on Bayesian data analysis using STAN at the Ecology Across Borders meeting in December. This is a joint meeting of 4 different ecological societies which will be attended (mainly) by researchers in ecology.
The idea for the workshop is to have a general introduction into BDA (30min), followed by a coding session going through models of increasing complexities (30min) and finish with small groups discussion on specific themes for the participants (30min). It is targeted towards researchers with working knowledge of statistics and interest in BDA.

We would be very grateful to hear your feedback on the material we already put together (available here) but also about your experience on teaching similar workshop to audiences of varying levels.

Thanks in advance.

2 Likes

Cheapshot comment: It’s Stan not STAN because it’s named after Stanisław Ulam. :)

I will repeat the comment I made via email—this is a very ambitious schedule. You’ve included the amount of material we’d generally spend six hours working through with similar audiences and then we’d only hope to get through simpler exercises than you have outlined.

You seem to have included Bayesian statistics, Stan, brms, and rstanarm in that 30m intro session. That seems completely impossible to me.

The 30 minute schedule for a practical session also seems unworkable even if everyone has RStan or PyStan installed. If they don’t have things installed, that will eat up all of your 30 minutes for many users (and all your staff if you want to help). 30 minutes is time to get through one very simple exercise in my experience. Trying to explain how to do mixture models for zero inflation will be way too complicated.

1 Like

I forgot to thank you for posting the announcement and your course materials.

Thanks for these comments. The Bayesian intro section is loosely based on Rasmus Baath series of videos, we’ll see at our trial session if what can get through from our slides and what should be cut down.
For the practical session I was visualizing going through the lines of code and explaining the output rather than having the audience actually writing code. People who did not manage to install all packages won’t be able to run the code themselves but I hope that they would add notes to the script so that they could come back to it later on, those with the packages installed might play around a bit.
I guess that we’ll try to keep the simple lm example plus the hierarchical model.

Glad to hear it—that’ll certainly cut down on the risk!

For the record, the session was held this Wednesday and here are some of the questions / comments from the audience:

  • how to set priors when we are making a new experiment with little background knowledge
  • can we automatically translate JAGS/BUGS into Stan code
  • what is the advantage of using BDA when using flat priors since the likelihood is the posterior
  • can we fit crossed random terms with brms / rstanarm

We could go through the presentation and 2 example models (linear and overdispersed poisson regression) in the 90min time, but we had to rush through the code and could not really go into the interesting details …

Thanks for reporting back.

  1. See the Stan prior choice recommendation Wiki.

  2. Not automatically. In theory it could be done given the data. There’s an appendix in the manual aimed at users of BUGS/JAGS to help translate to Stan. And also a translation of almost all of the BUGS examples as well as BUGS examples from several books listed on the web page.

  3. That would only be equivalent if we took a posterior mode as a point estimate (sometimes called MAP for “max a posteriori”). If we wanted a point estimate, we’d use the Bayesian posterior mean (or median) which has several advantages. First, it minimizes expected square error in the estimate (or absolute error with median). In symbols, if y is data and theta parameters, then the posterior mode (or MAP) estimate is

theta* = ARGMAX_theta p(theta | y)

If the prior is uniform p(theta) = const, then p(theta | y) propto p(y | theta) and you get the MLE. If the prior isn’t constant, you can think of it as a penalty giving you a penalized maximum likelihood estimate. Or you can think of it as a Bayesian posterior mode.

On the other hand, the standard Bayesian estimator is the posterior expectation or mean,

theta-hat = E[theta | y]
          = INTEGRAL theta * p(theta | y) d.theta

The Bayesian estimate has the property that

theta-hat = ARGMIN_phi E[(phi - theta)^2 | y]

where theta is the true parameter value (usually multivariate). This is one of those cases where Andrew’s overloading of random variables and bound variables gets confusing as the theta in the expectation is a random variable whereas the theta in the integral is a bound variable and the theta in the final statement is the true value of the random variable that’s also written theta. No wonder this is so confusing for beginners! And of course, this all depends on side conditions like the expectations and integrals existing.

The posterior mode theta* doesn’t have a probabilistic interpretation, though you can talk about the sampling distribution of the estimator calculate confidence intervals. Confidence intervals are not distributions of theta conditioned on the data as you get in the Bayesian analysis, but distributions of the estimator theta* as a function of y over alternative choices of y (that lets you keep probability on the data y). You can also talk about asymptotics and whether it converges to the true value as the data size grows.

Second, the Bayesian posterior mean exists in many situations where there is no (penalized) maximum likelihood estimate. The standard example is a hierarchical model, where the posterior density grows without bound as the posterior variance goes to zero and the low-level coefficients go to the posterior mean). As MacKay said in his book, “EM goes boom!”.

Third, the question is based on the false presupposition that we’d be using improper uniform priors. See the reference in the answer to (1) above.

  1. I’m not sure what a crossed random term is. If you mean interactions, then you can certainly do that with RStanArm.

Maybe referring to random-effects? This is common in ecology: https://biologyforfun.wordpress.com/2016/12/08/crossed-and-nested-hierarchical-models-with-stan-and-r/

Yes indeed, this is common terminology in frequentist mixed effect models when there are more than one higher-level source of variation, like for instance:

nb_of_plants = \alpha + \gamma_{site} + \gamma_{year}

An example in rstanarm would be:

library(lme4)
library(rstanarm)
data(grouseticks)
m ← stan_glmer(TICKS ~ 1 + (1 | BROOD) + (1 | LOCATION),grouseticks,family=poisson)

Thanks—we just call those “multilevel models” and sometimes (when Andrew’s not listening) refer to the group-level coefficients as “random effects”.