A new and improved set of notebooks for a course I gave last fall at GeoMED 2024 is available here:
Notebook 2 is useful if you’re new to working with geo-located data.
Notebook 3 is a gentle introduction to the Stan workflow.
Notebooks 4-6 are about spatial models for areal data.
The GeoMED audience was a mix of epidemiologists and geographers, familiar with the models and the data, but not (yet) Stan users. The models use an ICAR component to do spatial smoothing of geo-located count data. At GeoMED, we went through notebooks 3, 4, and 5.
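For readers who haven't met the ICAR model before, here is a minimal Python sketch of the pairwise-difference form of its log density (up to a constant): each pair of neighboring regions is penalized by the squared difference of their spatial effects. The four-region map, its edge list, and the function name are made up for illustration and are not taken from the notebooks.

```python
# Sketch of the ICAR pairwise-difference log density (up to a constant).
# The map, edge list, and values below are hypothetical.

def icar_log_density(phi, edges):
    """Return -0.5 * sum over neighbor pairs (i, j) of (phi[i] - phi[j])**2."""
    return -0.5 * sum((phi[i] - phi[j]) ** 2 for i, j in edges)

# Hypothetical map: four regions in a row, 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]

smooth = [-0.3, -0.1, 0.1, 0.3]   # neighbors have similar values
rough = [-0.3, 0.3, -0.3, 0.3]    # neighbors alternate in sign

lp_smooth = icar_log_density(smooth, edges)
lp_rough = icar_log_density(rough, edges)
print(lp_smooth, lp_rough)        # the smooth surface gets the higher value

# Adding a constant to every region leaves the density unchanged.
lp_shifted = icar_log_density([x + 5.0 for x in smooth], edges)
print(lp_shifted)
```

The last two lines show why the location of the spatial effects has to be pinned down some other way: the pairwise differences alone cannot tell a surface from a shifted copy of itself.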
Since then, Stan release 2.36 introduced the sum_to_zero_vector constrained parameter type, which just rocks - for more details, see: The Sum-to-Zero Constraint in Stan
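To make the constraint concrete, here is a small Python sketch (not Stan code) of the simplest hand-coded way to get a vector that sums to zero: keep N-1 free values and set the last element to the negative of their sum. This is an illustration of the constraint only; it is not the transform that Stan's built-in type uses, and the function name is hypothetical.

```python
# Sketch: build a length-N vector that sums to zero from N-1 free values.
# This is the simple hand-coded approach, NOT the transform Stan uses internally.

def sum_to_zero(free):
    """Append the negative sum so that the full vector sums to zero."""
    free = list(free)
    return free + [-sum(free)]

phi = sum_to_zero([0.5, -1.25, 2.0])
print(phi)       # four values; the last one balances the first three
print(sum(phi))
```

One drawback of this hand-coded version is that the last element is treated differently from the others, which is part of the motivation for a built-in constrained type.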
As the ICAR model and friends all use a sum-to-zero constraint, I needed to update the spatial models accordingly. And more! I’ve added notebook 6, which shows how to extend the BYM2 model so that it can handle maps of places like New York City - a bunch of larger and smaller land masses and islands - following the INLA folks’ lead: A note on intrinsic Conditional Autoregressive models for disconnected graphs.
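Before a model can treat land masses and islands separately, it needs to know which regions belong to which connected piece of the map. The Python sketch below labels connected components from an edge list with a breadth-first search. It is a generic graph routine, not the code from notebook 6, and the six-region map is hypothetical.

```python
# Sketch: label the connected components of a neighbor graph.
# Generic breadth-first search; the map below is made up for illustration.
from collections import deque

def connected_components(n_regions, edges):
    """Return a list giving the component label (0, 1, ...) of each region."""
    neighbors = {i: set() for i in range(n_regions)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    component = [-1] * n_regions   # -1 means "not visited yet"
    n_components = 0
    for start in range(n_regions):
        if component[start] != -1:
            continue
        component[start] = n_components
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if component[nb] == -1:
                    component[nb] = n_components
                    queue.append(nb)
        n_components += 1
    return component

# Hypothetical map: regions 0-2 on one land mass, 3-4 on another,
# and region 5 an island with no neighbors.
edges = [(0, 1), (1, 2), (3, 4)]
labels = connected_components(6, edges)
print(labels)
```

Regions with no neighbors end up in a component of size one, which makes them easy to pick out for separate handling.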
What’s key about the seemingly trivial notebook 3 is that it goes through, in mind-numbing detail, how to set up a bad model that doesn’t fit the data, with a sidebar about a few data transforms that make or break whether Stan can fit the model to the data at all. Then it sets up an almost-as-bad model that does fit the data. It’s an exercise in bookkeeping, model naming, and showing your work. I’m a software engineer at heart - writing modular, maintainable code requires writing seemingly trivial unit tests and thinking long and hard about the names of things. If you want to develop a model that works now and will still work next year, when you try to recreate an analysis or apply your model to a new dataset, you need to follow the workflow. That’s all.
One more thing: I wrote both Python and R versions of these notebooks, and the HTML notebooks, which run the Python code, show the corresponding R code in parallel. I was happy to find that Python’s libpysal does pretty much everything that R’s sf and spdep do, although differently. Previously I did a case study on Python’s plotnine package. Bottom line: work in the language you like, but if your colleagues use the other language, these notebooks are there to facilitate translation.