Case study on spatial models for areal data - Poisson CAR/IAR


#1

I’ve got a writeup with example models checked into stan-dev/example-models repo:
https://github.com/stan-dev/example-models/tree/IAR-case-study

the case study is in directory example-models/knitr/car-iar-poisson.

It contains a short summary of the math for CAR/IAR models, followed by example data taken from Charles DiMaggio. Feedback, comments, suggestions, and above all corrections welcome!

cheers,
Mitzi


Updates to case study on spatial models for areal data - Poisson CAR/IAR BYM and BYM2
Bugs vs Stan for areal data
#2

Hi Mitzi. Cool! For pedagogical purposes, it might be nice to show (excerpts) of the Stan output at the end. How’d you know that the model didn’t fit? How slow is slow? Etc.


#3

also is there a way for https://github.com/stan-dev/example-models/blob/IAR-case-study/knitr/car-iar-poisson/IAR_Stan.Rmd to render latex when I view it in github?


#4

I only checked in the Rmd - if you run it through knitr, you’ll see the Stan plots. also, if you view the Rmd file from within RStudio, the latex formulas will render nicely.


#5

This looks great!

There is some significant overhead it getting to the HTML as the R requires a ton of domain-specific R packages and some of the Stan models take a few minutes to run. Could you check in the generated HTML and any necessary resources so it can be read immediately without rendering?

Possible to have a figure demonstrating a lattice like model highlighting the Markov blanket?

You note that spatio-temporal data is typically 1-4 dimensions, perhaps it’s worth explicitly noting this is due to there being 1-3 spatial dimensions and 1 time dimension?

To avoid confusion I suggest using “random variable” everywhere instead of shortening to RV.

You have two tau’s missing a proceeding slash after describing the “pairwise difference form”.

Maybe note that INLA is an alternative to MCMC with difference speed and accuracy tradeoffs. ;-)

Using {r, comment=NA} to open a block of R code will drop the “##” that prefix each output line.

I recommend adding these two bits to the end:

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))
devtools::session_info("rstan")

Also, you can get the references printed in the footer if you add

bibliography: divergences_and_bias.bib

to the knitr header.


#6

Made changes to the case study per Micheal’s suggestions, and merged my branch with master:

As noted, the Rmd document takes a while to build, and it does require a whole bunch of R libraries, so I checked in the generated HTML file as well. But GitHub won’t show it to you, so you’ll have to download it or check out the repo locally.

cheers,
Mitzi


#7

This looks awesome, especially the maps at the end!

If I could make some very minor, pedantic comments: you never define for what the acronyms CAR and IAR stand and the equation for the density of w should use a \exp instead of the raw exp.


#8

thanks! changed made; html regenerated. should we add this to our list of case studies on the website?

cheers,
Mitzi


#9

Please! You can push directly to the website repo or I can do it later if you’d prefer.


#10

added and pushed.
http://mc-stan.org/documentation/case-studies/IAR_Stan.html


#11

case study has undergone a complete rewrite. thanks to help from Andrew and Bob, there’s now an correct and efficient implementation of Besag’s IAR spatial component. comparing the Stan fit with the result of running Brad Carlin’s WinBUGS model gets similar but not matching results, so either the two models differ or the samplers differ. pushed all of the models, data, R scripts, and the writeup to GitHub:

feedback and comments very much appreciated.

the earlier versions of the IAR model had problems, so if you were using them, don’t and please try the new version.


#12

I’ve reworked the case study yet again
executive summary:

  • complete derivation of “pairwise difference” formula
  • correctness check of CAR normal prior against WinBUGS car.normal function
  • mo’ better discussion of the Besag York Mollié model.

I’d love to get some feedback before putting this back up as a Stan case study.

like Bullwinkle says “this time for sure!” https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwiSwa_klvPUAhUBeT4KHcNpCZUQtwIIJDAA&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dpc4IFIXcDcs&usg=AFQjCNGTe3H_TcFWGieWpctuFEwrDYuY6w


#13

This is fabulous!

Some weird thoughts (that may or may not be useful):

  • I thought ICAR was the more common terminology (rather than IAR) for spatial models
  • In the final paragraph of the IAR section, it might be useful to say that the reason it’s non-generative is that it puts a flat prior on the intercept (that’s what mean-invarinace means in this context). So an ICAR model has a Jeffrey’s prior in one dimension (the overall intercept), and an informative prior in all dimensions orthogonal to that. (Writing that down right now, that may be too mathematical to be useful)
  • I don’t know what the Stan case study standard is, but I prefer my matrices to have ( . ) brackets, as the vertical lines make me think of determinants. In LaTeX that’s \begin{pmatrix}\end{pmatrix}
  • BYM section: I have very very very strong feelings about priors in this context (mainly that you shouldn’t use this parameterisation of the BYM). I don’t know if these case studies should document historical practice or best practice, but if the latter, there’s a long history of doing better (https://arxiv.org/abs/1601.01180 and refs within)

If you want some INLA comparisons in there I can do that for you fairly quickly.


#14

Probably my other comment (which is not unique to this case study) is that this is very long and would benefit from a table of contents! Maybe with sections:

  • What is a conditional autoregressive model
  • Adding an ICAR component to a Stan model
  • Example: Disease mapping
    so that people can find the bit that they’re looking for.

#15

Yes, real matrix notation please.

Comparisons with INLA would be fantastic, but that can go in the follow-on case study Mitzi’s going to do comparing to BUGS.

Best practice over historical practice. I’ve gone with some bad priors when trying to exactly replicate another study, but now that this is no longer comparing directly to BUGS, that’s no longer a concern.

Not sure what you meant by “parameterization of the BYM”. You mean the sum-to-zero constraint or the priors? Now that the goal isn’t to reproduce the GeoBUGS results exactly (we now think they are broken), we can just use our own priors.


#16

I’m doing up something now to explain the parameterisation thing, although I’ve got to go see a depressing play this evening so it may take until tomorrow.


#17

I’m not sure how to do this in github. Do I make a pull request with the changes? Or is there a different way.


#18

What’s more depressing that bad priors?

With everything that Mitzi has put into place it should be easy to modify/experiment with priors. The site-specific random effect prior is definitely an odd one!

I agree that a table of contents would be awesome.


#19

The matrices are also big enough that I feel like they would be better as not inline equations.

Also, was the line “indexing-fu plus vectorization!” intentional? It’s italicization and location right after a code block makes it someone ambiguous as to whether it was a left over excited comment or an intentional excited comment.


#20

GitHub is the way to go.
I’m going to make the edits you suggested w/r/t to ToC, ICAR, and matrix formatting.

many thanks for the great feedback. reading the priors paper now.
cheers,
Mitzi