Some papers: Stan models for Australian Rules Football, Monotonic polynomials, and Exercise-induced pulmonary haemorrhage

I’m not very good at this whole self promotion thing, but I thought I might make use of this publicity tag to write a little about a few recent Stan models / papers I’ve worked on.

Australian Rules Football

The first is a model for both match prediction and the rating of offensive and defensive abilities of Australian Rules Football teams. Dynamic (time-varying) team ability models are reasonably popular for Association Football, but the higher-scoring (and fundamentally different) nature of Australian Rules means the models require some adaptation.

Manderson, A.A., Murray, K. and Turlach, B.A. (2018). Dynamic Bayesian forecasting of AFL match results using the Skellam distribution, Australian & New Zealand Journal of Statistics. Doi: 10.1111/anzs.12225
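
For anyone unfamiliar with the Skellam distribution named in the title: it is the distribution of the difference between two independent Poisson counts, which makes it a natural candidate for a match margin. The following is only a minimal Python sketch of that idea, not the model from the paper. The function names and the scoring rates (95 and 85) are made up for illustration.

```python
import math


def poisson_logpmf(k, mu):
    """Log of the Poisson probability mass at count k with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def skellam_pmf(d, mu_home, mu_away, max_count=400):
    """P(H - A = d) for independent H ~ Poisson(mu_home), A ~ Poisson(mu_away).

    Computed by summing over the away count a, truncated at max_count,
    which is far into the tail for the rates used here.
    """
    total = 0.0
    for a in range(max(0, -d), max_count):
        total += math.exp(
            poisson_logpmf(a + d, mu_home) + poisson_logpmf(a, mu_away)
        )
    return total


def home_win_probability(mu_home, mu_away, max_margin=200):
    """P(margin > 0), summing the Skellam mass over positive margins."""
    return sum(skellam_pmf(d, mu_home, mu_away) for d in range(1, max_margin + 1))


# Hypothetical scoring rates, chosen only to illustrate the calculation.
print(home_win_probability(95.0, 85.0))
```

In a dynamic model the two rates would themselves depend on time-varying team abilities; here they are fixed numbers to keep the sketch short.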

Monotonic polynomials

We implemented a parameterisation for monotonic polynomials that my supervisor had previously considered. The Stan implementation was preferable, as it was able to fit higher-degree polynomials (things get pretty correlated for sufficiently high-degree polynomials, especially when the usual QR decomposition is not immediately applicable). The paper just deals with a single set of (x, y) data, but I also extended the parameterisation, in a hierarchical manner, to fit many similar sets of (x, y) data (this is in my recently submitted Masters thesis, and might make its way onto a blog post someday).

Manderson, A.A., Cripps, E., Murray, K. and Turlach, B.A. (2017). Monotone polynomials using BUGS and Stan, Australian & New Zealand Journal of Statistics 59(4): 353–370. Doi: 10.1111/anzs.12207
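
To give a flavour of how a polynomial can be constrained to be monotone, one simple construction is to make its derivative a squared polynomial, so the derivative can never be negative. The Python sketch below uses that construction purely as an illustration. It is an assumption on my part and may differ from the parameterisation used in the paper, and all function names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_integrate(a, constant=0.0):
    """Antiderivative of a polynomial, with the given constant term."""
    return [constant] + [ai / (i + 1) for i, ai in enumerate(a)]


def poly_eval(a, x):
    """Evaluate a polynomial at x using Horner's rule."""
    result = 0.0
    for coef in reversed(a):
        result = result * x + coef
    return result


def monotone_poly(q, intercept=0.0):
    """Coefficients of p(x) = intercept + integral from 0 to x of q(u)^2 du.

    Since p'(x) = q(x)^2 >= 0, p is non-decreasing for any real coefficients q.
    """
    return poly_integrate(poly_mul(q, q), constant=intercept)


# Unconstrained coefficients for q give a degree-5 monotone polynomial p.
coefs = monotone_poly([1.0, -2.0, 0.5], intercept=2.0)
print([round(poly_eval(coefs, x), 3) for x in (-1.0, 0.0, 1.0, 2.0)])
```

The appeal of this kind of construction for a sampler is that the coefficients of q are unconstrained, so no inequality constraints need to be imposed on the polynomial itself.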

Exercise-induced pulmonary haemorrhage

We also developed / implemented two models for exercise-induced pulmonary haemorrhage (EIPH): one to address the covariates that influence the transition of the disease from low states to high states (latent time-inhomogeneous Markov chains for a categorical response), and a typical semi-parametric regression to address disease progression. As in all applied statistics projects, a few accommodations had to be made (e.g. the need for “p-values” for parameters in a Bayesian analysis). The major one was that the combination of semi-parametric model and ordinal response could not be fit to the real data, and generating data from the model and attempting to fit the model back to that data left me questioning whether it could ever be fit. The “workaround” is to pretend the response is numeric instead, but this is very unsatisfying, and it would be interesting to see how much information can actually be recovered from an ordinal response.

Crispe, E.J., Secombe, C.J., Perera, D.I., Manderson, A.A., Turlach, B.A. and Lester, G.D. (2018). Exercise-induced pulmonary haemorrhage in Thoroughbred racehorses: A longitudinal study, Equine Veterinary Journal. Doi: 10.1111/evj.12957

I’m currently building another set of Stan models for various oceanographic phenomena, some of which should hopefully appear in the future. Feel free to ask questions about any of the above models; I’d love to talk about them.

Thanks for sharing and the descriptions. Now we can add veterinary medicine to our list of applications and footy (NZ) to our list of sports applications, which already included footy (UK).

Did it fit to simulated data?

What happened when you tried to fit the ordinal model? One issue there is priors on the cutpoints to avoid collapse when there wasn’t data observed.
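
To make the cutpoint issue concrete, here is a small Python sketch of the category probabilities in a standard ordered logistic model, where P(y <= k) = inv_logit(c_k - eta). It is only an illustration under that assumption, not the model from the paper, and the function names and numbers are made up.

```python
import math


def inv_logit(x):
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities for a linear predictor eta and increasing cutpoints.

    With K - 1 cutpoints there are K categories, and
    P(y = k) = P(y <= k) - P(y <= k - 1).
    """
    cumulative = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Well-separated cutpoints: all three categories get appreciable mass.
print(ordered_logistic_probs(0.0, [-1.0, 1.0]))
# Nearly collapsed cutpoints: the middle category almost vanishes.
print(ordered_logistic_probs(0.0, [0.0, 0.01]))
```

The probability of a middle category is governed by the gap between its two cutpoints, so a category with no observations gives the likelihood little reason to keep that gap open. That is where a prior on the cutpoints can help.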

It might help to mention who you are—it wasn’t clear to which author your handle “hhau” corresponded!

I think you mean footy (AUS) here; the Kiwi in me would be slightly upset to be included in a sport that is predominantly (almost exclusively) played in Australia!

Stretching my memory here, but I think the conservative response is to say no (Edit: I went back and found my notes / simulation study for this, and the model does fit to simulated data, but I wouldn’t say it fits well: low n_eff etc.). I suspect I was simulating data that was far too similar to the real data. I think the model would fit if some combination of the following changes were made to either the data generating process or the model:

  • The number of ordinal response categories was increased from 3
  • More observations in the data set per individual
  • Less noise on the covariates
  • Fewer noisy, fixed, covariates in general
  • Better priors for the cutpoints, at least something that can identify the scale of them.

I started with a generated data set that looked as similar as I could get it to the real data set, but the money / time ran out before I could explore the rest of the data generating process to see if I could get the model to fit. There was no possibility of collecting more data / changing the study size anyway, but it would be nice to know from an academic perspective if such models are identifiable, and at which point things start going awry.
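
For anyone wanting to try this kind of exploration, the knobs listed above (number of categories, observations per individual, covariate noise) can be exposed in a small simulator. The Python sketch below is a simplified stand-in and not my actual simulation study: it uses a single noisy covariate with a latent logistic response, and every name and default value in it is hypothetical.

```python
import math
import random


def simulate_ordinal(n_individuals, n_obs, cutpoints, beta, noise_sd, seed=1):
    """Simulate (individual, observed covariate, category) rows.

    The category is obtained by cutting a latent variable
    beta * x_true + logistic noise at the given increasing cutpoints.
    The covariate is recorded with Gaussian measurement noise of sd noise_sd.
    """
    rng = random.Random(seed)
    rows = []
    for individual in range(n_individuals):
        for _ in range(n_obs):
            x_true = rng.gauss(0.0, 1.0)
            x_observed = x_true + rng.gauss(0.0, noise_sd)
            # Inverse-CDF draw of standard logistic noise, clamped away from 0 and 1.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            latent = beta * x_true + math.log(u / (1.0 - u))
            category = 1 + sum(latent > c for c in cutpoints)
            rows.append((individual, x_observed, category))
    return rows


# Three categories, few observations per individual, fairly noisy covariate.
rows = simulate_ordinal(
    n_individuals=50, n_obs=4, cutpoints=[-1.0, 1.0], beta=1.0, noise_sd=0.5
)
print(rows[:3])
```

Varying one argument at a time (more cutpoints, larger n_obs, smaller noise_sd) and refitting would show roughly where the fit starts to degrade.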

That’s a good point: I am Manderson, A.A. in the above papers, and I have a very uninformative website at https://hhau.github.io/. The repos for the first two papers are on my GitHub account, hhau (Andrew Manderson), along with some other monotonic polynomial things.

Oops, sorry about the misattribution. I thought it was played in NZ too for some reason! I grew up in the U.S. and then lived in the U.K. for a few formative years, so you can imagine my confusion around the term “football.”