Renaming example-models

During today’s meeting, one topic @andrewgelman brought up was the status of the example-models repo.

The consensus seems to be that there is still value in this repo, but a lot of the models no longer follow best practices and shouldn’t necessarily be touted as “examples” per se. I suggested that renaming the repo could give us the best of both worlds: keeping it around while communicating that its contents might not always be the best examples.

@mitzimorris suggested renaming the repo to “model-attic”, which most attendees liked as a name. We could update the README to explain that it is sort of an attic - there are some valuable things up there, but also a lot of history.

@PhilClemson and @s.maskell both mentioned that they had worked on some improved versions of some of the models, including some that made it into posteriorDB, and it would be good to add links to those versions in the repo.

This thread serves as a place to discuss further thoughts on renaming. @jonah also brought up his recent thread about how this repo currently isn’t linked to much from our website: Link to example models missing from Stan website?

The example-models repo was originally just a dumping ground to put models that didn’t fit anywhere else. While “model-attic” describes what we have now, I would prefer to actually fix the problem.

In outline, I think we should leave the BUGS and book translations, move code for the user’s guide and testing to the appropriate repos, and relocate all the case studies (I’m OK with a new case-studies repo that would have reproducible code along with models).

Here’s a complete survey of all the top-level directories and where I think they should go.

To Keep

  • BPA: keep (Bayesian Population Analysis book)

  • Bayesian_Cognitive_Modeling: keep (book of same name)

  • bugs_examples: keep and rename to BUGS and add warnings that none of the models are up to our current best practices for either Stan programs or modeling; we do not want people blindly copying the practices in these models. If you check out the vol1/rats example, you see what we had intended, which was to do a direct translation, an efficient direct translation, and then a best practices modeling translation. It never got there and there’s no good version of that model to inspect anywhere.

To Remove

  • ARM: poor quality, buggy, and outdated model translations (models derived from Gelman and Hill’s first regression book—the new book’s coded with Stan from the get go, so this is no longer the project’s problem, it’s the authors’)

  • applications: this one should’ve cited the license and original copyright; instead, we should just remove it and, if desired, point to the Imperial College repo, which has a copy of this model.

  • basic_distributions: these are just trivial models to fit basic distributions with data; the only non-trivial examples are binormal, triangle, and normal_mixture and I’d be happy to make sure those were covered in the user’s guide.

  • basic_estimators: this is poorly named and contains things that should’ve been in basic_distributions; the only non-trivial examples are normal_censored, normal_truncated, and normal_mixture_k, all of which are described with code in the user’s guide

  • misc: this is the attic to the attic; some of this is salvageable but most of it’s outdated like the old written-by-hand HMM code

    • cluster: move to support file for user’s guide
    • dlm: cool Kalman-filtering example; this should go into the user’s guide
    • ecology: I think these are redundant with the book translation and the user’s guide
    • eight-schools: already in BDA, in the getting-started example code in R packages, etc.
    • funnel: in user’s guide
    • gam: this one was contributed by a user, but has a bogus license for text for the code; I’m less clear on what to do with these really complicated models that users contribute with data, where the script bombs out with some kind of C++ exception in stanc through RStan, which I can’t easily get running any more.
    • garch: models from @bgoodri, but way out of date on coding practice; this should go into the manual time-series chapter
    • gaussian-process: presumably we have enough GP coverage elsewhere; I think this may be stuff @rtrangucci used for the user’s guide chapter
    • hier_multivariate: redundant with user’s guide, etc.
    • hmm: outdated now that we have built-ins other than maybe the simple sufficient-statistic version and semi-supervised versions, which could also go into the user’s guide; this stuff might’ve been used for the HMM examples in posteriordb from Imad Ali (I can’t find his forum handle)
    • irt: presumably redundant with all the education case studies; way out of date in coding
    • linear-regression: just delete
    • moving-average: this may be stuff from the user’s guide; strange error-based coding and out of date; maybe move the stochastic volatility model into the time-series chapter of the user’s guide
    • multi-logit: discussed in user’s guide
    • multivariate-probit: also in user’s guide
    • nnmf: non-neg matrix factorization should go in user’s guide in a new factor models chapter
    • sur: also redundant with user’s guide
  • regression_tests: this should go into a code repo if it’s still being used

Case Studies to Relocate

I don’t think they should go in example-models as they have data, R markdown, Jupyter (spell-check bait names are the worst), code, etc. We could create a new case-studies repo and locate the code for our case studies there

  • jupyter: this looks like a translation of Gelman and Carpenter’s paper; I have a repo for that where we can move this

  • education: these are some of the education models and should be moved to the respective authors’ GitHub

  • knitr: there are a lot of these, but I still think they should be moved back to the authors’ repos and the web site updated to match; or, we could build a case-studies repo to add this and the education examples to; there’s a bunch of residual junk like chapter1 and chapter2, which have been moved into the workflow book/paper repo; I’m guessing the IRT stuff all went into those case studies

@WardBrian We fixed / added data for around 80 models in the example-models repository. These versions are the ones currently there.

I just checked and it looks like 18 models also made it into posteriorDB. Most of them were added by Kane Lindsay (our summer intern student) who also added additional example models into his fork of posteriorDB but these didn’t make it into a pull request. I’ll try to see if we can get these added too.

@Bob_Carpenter A lot of the ARM models have already made it into posteriorDB so I’m not sure if it makes sense to remove all of these. I seem to remember @mans_magnusson having issues specifically with the radon models (e.g. there seem to be lots of duplicates).

I definitely think if we remove anything we should still include a link to where the model still exists. Even if we have a new repo of “junk models” I think it’s still a useful resource to have all the models in one place.

Hi all,

I have been out the last three months due to teaching, COVID, etc. I know that @avehtari is of the same opinion that we should remove models in posteriorDB that are old or very similar to each other. I think we discussed that in posteriorDB we would instead add a tag marking a model as deprecated. That would send a clear signal, but the models would still be kept. So one solution could be to keep the models as deprecated?

We have also added models with a benchmark tag after the summer of code that should be in the posteriorDB.

@PhilClemson If you have more models to add to posteriorDB, feel free to do so. I will start working on this again, probably starting next week.

New book uses rstanarm and brms, so no need for hand-written Stan code.

That was an unfortunate accident. I’ve proposed to remove most of them, but leave a few which cover some interesting range of simple models to include.

As we don’t have release 1.0 yet, maybe it would be cleaner to drop a lot?

for the record, we have 39 case studies listed here: Stan - Case Studies

there are 21 case studies under example-models - 8 education, 13 knitr

finally, a tally of case study authors, sorted by number of contributions

6 Michael Betancourt
6 Bob Carpenter
5 Sophia Rabe-Hesketh
5 Daniel C. Furr
3 Ben Bales
2 Seung Yeon Lee
2 Nicholas Sim
2 Mitzi Morris
2 JoonHo Lee
2 Feng Ji
2 Charles Margossian
2 Andrew Gelman
1 T.Trevor Caughlin
1 Sebastian Weber
1 Milad Kharratzadeh
1 Mick Cooney
1 Max Joseph
1 Lu Zhang
1 Leo Grinsztajn
1 Jungin Choi
1 Julien Riou
1 Joon-Ho Lee
1 Imad Ali
1 Hyunji Moon
1 Elizaveta Semenova
1 Cristina Barber
1 Chris Fonnesbeck
1 Cara Applestein
1 Brian Gin
1 Aybolek Amanmyradova
1 Avi Feller
1 Aravind S (Python translation)
1 Andrii Zaiats
1 Anders Skrondal
1 Aki Vehtari
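
A tally like the one above could be produced with a few lines of Python. This is only a minimal sketch, not how the list above was generated: the input structure and the `tally_authors` helper are hypothetical, and the sample data is illustrative rather than the real case-study metadata.

```python
from collections import Counter

# Hypothetical input: one list of author names per case study.
# Illustrative sample only, not the actual case-study list.
case_study_authors = [
    ["Bob Carpenter", "Andrew Gelman"],
    ["Michael Betancourt"],
    ["Bob Carpenter"],
]


def tally_authors(studies):
    """Count case studies per author, sorted by count (descending), then name."""
    counts = Counter(name for authors in studies for name in authors)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for name, count in tally_authors(case_study_authors):
    print(count, name)
```

Note that a tally of this kind treats every distinct spelling as a separate author, so names would need to be normalized first if the same person appears under more than one spelling.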

That’d be great.

That’s a lot. Should we create a case-studies repo and move them all there?

yes, that would be a good move. hoping that would encourage people to write new ones.