Best Practice 'software engineering' in Analytics - good sources of advice/discussion etc

I’m helping set up a new internal consulting analytics group (starting next week, to be precise), and I would like to get good practices established from day one. Previous experience tells me that if you don’t do this on day one, it never happens.

I am therefore interested in any pointers to detailed discussion of best practice for this sort of thing.
In particular, I’m interested in the extent to which general software engineering best practices (e.g. model management, regression testing, continuous integration) apply, or can be modified to apply, to analytics work.

All pointers welcome, and thanks in advance,

Sean Matthews


I really like the discourse around principled Bayesian workflows:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
and

There is also a recorded YouTube talk, but I don’t remember the link.

Yes, this is good (as are the other workflow papers produced by the Stan group), but it covers workflow from an analytic point of view. What I’m searching for is anything on workflow from a software development point of view (if you like, what is the optimal way to use Git in an analytics project?). There does not seem to be a lot of that around.

But thanks anyway,

Sean


Oh, OK! That makes sense to me. Thanks for clarifying.

I have seen some papers about why you might want to use a Bayesian approach in software development, and others about how to build a Bayesian model to study software development.

But not much from a software development POV.

This is a great video on that topic by @richard_mcelreath: Science as Amateur Software Development - YouTube
It is a great (and also fun) starting point for this discussion and also points to some further resources.

A project I also really like on this topic is GitHub - ropensci/stantargets: Reproducible Bayesian data analysis pipelines with targets and cmdstanr by @wlandau

If we are talking about basic Git strategies, it really depends on the structure of the group and the pace of the project. Personally, I like the approach we use in Stan Math and other stan-dev repositories (some more than others): every change, regardless of how small, goes through a PR, and someone needs to review and understand it. That makes a project much less dependent on a single person. The process is time-consuming, and it occasionally takes a while to get a reviewer, but it is definitely worth it. Stan Math is a software project, but I don’t see a big difference in Git strategy for an analytics project.
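
To make the "every change goes through a reviewed PR" rule concrete, here is a rough Python sketch of the check it boils down to. The `PullRequest` class and `can_merge` function are hypothetical names made up for this illustration; they are not part of Git, GitHub, or any stan-dev tooling. In practice you would more likely switch on something like required reviews in your hosting platform's branch protection settings than write this yourself.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PullRequest:
    """Hypothetical, simplified record of a proposed change."""
    author: str
    approvals: Set[str] = field(default_factory=set)


def can_merge(pr: PullRequest, min_reviews: int = 1) -> bool:
    """Allow a merge only if enough people other than the author approved."""
    # An author approving their own change does not count as a review.
    independent_approvals = pr.approvals - {pr.author}
    return len(independent_approvals) >= min_reviews


if __name__ == "__main__":
    pr = PullRequest(author="alice")
    print(can_merge(pr))       # nobody has reviewed yet

    pr.approvals.add("alice")  # self-approval
    print(can_merge(pr))

    pr.approvals.add("bob")    # independent reviewer
    print(can_merge(pr))
```

The point of excluding the author from the approval count is the same as in the post above: at least one other person has to understand each change, however small it is.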
