Best Practice 'software engineering' in Analytics - good sources of advice/discussion etc

Sean_Matthews · June 3, 2021, 10:38am

I’m currently helping (actually not precisely currently, rather as of next week) set up a new internal consulting analytics group, and I would like to get good practices established from day one (previous experience tells me that if you don’t do this on day one, then it never happens).

I am thus interested if anyone has pointers to detailed discussion of best practice for this sort of thing.
I’m interested in things like the extent to which general software engineering best practices (e.g. model management, regression testing, continuous integration) apply, or can be adapted to apply.

All pointers welcome, and thanks in advance,

Sean Matthews

Ara_Winter · June 10, 2021, 6:21pm

I really like the discourse around principled Bayesian workflows:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
and

There is also a recorded YouTube talk that I don’t remember the link to.

Sean_Matthews · June 10, 2021, 6:26pm

Yes this is good (as are the other workflow papers produced by the Stan group), but it is good on workflow from an analytic point of view. What I’m currently searching for is anything on workflow from a software development point of view (if you like, what is the optimal way to use Git in an analytics project). There does not seem to be a lot of that around.

But thanks anyway,

Sean

Ara_Winter · June 10, 2021, 6:36pm

Oh ok! That makes sense to me. Thanks for clarifying.

I have seen some papers about why you might want to use a Bayesian approach in software development, and other papers about how to build a Bayesian model to study software development.

But not much from a software development POV.

rok_cesnovar · June 10, 2021, 7:04pm

This is a great video on that topic by @richard_mcelreath: Science as Amateur Software Development - YouTube
It is a great (and also fun) starting point for this discussion and also points to some further resources.

A project I also really like on this topic is GitHub - ropensci/stantargets: Reproducible Bayesian data analysis pipelines with targets and cmdstanr by @wlandau

If we are talking basic Git strategies, it really depends on the structure of the group and the pace of the project. Personally, I like the approach that we use in Stan Math and other stan-dev repositories (some more than others): every change, regardless of how small it is, is done through a PR, and someone needs to review and understand it. That makes a project much less dependent on a single person. The process is time-consuming, and it occasionally takes a while to get a reviewer, but it is definitely worth it. Stan Math is a software project, but I don’t see a big difference in Git strategy for an analytics project.
