Experiment Frameworks for Running/Comparing Lots of Models/Configs/Results

Running large numbers of models, configs, etc. to explore a problem can easily get overwhelming. This problem is shared with creating and tuning machine learning systems, and I am aware of at least one tool that attempts to help with it: MLFlow (https://mlflow.org/)

Some questions:

  • Are there other packages that people particularly like or don't like?
  • Does anyone have experience with MLFlow? Any feedback is appreciated; my initial experiments in a Databricks environment seem OK.



I think @rybern is working on related issues of organizing a family of models for his dissertation.

I can never figure out what these web products do from their home pages. Their doc page is better, and I think their use of “lifecycle” corresponds to our use of “workflow”. It looks like some kind of web app for managing and sharing results. It says it’s application agnostic, so I’m curious how hard it was to integrate Stan into it and what you used it for.


Not sure, but I think stantargets, maintained by @wlandau, is somewhere along the lines of what you are looking for: GitHub - ropensci/stantargets: Reproducible Bayesian data analysis pipelines with targets and cmdstanr


Thanks @Bob_Carpenter, yes, I’m exploring an abstraction that should be helpful for organizing and automating model exploration. Andrew posted a video of it here (it’s a very bad video, I’ll replace it soon!)


You might also check out the infrastructure used by the SBC package. I also made an SBC framework of my own, using targets directly (after finding stantargets too limited at the time).


Funny timing, I just gave a talk on Stan + MLFlow at StanConnect Ecology part 1 – we’ve used this extensively over the past few months to great effect. I find it works well once you get things set up, and fits neatly in a Bayesian workflow/MLOps pipeline.

Edit: The compelling use case for me is that a Bayesian workflow involves a lot of experiments. Tracking experiments helps organize your work and more systematically see whether your development effort on a model is resulting in improvements. That said, diligent tracking of experiments is hard when it must be done manually. The value proposition of tools like MLFlow is it automates this tracking, which makes it easier to navigate the Bayesian workflow. As a nice side effect, MLFlow also provides a way to share results and deploy models more easily.
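To make the value proposition concrete, here is a minimal pure-Python sketch of the kind of per-run tracking that tools like MLFlow automate (this illustrates the pattern only, not MLFlow's actual API; `ExperimentTracker` and its methods are made-up names):

```python
import json
import tempfile
import time
from pathlib import Path

class ExperimentTracker:
    """Toy stand-in for an experiment tracker: one JSON record per run."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def log_run(self, params, metrics):
        """Record the configuration and resulting metrics for one model fit."""
        run = {"timestamp": time.time(), "params": params, "metrics": metrics}
        run_file = self.root / f"run_{len(list(self.root.glob('run_*.json')))}.json"
        run_file.write_text(json.dumps(run, indent=2))
        return run_file

    def best_run(self, metric, maximize=True):
        """Scan all recorded runs and return the one with the best metric."""
        runs = [json.loads(p.read_text()) for p in self.root.glob("run_*.json")]
        key = lambda r: r["metrics"][metric]
        return max(runs, key=key) if maximize else min(runs, key=key)

# Two hypothetical model fits with different priors, compared on elpd_loo
tracker = ExperimentTracker(tempfile.mkdtemp())
tracker.log_run({"prior_sd": 1.0, "chains": 4}, {"elpd_loo": -123.4})
tracker.log_run({"prior_sd": 0.5, "chains": 4}, {"elpd_loo": -118.9})
print(tracker.best_run("elpd_loo")["params"])  # → {'prior_sd': 0.5, 'chains': 4}
```

MLFlow does essentially this (plus artifact storage, a web UI, and a model registry) without you having to maintain the bookkeeping code yourself.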

Some very minimal examples with cmdstanr, brms, and lm here, along with slides: GitHub - mbjoseph/mlflow-stan: MLFlow with cmdstanr


You should check out GitHub - d6t/d6tflow: Python library for building highly effective data science workflows


I was just about to post the same question. I’m mainly interested in tools for keeping track of different parameter settings over time.

One that caught my attention: GitHub - google/gin-config: Gin provides a lightweight configuration framework for Python

Gin provides a lightweight configuration framework for Python, based on dependency injection. Functions or classes can be decorated with @gin.configurable, allowing default parameter values to be supplied from a config file (or passed via the command line) using a simple but powerful syntax. This removes the need to define and maintain configuration objects (e.g. protos), or write boilerplate parameter plumbing and factory code, while often dramatically expanding a project’s flexibility and configurability.

Gin is particularly well suited for machine learning experiments (e.g. using TensorFlow), which tend to have many parameters, often nested in complex ways.
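The dependency-injection idea behind gin can be illustrated with a tiny pure-Python decorator (a sketch of the pattern only; gin's real implementation parses config files and supports a much richer syntax, and `configurable`/`CONFIG` here are made-up names):

```python
import functools

# Global config store; gin populates its equivalent from .gin files.
CONFIG = {}

def configurable(fn):
    """Inject default keyword arguments for `fn` from CONFIG at call time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        overrides = CONFIG.get(fn.__name__, {})
        # Explicit call-site arguments win over config-supplied values.
        merged = {**overrides, **kwargs}
        return fn(*args, **merged)
    return wrapper

@configurable
def fit_model(data, chains=4, iter_sampling=1000):
    return {"data": data, "chains": chains, "iter_sampling": iter_sampling}

# Roughly analogous to config file lines like:
#   fit_model.chains = 2
#   fit_model.iter_sampling = 500
CONFIG["fit_model"] = {"chains": 2, "iter_sampling": 500}

print(fit_model([1, 2, 3]))            # picks up chains=2, iter_sampling=500
print(fit_model([1, 2, 3], chains=8))  # explicit argument overrides the config
```

The appeal for experiment tracking is that the full parameterization of a run lives in one config file you can version and compare, rather than being scattered across call sites.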


To add some more thoughts based on our experience so far with MLFlow + Stan:

Pros:

  • MLFlow is lightweight and doesn’t require major changes to existing code
  • MLFlow has both an R and Python API
  • Easily share model results by pointing people to a run’s URL
  • Integration with Azure Databricks is easy, but MLFlow does not lock you into using Azure
  • MLFlow works with any modeling framework (Stan, brms, lm, pytorch, random forests, scikit-learn, whatever)
  • Model registry is nice to have, as a way to promote an experimental model to production
  • Open source, and actively maintained
  • MLFlow has been around for a while, is on version 1+, and seems to be in a sweet spot of maturity, community, and active development/maintenance

Cons:

  • Installing the mlflow R package can be challenging, depending on your Python environment hygiene (it requires reticulate and an mlflow conda environment)
  • Because MLFlow is so generic, it lacks some features that would be nice to have in a Bayesian setting (e.g., there’s no built-in support for comparing distributions from one experiment to the next).
  • I have not found the built-in visualizations in the web interface to be particularly useful
  • The R and Python APIs are not unified (e.g., it’s not like Earth Engine’s one-to-one mapping from javascript to python), which can be confusing and sometimes requires reading both language docs to figure out how to do things
  • I initially found the R documentation to be hard to follow
  • R API seems to be somewhat of a second-tier priority relative to the Python API (though this is probably a fair prioritization given the composition of MLFlow users)

I would love to see more built-in support for Stan in particular and PPLs more generally - taking a look at the examples in their GitHub repo, there’s definitely a focus on languages/frameworks that are used more in a machine learning context (but see the Prophet example): mlflow/examples at master · mlflow/mlflow · GitHub


Also check out Weights & Biases: Weights and Biases · GitHub