Posteriordb evaluation


It sounds as if we need some scripts which allow us to evaluate models for our various projects (warmup / any other tuning).

I would propose using the R package batchtools for this, for these reasons:

  • It splits data generation and fitting into separate steps, so you can evaluate, for example, different warmup strategies on exactly the same generated data. Each replication therefore yields paired data for the different warmup schemes.
  • The execution engine is modular, so it can run on a single machine with or without multi-core support, or on various clusters (SGE/LSF/Slurm).
  • It has relatively good debugging support.

The only negative is that batchtools creates some overhead, so each job should represent substantial computational effort.
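To make the data/fit split concrete, here is a minimal sketch of a batchtools experiment. The names (`sim_data`, `warmup_a`, `warmup_b`) and the toy mean/median "fits" are hypothetical placeholders, not posteriordb or Stan code; the point is only that both algorithms are applied to the same generated instance per replication.

```r
library(batchtools)

# Temporary experiment registry; use a persistent file.dir for real studies.
reg <- makeExperimentRegistry(file.dir = NA, seed = 1)

# Problem: data generation, done once per replication.
addProblem("sim_data", fun = function(job, data, n = 100, ...) {
  rnorm(n)  # placeholder for the real data-generating process
})

# Algorithms: two hypothetical warmup schemes, fitted to the SAME instance.
addAlgorithm("warmup_a", fun = function(job, data, instance, ...) mean(instance))
addAlgorithm("warmup_b", fun = function(job, data, instance, ...) median(instance))

# 10 replications; both algorithms see identical data within a replication.
addExperiments(repls = 10)
submitJobs()
waitForJobs()

results <- reduceResultsDataTable()
```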

Does anyone else have thoughts on this?



ping @mans_magnusson


This sounds very promising! I have not tried it myself, but I have built the same functionality multiple times before, so this will probably make me very happy! I am still focused on posteriordb, getting the right things in there so that batchtools has something to work with, but this seems promising for the experimental sections.



I’ve used (and liked) batchtools for my own simulation studies, but it is not clear how much the developers intend to maintain or extend it going forward. There are some clearly missing features for cloud workflows (e.g., explicit AWS ParallelCluster and/or AWS Batch support), and that makes me wonder about future support.

It might be worth reaching out to the developers and asking specifically about their intentions before getting too locked in.

Is batchtools like map-reduce or Spark for R? What are SGE/LSF/Slurm, and do they run on AWS, or might they run on our clusters at Columbia?

I didn’t understand the first point about splitting data generation and fitting into separate steps. I don’t see how they could be combined unless we put the data generator into the transformed data block of the Stan model.

Yes… it’s like map-reduce for numerical simulations, in some sense.

SGE, LSF, and Slurm are cluster schedulers commonly used on HPC clusters, probably also at Columbia.
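batchtools selects its backend through a configuration file, so the same job code can run locally or on a scheduler. A sketch of a `batchtools.conf.R`, assuming a Slurm cluster (the template filename `"slurm.tmpl"` is a site-specific placeholder):

```r
# batchtools.conf.R -- sourced when a registry is created.
# Swap the cluster functions to switch backends without touching job code.

# On a Slurm cluster ("slurm.tmpl" is a site-specific submission template):
cluster.functions <- makeClusterFunctionsSlurm(template = "slurm.tmpl")

# Or, on a single multi-core machine:
# cluster.functions <- makeClusterFunctionsMulticore(ncpus = 4)
```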

The point is that in one step you simulate the data for a given replication, and in separate steps you analyze that data with multiple methods, giving you a paired comparison. This frees you from tedious seed handling; that is already baked into the process.
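Independent of batchtools, the paired design amounts to the following toy base-R sketch, where mean and median stand in for two hypothetical warmup schemes applied to the same simulated data set:

```r
set.seed(42)

n_repl <- 50
results <- t(replicate(n_repl, {
  y <- rnorm(100, mean = 1)        # one simulated data set per replication
  c(method_a = mean(y),            # both "methods" see the same y
    method_b = median(y))
}))

# Paired differences: comparing methods within a replication removes the
# between-data-set variability you would get from independent simulations.
paired_diff <- results[, "method_a"] - results[, "method_b"]
summary(paired_diff)
```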

Good point about AWS… the last commit was on the 10th of January, so it’s still active.

For the purpose of doing flexible numerical studies right now, this package is really straightforward, I would say. We are not intending to build a legacy system on it, I think. My work usually happens on a cluster, so I am less worried about direct cloud support (and as far as I am aware, Slurm could handle the cloud piece, if I am not wrong).