RStan ESS calculation from CmdStan files

I ran a simulation study on a cluster, so I have hundreds of CmdStan output files. If I load them with read_stan_csv I only get one chain at a time, so I can’t calculate the multi-chain ESS/R-hat. Is there a way to do these calculations using rstan? I got the single-chain R-hat, but that’s not all that meaningful.
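For context on why the multi-chain number matters, here is a minimal Python sketch of the classic between/within-chain R-hat and its split variant. It is an illustration under stated assumptions (equal-length chains, draws already loaded as lists of floats), not the RStan or CmdStan implementation; the function names `rhat` and `split_rhat` are hypothetical.

```python
from statistics import mean, variance

def rhat(chains):
    """Classic R-hat from a list of equal-length chains (lists of draws).

    Needs at least two chains: the between-chain variance is undefined
    for a single chain, which is why a one-chain load is not enough.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: average within-chain variance
    between = n * variance(chain_means)          # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

def split_rhat(chains):
    """Split each chain in half and treat the halves as separate chains."""
    half = len(chains[0]) // 2
    halves = []
    for c in chains:
        halves.append(c[:half])
        halves.append(c[-half:])
    return rhat(halves)
```

With a single chain only `split_rhat` is computable, and it can only compare the first half of that chain with its second half; it cannot detect chains that settled in different regions.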

+1, using my own custom buggy scripts now

I don’t recall if the calculation is currently implemented at the interface level. It shouldn’t be hard to get it into the core; I just want to finish my simulation study first!

Speaking of custom buggy scripts, can you share yours? I don’t feel like re-implementing my own custom buggy script right now.

What’s wrong with stansummary from CmdStan? It also lets you save as csv.

Oh hey, I forgot about it because it crashes on really big models, but for the current case it’ll work. Thanks!

We can rewrite it so it doesn’t crash. That was literally a 4-hour weekend coding project that hasn’t been touched since it was written. It was done really poorly, back when I didn’t know how to use Eigen.

I didn’t mean to imply otherwise. Really it should be a pretty thin wrapper, since we can push most of the meaningful calculations to stan::math.

=). Didn’t read it that way. I just don’t like things crashing, and I remember writing that code in a haze and being happy it compiled. Last time I looked, it looked really bad, but I didn’t want to spend time fixing it.

Wow, I didn’t notice that stansummary took in multiple files…

Document reading skills -1

Lemme add to that: incomplete (only R-hat), slow (Python), and for a slightly different format (from custom HMC), haha. Not something I’d wish on the outside world.

I’ve been trying to avoid doing stuff like that too but… deadlines… :)

My life has become a series of “system(cmd)” calls held together with duct tape so I don’t judge. :P

There’s not too much we can do without moving away from the FFT calculation of the effective sample sizes, which will always require keeping all of the samples for a parameter in memory.
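To make the memory point concrete, here is a rough Python sketch of an FFT-based autocovariance, the quantity an ESS estimate is built from. It is not the Stan code: the small radix-2 `fft` helper and the name `autocovariance_fft` are assumptions for illustration, using only the standard library. Note that the whole chain (plus zero padding) has to be held at once.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def autocovariance_fft(draws):
    """Biased autocovariance at lags 0..n-1 for one chain of draws."""
    n = len(draws)
    center = sum(draws) / n
    size = 1
    while size < 2 * n:          # zero-pad so the circular result equals the linear one
        size *= 2
    padded = [complex(d - center) for d in draws] + [0j] * (size - n)
    power = [complex(abs(s) ** 2) for s in fft(padded)]
    # The power spectrum of a real signal is real and symmetric, so a forward
    # FFT divided by `size` gives the same values as the inverse FFT.
    inverse = fft(power)
    return [inverse[k].real / size / n for k in range(n)]
```

The padded buffer is between two and four times the chain length, so the per-parameter memory cost scales with the number of draws, which is consistent with the constraint described above.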

We could certainly make the failure more graceful. Since the number of parameters matters to whether the code fails or not, I believe we are currently loading all parameters at the same time, whereas it could be done parameter by parameter (for everything except the global R-hat).
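A parameter-by-parameter pass could look roughly like the Python sketch below. It assumes the usual CmdStan CSV layout (`#` comment lines, one header row, then one row per draw); `read_column` and `per_parameter` are hypothetical helper names, not part of any Stan interface.

```python
import csv

def read_column(path, name):
    """Return the draws for one named column of a CmdStan-style CSV file."""
    draws = []
    with open(path, newline="") as handle:
        # Skip blank lines and '#' comment lines (config, adaptation info, timing).
        rows = csv.reader(
            line for line in handle
            if line.strip() and not line.startswith("#")
        )
        header = next(rows)
        index = header.index(name)
        for row in rows:
            draws.append(float(row[index]))
    return draws

def per_parameter(paths, names, statistic):
    """Apply `statistic` to one parameter at a time across all chain files."""
    results = {}
    for name in names:
        # Only this parameter's draws are held in memory at any moment.
        chains = [read_column(path, name) for path in paths]
        results[name] = statistic(chains)
    return results
```

The trade-off is that every file is re-read once per parameter, so this swaps memory for I/O; a real implementation would probably read columns in batches.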

read_stan_csv can read multiple files. Why not use it? Did I miss anything?

Like @Maverick said, read_stan_csv accepts multiple file names, so you can do multiple chains and then use RStan as you usually would with a stanfit object.

Hey, that’s true, I just missed it completely. Oh well.

It would be great to have something in stan::services for this or for stansummary in general, so we didn’t have 3 different implementations.

I’ll eventually get around to submitting a PR, but I wouldn’t complain if someone beat me to it!
