RStan ESS calculation from CmdStan files


#1

I ran a simulation study on a cluster so I have hundreds of CmdStan output files. If I load them with read_stan_csv I only get one chain at a time so I can’t calculate the multi-chain ESS/Rhat. Is there a way to do these calls using rstan? I got the single-chain rhat but that’s not all that meaningful.


#2

+1, using my own custom buggy scripts now


#3

I don’t recall if the calculation is currently implemented at the interface level. It shouldn’t be hard to get it into the core, I just want to finish my simulation study first!


#4

Speaking of custom buggy scripts, can you share yours? I don’t feel like re-implementing my own custom buggy script right now.


#5

What’s wrong with stansummary from CmdStan? It also lets you save the summary as CSV.


#6

Oh hey, I forgot about it because it crashes on really big models but for the current case it’ll work. Thanks!


#7

We can rewrite it so it doesn’t crash. That was literally a 4-hour weekend coding project that hasn’t been touched since it was written. It was done really poorly, back when I didn’t know how to use Eigen.


#8

I didn’t mean to imply otherwise. Really it should be a pretty thin wrapper, since we can push most of the meaningful calculations to stan::math.


#9

=). Didn’t read it that way. I just don’t like things crashing and I
remember writing that code in a haze and was happy it compiled. Last time I
looked, it looked really bad, but I didn’t want to spend time fixing it.


#10

Wow, I didn’t notice that stansummary takes multiple files…

Document reading skills -1


#11

Lemme add to that: incomplete (only Rhat), slow (Python), and written for a slightly different format (from a custom HMC), haha. Not something I’d wish on the outside world.
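For anyone wondering what the multi-chain quantity actually involves, here is a minimal sketch of split R-hat for a single parameter in plain Python. It is not the script mentioned above, the name `split_rhat` is made up for illustration, and it assumes all chains have the same length and contain post-warmup draws only.

```python
import math
from statistics import mean, variance


def split_rhat(chains):
    """Split R-hat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    Each chain is cut into a first and a second half, so M chains are
    treated as 2M shorter chains.
    """
    length = len(chains[0])
    if any(len(c) != length for c in chains):
        raise ValueError("all chains must have the same length")
    n = length // 2
    if n < 2:
        raise ValueError("need at least 4 draws per chain")

    halves = []
    for c in chains:
        halves.append(c[:n])
        halves.append(c[-n:])  # the middle draw is dropped when length is odd

    # W: average within-chain variance; B: between-chain variance scaled by n.
    w = mean([variance(h) for h in halves])
    b = n * variance([mean(h) for h in halves])
    if w == 0:
        raise ValueError("within-chain variance is zero")

    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

With a single chain the only comparison is first half against second half, which is why the single-chain number says little about whether independent runs agree.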


#12

I’ve been trying to avoid doing stuff like that too but… deadlines… :)


#13

My life has become a series of “system(cmd)” calls held together with duct tape so I don’t judge. :P


#14

There’s not too much we can do without moving away from the FFT calculation of the effective sample sizes, which will always require keeping all of the samples in memory.
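To make the memory point concrete, here is a rough single-chain sketch of an autocorrelation-based ESS. Note the swaps: it uses a direct O(N²) autocovariance sum instead of the FFT so that it needs only the standard library, it handles one chain rather than combining several, and the lower bound on `tau` is a safeguard assumed here for illustration. The function names are hypothetical and this is not the CmdStan code.

```python
import math


def autocovariance(x):
    """Biased autocovariance estimates at lags 0..n-1 (direct sum, no FFT)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    # Lag t pairs draw i with draw i + t, so the full chain must be available.
    return [sum(d[i] * d[i + t] for i in range(n - t)) / n for t in range(n)]


def ess_single_chain(x):
    """ESS estimate n / (1 + 2 * sum of autocorrelations) for one chain."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 draws")
    acov = autocovariance(x)
    if acov[0] == 0:
        raise ValueError("chain is constant")
    rho = [a / acov[0] for a in acov]

    # Geyer-style truncation: add autocorrelations in pairs (lags 2k, 2k+1)
    # and stop at the first pair whose sum is not positive.
    tau = -1.0
    t = 0
    while t + 1 < n:
        pair = rho[t] + rho[t + 1]
        if pair <= 0:
            break
        tau += 2.0 * pair
        t += 2

    # Assumed safeguard: keep tau positive for strongly antithetic chains.
    tau = max(tau, 1.0 / math.log10(n))
    return n / tau
```

Either way, every lag touches draws from across the whole chain, so for a given parameter all of its draws have to be held at once; that requirement is per parameter, not per model.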


#15

We could certainly make the failure more graceful. And since the number of parameters matters to whether the code fails or not, I believe we are currently loading all parameters at the same time, whereas it could be done parameter-by-parameter (for everything except the global R-hat).
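A parameter-by-parameter read could look roughly like the sketch below, which pulls a single named column out of several CmdStan output files. It assumes the usual CSV layout (comment lines starting with `#`, one header row, then one row per draw); `read_parameter` is a hypothetical helper name, not part of any Stan interface.

```python
import csv


def read_parameter(paths, name):
    """Return one list of draws per file for the column called `name`."""
    chains = []
    for path in paths:
        with open(path, newline="") as fh:
            # Skip blank lines and '#' comment lines (config, adaptation, timing).
            data_lines = (
                line for line in fh
                if line.strip() and not line.startswith("#")
            )
            rows = csv.reader(data_lines)
            header = next(rows)
            col = header.index(name)  # raises ValueError if the column is absent
            chains.append([float(row[col]) for row in rows])
    return chains
```

Only one column per file is kept at a time, so memory grows with the number of draws rather than with the number of parameters. If warmup draws were saved to the files, they would still need to be dropped before computing diagnostics.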


#16

read_stan_csv can read multiple files. Why not use it? Did I miss anything?


#17

Like @Maverick said, read_stan_csv accepts multiple file names, so you can do multiple chains and then use RStan as you usually would with a stanfit object.


#18

Hey, that’s true, I just missed it completely. Oh well.


#19

It would be great to have something in stan::services for this or for stansummary in general, so we didn’t have 3 different implementations.

I’ll eventually get around to submitting a PR, but I wouldn’t complain if someone beat me to it!