Posterior draws objects => recover original array data structure?

Hi!

I am running a cmdstanr fit and getting back “draws” objects that I can’t work with the way I want to. The issue is that the returned “draws” objects flatten out all the dimensions, and I don’t want that. Essentially I would like the same output format I get from the “extract” method in rstan: a list of all the variables, where each variable still has its structure.

So I do not want to see things like “theta[1]”, “theta[2]”, … etc., but rather “theta” which is structured accordingly (just as the extract method would do it from rstan).

It’s not clear to me how to reformat that in an elegant way. tidybayes does not seem to solve it since it targets tidy data structures.

Many thanks for any help on this.

Sebastian

That wide structure is the “original” structure.

To get an n-dimensional structure, have you tried the posterior package?

(E.g., in CmdStanPy you can use ArviZ InferenceData to get the n-dimensional structure: idata = az.from_cmdstanpy(fit).)

Not completely sure, but I think extract_variable and extract_variable_matrix from posterior can be used for that.

I have looked at the posterior package, but the documentation is not helping me… maybe I am overlooking something?

Nope. extract_variable gives me back a flat 1D vector, but not the original structure.

Yeah, at least their examples show ndim structure.

But if posterior doesn’t want to implement that kind of functionality, there is always the option of doing it manually (this should then live inside CmdStanR).

Get a table of all theta vars --> (check order -->) reshape to correct order.

Not sure how easy this would be in R (probably similar to Python).
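The manual route described above (collect all the theta columns, check their order, reshape) could look roughly like this in base R. This is only a sketch: unflatten_variable is a hypothetical helper, not part of cmdstanr or posterior, and it assumes the draws are available as a plain numeric matrix with one row per draw and column names such as "theta[1,1]". The result puts the draws in the first dimension, similar to what rstan’s extract returns.

```r
# Hypothetical helper (not part of cmdstanr or posterior): rebuild a
# structured array for one variable from a flat draws matrix.
# Assumes `draws` is a plain numeric matrix, one row per draw, with
# column names like "theta[1,1]", "theta[2,1]", ...
unflatten_variable <- function(draws, variable) {
  # Select the columns of the form variable[i,j,...]
  pattern <- paste0("^", variable, "\\[([0-9,]+)\\]$")
  cols <- grep(pattern, colnames(draws), value = TRUE)
  if (length(cols) == 0L) {
    stop("no indexed columns found for variable '", variable, "'")
  }

  # Parse "theta[2,1]" into integer index rows: one row per column
  idx_strings <- sub(pattern, "\\1", cols)
  idx <- do.call(rbind, lapply(strsplit(idx_strings, ",", fixed = TRUE), as.integer))
  dims <- apply(idx, 2, max)

  n_draws <- nrow(draws)
  out <- array(NA_real_, dim = c(n_draws, dims))

  # Fill by explicit index so the result does not depend on column order
  for (k in seq_along(cols)) {
    target <- cbind(seq_len(n_draws), idx[rep(k, n_draws), , drop = FALSE])
    out[target] <- draws[, cols[k]]
  }
  out
}

# Toy input standing in for real posterior draws: 3 draws of a 2 x 2 theta
flat_draws <- matrix(as.numeric(1:12), nrow = 3)
colnames(flat_draws) <- c("theta[1,1]", "theta[2,1]", "theta[1,2]", "theta[2,2]")

theta <- unflatten_variable(flat_draws, "theta")
dim(theta)    # draws x 2 x 2
theta[1, , ]  # the 2 x 2 matrix for the first draw
```

Filling by parsed index rather than relying on the column order covers the “check order” step, at the cost of a loop over columns.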

tidybayes has support for posterior draws (and thus cmdstanr as well) on a branch. That branch works directly with a cmdstan fit. I saw tweets from @mjskay a while back with demos of that using gather_draws. Not sure when that will hit CRAN (I am guessing posterior needs to be put on CRAN first).

I remembered we have an issue for that: https://github.com/stan-dev/cmdstanr/issues/183
Though not sure whether this falls under cmdstanr or posterior. My feeling is more the latter, but idk.

Cc: @jonah

Edit: I was referring to this: https://twitter.com/mjskay/status/1289987974973685760?s=20

In the meantime you can simply use rstan::read_stan_csv(fit$output_files()) to get the rstan stanfit on which you can then use extract.

Nope… I tried that and it failed for me.

EDIT: Ok… so this approach fails if there is only warmup in the csv file, but no iterations from the sampling phase.

> rstan::read_stan_csv(scale_fit$warmup$output_files())
Error in `[<-`(`*tmp*`, buffer.pointer, , value = scan(con, nlines = 1,  : 
  subscript out of bounds

but when there are also samples from a sampling phase, it does seem to work. Maybe an rstan bug.

That is a bug in rstan’s read_stan_csv it seems. Will take a look.

It sounds like you might be looking for something like the rvar interface I am working on for posterior. I am very close to making a PR onto the main branch (I got interrupted by the beginning of the fall quarter), but you can see a description of it here or try it out on the rv-like branch.
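As a rough illustration of the kind of structure rvar gives you, here is a small sketch. It assumes a posterior version in which rvar is available and uses the function names as_draws_matrix, as_draws_rvars and draws_of from released posterior; the names on the branch may differ. The toy matrix stands in for real draws, and the whole thing is guarded so it does nothing if posterior is not installed.

```r
# Sketch only: assumes an installed posterior version that includes rvar.
if (requireNamespace("posterior", quietly = TRUE)) {
  # Toy flat draws standing in for a real fit: 3 draws of a 2 x 2 theta
  flat <- matrix(as.numeric(1:12), nrow = 3)
  colnames(flat) <- c("theta[1,1]", "theta[2,1]", "theta[1,2]", "theta[2,2]")

  # Convert the flat columns into one structured rvar per variable
  rv <- posterior::as_draws_rvars(posterior::as_draws_matrix(flat))

  # rv$theta behaves like a 2 x 2 array of random variables
  theta_dim <- dim(rv$theta)

  # The underlying draws, with the draws in the first dimension
  theta_draws <- posterior::draws_of(rv$theta)
  theta_draws_dim <- dim(theta_draws)
}
```

The point is that theta is addressed as a single 2 x 2 object, so there is no need to build strings like "theta[1,2]" to get at individual elements.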

The rvar idea sounds like what I am looking for… hopefully you can resume your efforts on this. Looking forward to it.

That works for me, but it’d be nice to have this in cmdstanr without having to load rstan. One of the main advantages of cmdstanr is not having to install rstan!

I only need the draws, not anything else that read_stan_csv returns. I only want to be able to get the draws in a structured way that lets me avoid having to build strings representing indexed variables.

I think cmdstanr has this already (similar functionality is in cmdstanpy too).
