CmdStanPy to CmdStanR

Hello.

I was wondering if there is a way to load the output from CmdStanPy into CmdStanR. I am working with Stan on a project involving Python, and I would like to use the shinystan package to diagnose convergence.

As this package is available only in R, I thought there might be a way to load the output of CmdStanPy into CmdStanR, given that both packages generate the same files under the hood, as they both use CmdStan.

Thanks in advance

Hi, yes, that should be very easy.

cmdstanpy has a method save_csvfiles:

fit.save_csvfiles(dir='some/path')

Once you have those, you can use

vector_of_files <- c("path1", "path2", "path3")
f <- cmdstanr::read_cmdstan_csv(vector_of_files)

and that will give you all the draws etc.
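To avoid typing the paths by hand, the R vector can be generated on the Python side. This is only a minimal sketch: the helper name r_vector_of_csv_files is hypothetical (it is not part of CmdStanPy), and it assumes the directory passed to save_csvfiles holds the CSV files of a single fit.

```python
from pathlib import Path


def r_vector_of_csv_files(output_dir):
    """Build the R assignment `vector_of_files <- c(...)` for the CSVs in output_dir.

    Hypothetical helper, not part of CmdStanPy. Assumes output_dir is the
    directory that was passed to fit.save_csvfiles(dir=...).
    """
    # Sort so that the chain files always appear in a stable order.
    paths = sorted(Path(output_dir).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {output_dir}")
    # Absolute, forward-slash paths are accepted by R on every platform.
    quoted = ", ".join(f'"{p.resolve().as_posix()}"' for p in paths)
    return f"vector_of_files <- c({quoted})"
```

Printing the result of r_vector_of_csv_files('some/path') gives a line that can be pasted into the R session before calling cmdstanr::read_cmdstan_csv(vector_of_files).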


Thanks, that’s fantastic.

I guess the R posterior package can also be used through this?

Indeed. cmdstanr’s read_cmdstan_csv returns posterior draws_array objects:

f <- cmdstanr::read_cmdstan_csv(vector_of_files)
f$post_warmup_draws
f$warmup_draws
f$post_warmup_sampler_diagnostics
f$warmup_sampler_diagnostics

so summarise_draws() etc. all work.


Thank you so much. I am totally new to R, and my collaborators usually work in Python. So I am just trying to figure out the best way to use the super-useful R libraries for analysing Stan output.


For the posterior package, as @rok_cesnovar said, the arrays in the list returned by cmdstanr::read_cmdstan_csv() are already compatible with posterior.

But for shinystan, I think the easiest way to use it with the output from CmdStanPy would be to use RStan rather than CmdStanR to read the CSV files into R. This is because the stanfit objects that RStan creates are directly compatible with shinystan, whereas it’s not currently as straightforward to use shinystan with the output from cmdstanr::read_cmdstan_csv(), which just returns a list of arrays (I will work on making that easier). So you could try this:

stanfit <- rstan::read_stan_csv(vector_of_files)
shinystan::launch_shinystan(stanfit)

Thank you so much as well. I will make a simple script and post the final solution here, with the Python and R code, for other users.


[UPDATED] OK, I finally got some time to continue on this. I am providing updated comments after installing Ubuntu 20. The V8 library has to be installed statically, but it still works. I installed nodejs through apt-get following the docs, but the static V8 was still required.

# 1. install.packages("posterior", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# 2. install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# 3. cmdstanr::set_cmdstan_path('/home/jmaronasm/.cmdstan/cmdstan-2.25.0/')
# 4. In a shell: sudo apt install nodejs; then in R: Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
# 5. install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies = TRUE)



# options(browser="google-chrome") 
# library("shinystan")
# library("posterior")

Hi Juan,

The first call to shinystan will always take a while, because it is calculating the summary statistics for each parameter in the stanfit object before launching the interactive environment. After this is done the first time, the results are cached in the stanfit object such that subsequent calls to shinystan launch straight away.


Thanks. How long can it take? Just so I know how long it is reasonable to wait.

That’s not really something that has a concrete answer. The general rule is that more parameters and more samples will result in a longer wait time, so this will vary depending on your model.
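To get a rough idea of the size involved before launching shinystan, the number of parameter columns and draws can be read directly from one of the Stan CSV files. The sketch below is only illustrative: stan_csv_shape is a hypothetical helper, and it relies on the Stan CSV conventions that comment lines start with # and that sampler diagnostic columns (lp__, accept_stat__, and so on) end in a double underscore.

```python
import csv


def stan_csv_shape(path):
    """Return (n_parameter_columns, n_draw_rows) for one Stan CSV file.

    Hypothetical helper. Every scalar output column counts as one parameter,
    so a vector of length 10 contributes 10 columns. If warmup draws were
    saved, they are counted as rows too.
    """
    header = None
    n_draws = 0
    with open(path, newline="") as fh:
        # Skip blank lines and the '#' comment lines CmdStan writes
        # before, between, and after the draws.
        data_lines = (ln for ln in fh if ln.strip() and not ln.startswith("#"))
        for row in csv.reader(data_lines):
            if header is None:
                header = row  # first non-comment line holds the column names
            else:
                n_draws += 1
    if header is None:
        raise ValueError(f"no header line found in {path}")
    # Columns ending in '__' are sampler diagnostics, not model parameters.
    n_params = sum(1 for name in header if not name.endswith("__"))
    return n_params, n_draws
```

Multiplying the two numbers by the number of chains gives a rough sense of how much shinystan has to summarise, although it does not translate into a concrete wait time.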

OK. My model is quite small, so it shouldn’t take too long. I will check after upgrading to Ubuntu 20. I attach a screenshot. It stays like this for over 5 minutes with a model involving just 4 variables to sample. The 4 plots that should appear on this screen do not appear.

I’m really happy to see that this is working for you. A main goal in developing CmdStanPy and CmdStanR (and future CmdStanX interfaces) is to allow this kind of inter-team co-operation. The Stan CSV output format, although clunky, is programming-environment neutral, as is the JSON data format (more or less).

My impression (please correct me) is that in a corporate setting, the research folks work in R to develop models and the production folks prefer Python. Is this your situation?

Also note that the Python package ArviZ has a lot of output-analysis tools for folks who like to work in Python and already understand how to do Bayesian modeling via MCMC.

Not really. I agree Python might be more useful for production, as it has many good features. I am actually doing research.

I usually use Python because I have access to many good libraries like numpy, matplotlib, or pytorch if I want automatic differentiation and GPU support; it is also easier for me to integrate with other Linux programs.

Actually, the reason I need to move to R is that there are additional diagnostic tools beyond those provided by Stan, like the posterior package, which provides the bulk and tail effective sample size, for example.

I am quite sure those tools will be available earlier in R than in Python, and I don’t want to wait to have them in Python ;D

Although I am still having some trouble that will hopefully be resolved once I update to Ubuntu 20.

I will check ArviZ as well, but I still feel it is a step behind the tools provided by R. Basically, I have seen that packages like posterior are developed by the Stan developers.
