cmdstanPy to cmdStanR

jmaronas · November 25, 2020, 4:42pm

Hello.

I was wondering if there is a way to load the output from CmdStanPy into CmdStanR. I am working on a Stan project that involves Python, and I would like to use the shinystan package to diagnose convergence.

As this package is only available in R, I thought there might be a way to load the output of cmdstanpy into cmdstanr, given that both packages generate the same files under the hood, since they both use CmdStan.

Thanks in advance

rok_cesnovar · November 25, 2020, 4:50pm

Hi, yes, that should be very easy.

cmdstanpy has a method save_csvfiles:

fit.save_csvfiles(dir='some/path')

Once you have those, you can use

vector_of_files <- c("path1", "path2", "path3")
f <- cmdstanr::read_cmdstan_csv(vector_of_files)

and that will give you all the draws etc.
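
To avoid typing the paths by hand, a small Python helper can build that R vector from the directory passed to save_csvfiles. This is only a minimal sketch: `r_vector_of_csv_files` is a hypothetical helper name (it is not part of CmdStanPy), and it assumes the directory holds only the CSV files from a single fit.

```python
from pathlib import Path


def r_vector_of_csv_files(directory):
    """Return R code that defines vector_of_files for the CSVs in `directory`.

    Hypothetical helper (not part of CmdStanPy); assumes every .csv file
    in the directory belongs to the same fit.
    """
    # Sort so the chain files appear in a stable order.
    paths = sorted(Path(directory).glob("*.csv"))
    # as_posix() gives forward slashes, which R accepts on every platform.
    quoted = ", ".join('"{}"'.format(p.as_posix()) for p in paths)
    return "vector_of_files <- c({})".format(quoted)


if __name__ == "__main__":
    # 'some/path' is the directory given to fit.save_csvfiles(dir=...).
    print(r_vector_of_csv_files("some/path"))
```

The printed line can be pasted into the R session before calling cmdstanr::read_cmdstan_csv(vector_of_files).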

jmaronas · November 25, 2020, 4:52pm

Thanks, that’s fantastic.

I guess the R posterior package can also be used through this?

rok_cesnovar · November 25, 2020, 4:58pm

Indeed. cmdstanr’s read_cmdstan_csv returns posterior draws_arrays

f <- cmdstanr::read_cmdstan_csv(vector_of_files)
f$post_warmup_draws
f$warmup_draws
f$post_warmup_sampler_diagnostics
f$warmup_sampler_diagnostics

so summarise_draws() etc. all work

jmaronas · November 25, 2020, 5:03pm

Thank you so much. I am totally new to R, and my collaborators usually work in Python, so I am just trying to figure out the best way to use the super-useful libraries that R provides for analysing Stan output.

jonah · November 25, 2020, 7:18pm

For the posterior package, like @rok_cesnovar said the arrays in the list returned by cmdstanr::read_cmdstan_csv() are already compatible with posterior.

But for shinystan, I think the easiest way to use it with the output from CmdStanPy would be to use RStan rather than CmdStanR to read in the CSV files to R. This is just because the stanfit objects that RStan creates are directly compatible with shinystan and it’s not currently as straightforward to use shinystan with the output from cmdstanr::read_cmdstan_csv() because it just returns a list of arrays (I will work on making that easier). So you could try this:

stanfit <- rstan::read_stan_csv(vector_of_files)
shinystan::launch_shinystan(stanfit)

jmaronas · November 25, 2020, 8:38pm

Thank you so much as well. I will make a simple script and post here the final solution with the python code and R code for other users.

jmaronas · December 2, 2020, 3:07pm

[UPDATED] OK, I finally got some time to continue with this. I am providing updated comments after installing Ubuntu 20. The V8 library has to be installed statically, but it still works. I installed nodejs through apt-get following the docs, but the static V8 was still required.

# 1. install.packages("posterior", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# 2. install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# 3. cmdstanr::set_cmdstan_path('/home/jmaronasm/.cmdstan/cmdstan-2.25.0/')
# 4. In a shell: sudo apt install nodejs; then in R: Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
# 5. install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies = TRUE)



# options(browser="google-chrome") 
# library("shinystan")
# library("posterior")

andrjohns · December 2, 2020, 3:17pm

Hi Juan,

The first call to shinystan will always take a while, because it is calculating the summary statistics for each parameter in the stanfit object before launching the interactive environment. After this is done the first time, the results are cached in the stanfit object such that subsequent calls to shinystan launch straight away.

jmaronas · December 2, 2020, 3:20pm

Thanks. How long can it take? Just so I know what a reasonable amount of time to wait is.

andrjohns · December 2, 2020, 3:24pm

That’s not really something that has a concrete answer. The general rule is that more parameters and more samples will result in a longer wait time, so this will vary depending on your model.

jmaronas · December 2, 2020, 3:27pm

OK. My model is quite small, so it shouldn’t take too long. I will check again after upgrading to Ubuntu 20. I attach a screenshot. It stays like this for over 5 minutes with a model that has just 4 variables to sample. The 4 plots that should appear on this screen don’t appear.

mitzimorris · December 2, 2020, 4:43pm

I’m really happy to see that this is working for you - a main goal in developing CmdStanPy and CmdStanR (and future CmdStanX interfaces) is to allow this kind of inter-team co-operation. The Stan CSV output format, although clunky, is programming-environment neutral, as is the JSON data format (more or less).
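
To illustrate how little the Stan CSV format assumes about the reading environment, here is a minimal sketch that reads one using only the Python standard library. It assumes the usual layout (comment lines starting with `#`, one header row of column names, then one row per draw). `read_stan_csv_draws` is a hypothetical name and the sample text is made up for illustration; for real work the readers in CmdStanPy, CmdStanR, or RStan are the better choice because they also handle the metadata.

```python
import csv
import io


def read_stan_csv_draws(text):
    """Parse Stan CSV text into (column_names, rows_of_floats).

    Hypothetical helper; assumes '#' comment lines, one header row,
    then numeric rows. It ignores the metadata held in the comments.
    """
    # Drop comment and blank lines; comments can appear before, between,
    # and after the draws (e.g. adaptation info and timing).
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)))
    header = next(reader)
    draws = [[float(x) for x in row] for row in reader]
    return header, draws


if __name__ == "__main__":
    # Made-up two-draw file, for illustration only.
    example = (
        "# model = example_model\n"
        "lp__,accept_stat__,theta\n"
        "# Adaptation terminated\n"
        "-7.2,0.98,0.25\n"
        "-6.9,1.0,0.31\n"
    )
    names, draws = read_stan_csv_draws(example)
    print(names)
    print(len(draws))
```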

my impression - please correct me - is that in a corporate setting, the research folks work in R to develop models and the production folks prefer Python - is this your situation?

also note that the Python package ArviZ has a lot of output-analysis tools for folks who like to work in Python and already understand how to do Bayesian modeling via MCMC.

jmaronas · December 2, 2020, 4:49pm

Not really. I agree Python might be more useful for production, as it has many good features, but I am actually doing research.

I usually use Python because it gives me access to many good libraries such as numpy, matplotlib, or pytorch (when I want automatic differentiation and GPU support), and because it is easier for me to integrate with other Linux programs.

The reason I need to move to R is that it offers diagnostic tools beyond those provided by Stan itself, such as the posterior package, which provides the bulk and tail effective sample size, for example.

I am quite sure those tools will be available earlier in R than in Python, and I don’t want to wait to have them in Python ;D

I am still having some trouble, though, which will hopefully be resolved once I upgrade to Ubuntu 20.

I will check ArviZ as well, but I still feel it is a step behind the tools provided by R. Basically, I have seen that packages like posterior are developed by the Stan developers.
