Problem using rstan::read_stan_csv

sdaza · October 28, 2021, 8:00am

I am trying to read the output file using rstan::read_stan_csv, but I get an error.

rr = cmdstan_model("src/rethinking.stan", cpp_options = list(stan_threads = TRUE))
dat = list(
    mx = sm$smx,
    time = sm$time, 
    period = sm$period, 
    time_period = sm$time_period, 
    county = sm$county, 
    N = nrow(sm)
)
fit_rr = rr$sample(data = dat,
    chains = 1,
    iter_warmup = 1000,
    iter_sampling = 1000,
    threads_per_chain = 10
)

stanfit = rstan::read_stan_csv(fit_rr$output_files())
Error in numeric(iter.count) : vector size cannot be NA
In addition: Warning messages:
1: In read_csv_header(csvfiles[i]) :
  NAs introduced by coercion to integer range
2: In read_csv_header(csvfiles[i]) :
  NAs introduced by coercion to integer range

The first part of the CSV file is:

# stan_version_major = 2
# stan_version_minor = 27
# stan_version_patch = 0
# model = rethinking_model
# start_datetime = 2021-10-28 07:52:56 UTC
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
#     save_warmup = 0 (Default)
#     thin = 1 (Default)
#     adapt
#       engaged = 1 (Default)
#       gamma = 0.050000000000000003 (Default)
#       delta = 0.80000000000000004 (Default)
#       kappa = 0.75 (Default)
#       t0 = 10 (Default)
#       init_buffer = 75 (Default)
#       term_buffer = 50 (Default)
#       window = 25 (Default)
#     algorithm = hmc (Default)
#       hmc
#         engine = nuts (Default)
#           nuts
#             max_depth = 10 (Default)
#         metric = diag_e (Default)
#         metric_file =  (Default)
#         stepsize = 1 (Default)
#         stepsize_jitter = 0 (Default)
# id = 1
# data
#   file = /tmp/RtmphsRohn/standata-2fa66c6eac5ca0.json
# init = 2 (Default)
# random
#   seed = 1624081893
# output
#   file = /tmp/RtmphsRohn/rethinking-202110280252-1-434739.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
#   sig_figs = -1 (Default)
#   profile_file = /tmp/RtmphsRohn/rethinking-profile-202110280252-1-54c303.csv
# num_threads = 10
# stanc_version = stanc3 v2.27.0
# stancflags = --name=rethinking_model

The packages I am using:
other attached packages:
[1] cmdstanr_0.4.0 rethinking_2.13 rstan_2.21.2
[4] StanHeaders_2.21.0-7

Any ideas on what the problem might be?

jonah · November 6, 2021, 9:25pm

Unfortunately, I’m not sure what the problem is. Are you able to share the CSV file? (You should be able to upload or attach files to a post here.)

Also, is there something in particular you need the stanfit object for that you can’t do with the fit from cmdstanr? (This should still work, I’m just curious about the use case)

Michael_Peck · November 7, 2021, 8:58pm

I can’t speak for the original poster, but I do fairly extensive pre-processing of data in R and post-processing of model output and data for a specific model. I was using rstan until cmdstan_2.23.0 was released, which produced compelling speed gains using reduce_sum for threading. It was just more convenient for me to use rstan::read_stan_csv() to get stanfit objects to pass to my post-processing code than to rewrite that portion of the code.

I may change my mind about rewriting code if I decide to abandon rstan, but I’m not yet prepared to do that.

I’ve encountered no errors using rstan::read_stan_csv() on cmdstan output so I can’t help with the OP’s problem. Sorry.

jonah · November 7, 2021, 10:10pm

Yeah that makes sense, thanks for sharing!

sdaza · November 8, 2021, 1:58pm

Thanks for your reply.
I am trying to run a model originally created with the rethinking package, but a bug prevented me from running parallel threads (rmcelreath/rethinking issue #331, “threads > 1 in ulam producing a compilation error”).

So I extracted the Stan code and ran the model using cmdstanr.

Then I wanted to put the model output into a format I can work with (stanfit, rethinking), but I ran into issues with read_stan_csv.

library(cmdstanr)
sm = data.table::fread("example.csv")
rr = cmdstan_model("rethinking.stan", cpp_options = list(stan_threads = TRUE))
dat = list(
    mx = sm$smx,
    time = sm$time, 
    period = sm$period, 
    time_period = sm$time_period, 
    county = sm$county, 
    N = nrow(sm)
)
fit_rr = rr$sample(data = dat,
    chains = 1,
    iter_warmup = 1000,
    iter_sampling = 1000,
    threads_per_chain = 10
)

stanfit = rstan::read_stan_csv(fit_rr$output_files())

example.csv (49.4 KB)
rethinking.stan (1.2 KB)

jonah · November 8, 2021, 4:37pm

Hi @sdaza, thanks for sharing the code and data. However, when I try running your code I get

Chain 1 Unrecoverable error evaluating the log probability at the initial value. 
Chain 1 Exception: array[uni, ...] index: accessing element out of range. index 1 out of range; container is empty and cannot be indexed (in '/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpZ3UhK6/model-1674b7789d2d4.stan', line 39, column 8 to column 165)

and the model doesn’t run.

sdaza · November 8, 2021, 5:26pm

Hi @jonah, I uploaded the files again. They are working on my laptop.

Please, let me know.
Thanks, Sebastian

jonah · November 8, 2021, 6:14pm

Thanks, now it runs for me and I can reproduce the error from read_stan_csv(). It turns out this is due to a really silly bug in one of the internal functions called by read_stan_csv(). Can you try renaming your Stan program to “test.stan” instead of “rethinking.stan”, and then recompile and refit? I think that should fix the problem.

The bug is that there’s a poorly constructed regular expression (written many, many years ago) that looks for “thin” in the CSV file header (in order to detect whether the chains are being thinned), but it also picks up the “thin” in “rethinking”:

github.com/stan-dev/rstan/blob/da2fc9c079534a82d3d26adda51ad17bf22f5e2b/rstan/rstan/R/misc.R#L1527-L1529

if (grepl("#.*thin", input)){
  thin <- as.integer(gsub("[^0-9]*([0-9]*).*","\\1",input))
}
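
To see the failure in isolation, here is a minimal base R sketch that applies the two expressions quoted above to three header lines taken from the CSV in the first post. The `strict` pattern is a hypothetical alternative included only for comparison; it is an assumption for illustration, not the fix adopted in rstan.

```r
# Three header lines copied from the CSV file posted at the top of the thread.
header <- c(
  "#     thin = 1 (Default)",
  "#   file = /tmp/RtmphsRohn/rethinking-202110280252-1-434739.csv",
  "# stancflags = --name=rethinking_model"
)

# The check quoted above: any comment line containing "thin" is treated
# as if it were the thin setting.
loose <- grepl("#.*thin", header)

# Hypothetical stricter check (an assumption, not rstan's code):
# "thin" must be the argument name itself, directly followed by "=".
strict <- grepl("^#[[:space:]]*thin[[:space:]]*=", header)

# The value extraction quoted above: keep the first run of digits on the line.
extracted <- gsub("[^0-9]*([0-9]*).*", "\\1", header)

# The timestamp in the file name is too large for an R integer, and the
# stancflags line contains no digits, so both are coerced to NA.
thin_values <- suppressWarnings(as.integer(extracted))

print(data.frame(loose, strict, extracted, thin_values))
```

With these inputs `loose` matches all three lines while `strict` matches only the first, and `thin_values` is 1, NA, NA. Without `suppressWarnings()`, the file-name line raises the same "NAs introduced by coercion to integer range" warning shown in the original error. If a later match replaces the earlier value, as the error suggests, `thin` ends up as NA.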

Sorry for the hassle!

sdaza · November 8, 2021, 7:11pm

Thanks so much @jonah, it’s working…

jonah · November 9, 2021, 7:00pm

To be fair to whoever wrote the regular expression originally, I don’t think they would have had reason to imagine this problem. At that time the lines of the csv header that caused the problem didn’t exist (they were added later).

Also I opened a bug report:

github.com/stan-dev/rstan

read_stan_csv fails due to problems with regular expressions

opened 07:02PM - 09 Nov 21 UTC

jgabry

bug

#### Summary:

`read_csv_header`, which is called internally by `read_stan_csv…`, will fail if the model name contains the string `"thin"` anywhere. In an example on the forums reported by @sdaza the Stan program was called `"rethinking.stan"`. The problem is that this use of grep

https://github.com/stan-dev/rstan/blob/da2fc9c079534a82d3d26adda51ad17bf22f5e2b/rstan/rstan/R/misc.R#L1527-L1529

doesn't account for the possibility that `"thin"` shows up in other parts of the header besides the `thin` argument. Unfortunately it can show up in several other places if the model name contains `"thin"`, e.g. in `file`, `profile_file`, and `stancflags`:

```
# output
#   file = /tmp/RtmphsRohn/rethinking-202110280252-1-434739.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
#   sig_figs = -1 (Default)
#   profile_file = /tmp/RtmphsRohn/rethinking-profile-202110280252-1-54c303.csv
# num_threads = 10
# stanc_version = stanc3 v2.27.0
# stancflags = --name=rethinking_model
```

This eventually results in an error when `read_stan_csv` tries to use the value of `thin`.

#### Reproducible Steps:

Run the code provided in the discourse post by @sdaza: https://discourse.mc-stan.org/t/problem-using-rstan-read-stan-csv/25017/5

#### Current Output:

Error

#### Expected Output:

No error

#### RStan Version:

2.21.2

#### R Version:

4.1.1

#### Operating System:

Mac big sur
