Problem using rstan::read_stan_csv

I am trying to read the output file using rstan::read_stan_csv, but I get an error.

library(cmdstanr)
rr = cmdstan_model("src/rethinking.stan", cpp_options = list(stan_threads = TRUE))
dat = list(
    mx = sm$smx,
    time = sm$time, 
    period = sm$period, 
    time_period = sm$time_period, 
    county = sm$county, 
    N = nrow(sm)
)
fit_rr = rr$sample(data = dat,
    chains = 1,
    iter_warmup = 1000,
    iter_sampling = 1000,
    threads_per_chain = 10
)

stanfit = rstan::read_stan_csv(fit_rr$output_files())
Error in numeric(iter.count) : vector size cannot be NA
In addition: Warning messages:
1: In read_csv_header(csvfiles[i]) :
  NAs introduced by coercion to integer range
2: In read_csv_header(csvfiles[i]) :
  NAs introduced by coercion to integer range

The first part of the CSV file is:

# stan_version_major = 2
# stan_version_minor = 27
# stan_version_patch = 0
# model = rethinking_model
# start_datetime = 2021-10-28 07:52:56 UTC
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
#     save_warmup = 0 (Default)
#     thin = 1 (Default)
#     adapt
#       engaged = 1 (Default)
#       gamma = 0.050000000000000003 (Default)
#       delta = 0.80000000000000004 (Default)
#       kappa = 0.75 (Default)
#       t0 = 10 (Default)
#       init_buffer = 75 (Default)
#       term_buffer = 50 (Default)
#       window = 25 (Default)
#     algorithm = hmc (Default)
#       hmc
#         engine = nuts (Default)
#           nuts
#             max_depth = 10 (Default)
#         metric = diag_e (Default)
#         metric_file =  (Default)
#         stepsize = 1 (Default)
#         stepsize_jitter = 0 (Default)
# id = 1
# data
#   file = /tmp/RtmphsRohn/standata-2fa66c6eac5ca0.json
# init = 2 (Default)
# random
#   seed = 1624081893
# output
#   file = /tmp/RtmphsRohn/rethinking-202110280252-1-434739.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
#   sig_figs = -1 (Default)
#   profile_file = /tmp/RtmphsRohn/rethinking-profile-202110280252-1-54c303.csv
# num_threads = 10
# stanc_version = stanc3 v2.27.0
# stancflags = --name=rethinking_model

The packages I am using:
other attached packages:
[1] cmdstanr_0.4.0 rethinking_2.13 rstan_2.21.2
[4] StanHeaders_2.21.0-7

Any ideas on what the problem might be?

I’m not sure what the problem is unfortunately. Are you able to share the csv file (you should be able to upload/attach files to a post here)?

Also, is there something in particular you need the stanfit object for that you can’t do with the fit from cmdstanr? (This should still work, I’m just curious about the use case)

I can’t speak for the original poster, but I do fairly extensive pre-processing of data in R and post-processing of model output and data for a specific model. I was using rstan until cmdstan_2.23.0 was released, which produced compelling speed gains using reduce_sum for threading. It was just more convenient for me to use rstan::read_stan_csv() to get stanfit objects to pass to my post-processing code than to rewrite that portion of the code.

I may change my mind about rewriting code if I decide to abandon rstan, but I’m not yet prepared to do that.

I’ve encountered no errors using rstan::read_stan_csv() on cmdstan output so I can’t help with the OP’s problem. Sorry.

Yeah that makes sense, thanks for sharing!

Thanks for your reply.
I am trying to run a model originally created using the rethinking package, but a bug prevented me from running parallel threads (rmcelreath/rethinking issue #331 on GitHub, “threads > 1 in ulam producing a compilation error”).

So I extracted the Stan code and ran the model using cmdstanr.

Then I wanted to put the model output into a reasonable format (stanfit, rethinking), but ran into issues with read_stan_csv.

library(cmdstanr)
sm = data.table::fread("example.csv")
rr = cmdstan_model("rethinking.stan", cpp_options = list(stan_threads = TRUE))
dat = list(
    mx = sm$smx,
    time = sm$time, 
    period = sm$period, 
    time_period = sm$time_period, 
    county = sm$county, 
    N = nrow(sm)
)
fit_rr = rr$sample(data = dat,
    chains = 1,
    iter_warmup = 1000,
    iter_sampling = 1000,
    threads_per_chain = 10
)

stanfit = rstan::read_stan_csv(fit_rr$output_files())

example.csv (49.4 KB)
rethinking.stan (1.2 KB)

Hi @sdaza, thanks for sharing the code and data. However, when I try running your code I get

Chain 1 Unrecoverable error evaluating the log probability at the initial value. 
Chain 1 Exception: array[uni, ...] index: accessing element out of range. index 1 out of range; container is empty and cannot be indexed (in '/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpZ3UhK6/model-1674b7789d2d4.stan', line 39, column 8 to column 165) 

and the model doesn’t run.

Hi @jonah, I uploaded the files again. They are working on my laptop.

Please, let me know.
Thanks, Sebastian

Thanks, now it runs for me and I can reproduce the error from read_stan_csv(). It turns out this is due to a really silly bug in one of the internal functions called by read_stan_csv(). Can you try renaming your stan program to “test.stan” instead of “rethinking.stan” and then recompile and refit? I think that should fix the problem.
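For anyone who wants to keep the original file untouched, the rename can be sketched as a copy. This is only a minimal sketch: copy_stan_file() is a hypothetical helper (not part of cmdstanr or rstan), and the cmdstanr calls in the trailing comments simply mirror the code posted earlier in the thread.

```r
# Hypothetical helper: copy a Stan program to a file name without "thin",
# so the model name and output file names in the CSV header change too.
copy_stan_file <- function(stan_file, new_name = "test.stan") {
  stopifnot(
    file.exists(stan_file),
    !grepl("thin", new_name, fixed = TRUE)
  )
  new_file <- file.path(dirname(stan_file), new_name)
  if (!file.copy(stan_file, new_file, overwrite = TRUE)) {
    stop("could not copy ", stan_file, " to ", new_file)
  }
  new_file
}

# Usage, assuming the same objects as in the original post:
# new_file <- copy_stan_file("rethinking.stan")
# rr <- cmdstanr::cmdstan_model(new_file, cpp_options = list(stan_threads = TRUE))
# fit_rr <- rr$sample(data = dat, chains = 1, threads_per_chain = 10)
# stanfit <- rstan::read_stan_csv(fit_rr$output_files())
```

The model has to be recompiled and refit after the copy, because the CSV header is written at sampling time.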

The bug is a poorly constructed regular expression (written many, many years ago) that looks for “thin” in the CSV file header to detect whether the chains are being thinned. It also picks up the “thin” in “rethinking”, which appears in several header lines of your file, such as “# model = rethinking_model” and the output file path.
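The mismatch is easy to reproduce in isolation. The sketch below uses header lines copied from the CSV excerpt earlier in the thread; both patterns are illustrative assumptions and not the actual expression inside rstan.

```r
# Header lines copied from the CSV excerpt posted above.
header <- c(
  "# model = rethinking_model",
  "#     thin = 1 (Default)",
  "#   file = /tmp/RtmphsRohn/rethinking-202110280252-1-434739.csv",
  "# stancflags = --name=rethinking_model"
)

# Assumed loose pattern: matches "thin" anywhere in the line,
# so every line that mentions "rethinking" is also returned.
loose_hits <- grep("thin", header, value = TRUE)

# Assumed stricter pattern: "thin" must be the whole key before "=".
strict_hits <- grep("^#\\s*thin\\s*=", header, value = TRUE)

print(loose_hits)
print(strict_hits)
```

With these four lines the loose pattern returns all of them, while the anchored pattern returns only the real thin setting.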

Sorry for the hassle!

Thanks so much @jonah, it’s working…

To be fair to whoever wrote the regular expression originally, I don’t think they would have had reason to imagine this problem. At that time the lines of the csv header that caused the problem didn’t exist (they were added later).

Also, I opened a bug report.
