Problem with unserialize with reasonably large model

saudiwin · November 10, 2017, 8:14pm

Hi all -

I am running a large IRT model in RStan (approx 100K parameters with 2m rows in the response) and I keep getting this error that I have not received before while using Stan:

Error in unserialize(socklist[[n]]) : error reading from connection

The model appears to run fine, 1500 iterations, and the error seems to pop up near the end of the chains. Then I only get an empty object in R instead of the full stan object. But the console output shows that the chains all finished.

I should mention too that I have run other large IRT models (maybe not quite this large) without a problem on the same machine, so I am at a bit of a loss and not sure how to try and diagnose the issue.

I am running RStan version 2.16.2 and R version 3.4.1 on macOS Sierra. I have rebooted the computer and also tried it on a different Mac, with the same result.

Is this some kind of limitation in Rstan’s ability to read in the final posterior draws?

bgoodri · November 10, 2017, 9:22pm

You may just be running out of RAM. Try it with cores = 1.
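To see why RAM is a plausible culprit, here is a back-of-the-envelope sketch in base R. The parameter count (about 100K) and the 1500 iterations come from the post above; the number of chains (4) is an assumption, since the thread does not say how many were run, and `draws_memory_gb` is a hypothetical helper, not an rstan function. With cores > 1 the chains run in separate worker processes whose results are sent back to the main session, so the draws may briefly exist in memory more than once, which would fit an error that only appears after the chains finish.

```r
# Approximate memory (in GB) needed to hold posterior draws stored as doubles.
# Hypothetical helper for illustration; not part of rstan.
draws_memory_gb <- function(n_params, n_iter, n_chains, bytes_per_value = 8) {
  n_params * n_iter * n_chains * bytes_per_value / 1e9
}

# ~100K parameters and 1500 iterations are taken from the post above;
# 4 chains is an assumption.
draws_memory_gb(n_params = 1e5, n_iter = 1500, n_chains = 4)

# The suggested workaround, with placeholder object names (not run here):
# fit <- rstan::sampling(model, data = stan_data, iter = 1500, cores = 1)
```

This counts only the raw draws for one copy of the results; overhead from R objects and from copying between processes would come on top of it.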

saudiwin · December 13, 2017, 7:38pm

Yes that seems to have been the issue. Thanks much for your help!

saudiwin · February 6, 2018, 4:37pm

Hey @bgoodri,

I just got a different CSV error:

Error in read_one_stan_csv(attr(vbres, "args")$sample_file) :
‘csvfile’ does not exist on the disk

I’m not sure if this is also a RAM issue, because I had been running this model without any problems. I’m on RStan version 2.17.3. The model is quite big (>100K parameters) and I am fitting it with vb.

Any ideas? Thanks for your help!

bgoodri · February 6, 2018, 5:06pm

It should exist. Sometimes overly aggressive servers automatically delete files that they think are not in use. But is there a file in tempdir() that has the name you specified for sample_file?

saudiwin · February 6, 2018, 5:22pm

Thanks for the tip. I looked in tempdir() but found nothing with a .csv suffix. I looked at other R tmp folders and can’t find anything there either.

It’s running on a desktop machine, so it shouldn’t have been a server. I do have concurrent R sessions running: could one of them have written over the temp file?

bgoodri · February 6, 2018, 5:39pm

Different R sessions should have different temporary directories. Try this:

1. Start a new R session
2. Look at what tempdir() is
3. Start a Stan model that takes at least a minute to run but specify sample_file
4. In Windows Explorer or whatever, look if a CSV file is created in that temporary directory while it is running
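The check described above can also be done from R instead of a file browser. This is a minimal sketch: `list_csv_files` is a hypothetical helper written for this example, and the model and data names in the commented call are placeholders. Because the session running the model is busy while it samples, the helper takes the directory as an argument so it can be called from a second session with the path printed by the first.

```r
# Return the CSV files currently present in a directory.
# Defaults to this session's temporary directory.
list_csv_files <- function(dir = tempdir()) {
  list.files(dir, pattern = "\\.csv$", full.names = TRUE)
}

# In the session that will run the model: note the temporary directory.
tempdir()

# Placeholder call (not run here); sample_file is given an explicit path
# inside tempdir() so there is no doubt about where the file should appear.
# fit <- rstan::stan("model.stan", data = stan_data,
#                    sample_file = file.path(tempdir(), "draws.csv"))

# In a second session, while the model runs, pass the path printed above:
# list_csv_files("/path/printed/by/tempdir")
list_csv_files()
```

If no CSV file shows up while the model is running, the file is never being written. If it appears and then vanishes, something is deleting it.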

mikey_T · May 18, 2019, 6:05pm

I’m having a similar problem in rstan and am wondering whether this was resolved. I’m running a large model on a large dataset with vb, and I get this error after the model has converged and it draws a sample of size 1000 from the approximate posterior:
Error in scan(csvfile, what = double(), sep = ",", comment.char = "", :
  too many items
Calls: vb -> vb -> .local -> read_one_stan_csv -> scan

The call I am using to run stan is

vb(m_init, data = stan_d, pars = pars,
   init = 0, tol_rel_obj = 0.007,
   adapt_engaged = FALSE, iter = 100000,
   eta = 0.1)

My session info is below. Thanks in advance,
Mikey

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.4 (Maipo)
Matrix products: default
BLAS: /pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/r/3.5.3-3m5f3ae/rlib/R/lib/libRblas.so
LAPACK: /pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/r/3.5.3-3m5f3ae/rlib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.18.2 StanHeaders_2.18.1 ggplot2_3.1.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 magrittr_1.5 tidyselect_0.2.5 munsell_0.5.0
[5] colorspace_1.4-0 R6_2.4.0 rlang_0.3.1 plyr_1.8.4
[9] dplyr_0.8.0.1 parallel_3.5.3 pkgbuild_1.0.2 grid_3.5.3
[13] gtable_0.2.0 loo_2.0.0 cli_1.0.1 withr_2.1.2
[17] matrixStats_0.54.0 lazyeval_0.2.1 assertthat_0.2.0 tibble_2.0.1
[21] crayon_1.3.4 processx_3.2.1 gridExtra_2.3 purrr_0.3.0
[25] callr_3.1.1 ps_1.3.0 inline_0.3.15 glue_1.3.0
[29] compiler_3.5.3 pillar_1.3.1 prettyunits_1.0.2 scales_1.0.0
[33] stats4_3.5.3 pkgconfig_2.0.2

bgoodri · May 19, 2019, 12:02am

It is too big to be read in as a CSV file.

mikey_T · May 19, 2019, 10:28am

Thank you @bgoodri. Is there a solution to this beyond reducing my dataset size or reducing the size of the model?

bgoodri · May 19, 2019, 4:29pm

You could reduce the number of unknowns written to the CSV file by putting more of what you don’t need in the model block rather than transformed parameters.
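As an illustration of this suggestion, here is a sketch with a made-up regression model (not the model from this thread), held as Stan program strings in R. In the first version `mu` is a transformed parameter, so all N of its elements are saved with every draw. In the second it is a local variable in the model block, which Stan does not save. `saved_per_draw` is a hypothetical helper that counts only the model's own unknowns and ignores bookkeeping columns such as lp__.

```r
# Version A: mu is a transformed parameter, so it is saved with every draw.
program_tp <- "
data { int<lower=1> N; vector[N] x; vector[N] y; }
parameters { real alpha; real beta; real<lower=0> sigma; }
transformed parameters {
  vector[N] mu = alpha + beta * x;
}
model {
  y ~ normal(mu, sigma);
}
"

# Version B: mu is a local variable in the model block, so it is not saved.
program_local <- "
data { int<lower=1> N; vector[N] x; vector[N] y; }
parameters { real alpha; real beta; real<lower=0> sigma; }
model {
  vector[N] mu = alpha + beta * x;
  y ~ normal(mu, sigma);
}
"

# Unknowns saved per draw for this toy model: alpha, beta, sigma,
# plus the N elements of mu when it is a transformed parameter.
saved_per_draw <- function(N, mu_is_transformed_parameter) {
  3 + if (mu_is_transformed_parameter) N else 0
}

saved_per_draw(1e5, TRUE)   # version A
saved_per_draw(1e5, FALSE)  # version B
```

The trade-off is that a local variable is not available in the fitted object, so anything needed later has to be recomputed from the saved parameters.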

mikey_T · May 19, 2019, 5:17pm

Thank you for the tip @bgoodri! I’ll give this a try.

mikey_T · May 24, 2019, 7:45pm

Moving parameters to the model block helped. Another simple solution is to draw a smaller sample from the posterior by setting output_samples=100 instead of the default, which is 1000. This gives a much smaller approximate posterior, but it is still useful for what I’m doing, and the file containing the posteriors is 10% of the size.
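The 10% figure follows directly from the number of values written: one per saved unknown per draw. A small sketch, where the count of unknowns is an assumed round number (it is not stated for this model) and `draws_file_values` is a hypothetical helper:

```r
# Number of numeric values in the draws file: one per unknown per draw.
# Hypothetical helper for illustration; not part of rstan.
draws_file_values <- function(n_unknowns, output_samples) {
  n_unknowns * output_samples
}

n_unknowns <- 1e5  # assumed for illustration only

default_size <- draws_file_values(n_unknowns, output_samples = 1000)
reduced_size <- draws_file_values(n_unknowns, output_samples = 100)
reduced_size / default_size

# The corresponding call, reusing the object names from the earlier post
# (not run here):
# fit <- vb(m_init, data = stan_d, pars = pars, init = 0,
#           tol_rel_obj = 0.007, adapt_engaged = FALSE, iter = 100000,
#           eta = 0.1, output_samples = 100)
```

The ratio does not depend on the number of unknowns, so the saving holds whatever the model size.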
