Rstan stuck AFTER iterations complete, only when using many observations

bmfazio · November 4, 2018, 1:41pm

The closest match to my issue that I could find is this:

However, there are no stuck chains, no divergences or maxed out treedepths. Fits on simulated data complete quickly and without issue. With my actual data, fits also work well at 1000 to 1500 observations, but beyond that they start hanging at random, and I have succeeded only once with 2000.

The actual iterations are completing quite quickly but once all chains reach 100% and the “Elapsed time” message shows up, the process seems to freeze. I left a fit whose iterations took 5 minutes to complete sitting in the 100% phase overnight, but it just stayed in that state.

I’m attaching the model for reference. I don’t have priors on most things, which I understand can be a bad idea, but as far as I understand, issues with diagnostics or fitting time would pop up if that was actually a problem.

eibb-regression-model.stan (1.6 KB)

bbbales2 · November 5, 2018, 3:37pm

That’s weird. Maybe give this a try in cmdstan?

If you’re using Rstan, you can use stan_rdump to make the data file you’d need. If N, Kx, Kz, n, y, x, and z are variables in your environment, just use:

stan_rdump(c("N", "Kx", "Kz", "n", "y", "x", "z"), "filename.dat")

Build your model with cmdstan, and then run with:

./modelname sample data file=filename.dat output file=output.csv refresh=1

And see if the behavior is different. You can watch the output with:

tail -f output.csv

To see it update live.

Or post a file that makes fake data here and I’d be happy to run it.
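
If the question is whether sampling itself finished or the hang comes afterwards, counting the draws written to output.csv so far is a quick check. A minimal sketch, assuming the usual CmdStan CSV layout (comment lines starting with #, one header row, then one row per saved draw); count_draws is a made-up helper name, not something CmdStan ships:

```shell
# count_draws FILE: print the number of draws written to a CmdStan output CSV.
# Assumes comment lines start with '#' and the first remaining non-empty line
# is the column header. count_draws is a hypothetical helper, not part of CmdStan.
count_draws() {
  awk '!/^#/ && NF { n++ } END { print (n > 0 ? n - 1 : 0) }' "$1"
}

# Example call:
# count_draws output.csv
```

With default settings warmup draws are not saved, so the count should stay at 0 during warmup and then climb towards the number of sampling iterations.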

bmfazio · November 5, 2018, 3:59pm

I was just about to give some updates on this.

I did try CmdStan and things went smoothly with the entire dataset. I wasn’t able to find a way to specify number of chains though… do I simply call the process more times? Will it know to run on another core or how do I specify that?

After the success above I went back to RStan but using a single chain and it worked again.

Armed with these results, I made a more specific search and came upon this thread:

However, unlike the poster there, I was able to run both examples successfully and verified that my hosts file was not empty.

I’m attaching a file to generate data. Perhaps it’s also relevant to mention that I’m on Ubuntu.

makeData.R (1.9 KB)

bbbales2 · November 5, 2018, 4:01pm

do I simply call the process more times?

Yeah. I use GNU parallel for this stuff.
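
For anyone without GNU parallel installed, a plain shell loop does roughly the same job. This is only a sketch: run_chains is a made-up helper name, the binary and data file names are the placeholders from the earlier command, and it assumes the CmdStan id= argument is used to give each chain its own id.

```shell
# run_chains MODEL DATA [N_CHAINS]: start one CmdStan process per chain in the
# background and wait for all of them to finish. Each chain is a separate OS
# process, so the scheduler is free to place them on different cores.
# run_chains is a hypothetical helper name, not part of CmdStan.
run_chains() {
  model=$1
  data=$2
  n_chains=${3:-4}
  i=1
  while [ "$i" -le "$n_chains" ]; do
    # id= sets the chain id; each chain writes to its own CSV file.
    "$model" sample id="$i" data file="$data" output file="output_${i}.csv" &
    i=$((i + 1))
  done
  wait
}

# Example call (binary and data file names as in the earlier command):
# run_chains ./modelname filename.dat 4
```

With GNU parallel, something like `parallel ./modelname sample id={} data file=filename.dat output file=output_{}.csv ::: 1 2 3 4` should be equivalent. The resulting CSV files can be read back into R together by passing a vector of file names to read_stan_csv.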

Lemme pop open ye olde Rstudio and give this script a whirl

bmfazio · November 5, 2018, 4:18pm

FYI the example data at N = 10**3 runs fine but the issue occurs at 10**4 (my real data has around 30k obs)

bbbales2 · November 5, 2018, 5:22pm

This ran for me (these are the last three lines from my makeData.R):

eibb.sim(N = 10**4, n = 10, bx = c(0, 0.5, -0.5), rho = 0.2, s = 0.5) -> example.data

model = stan_model("~/Downloads/eibb-regression-model.stan")

fit = sampling(model, example.data, cores = 4, iter = 2000)

It just doesn’t seem like this would be an out of memory thing. The Stanfit object is only like 1 meg.

My sessionInfo() is:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bbmle_1.0.20       rstan_2.18.1       StanHeaders_2.18.0 ggplot2_3.1.0      magrittr_1.5      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19       pillar_1.3.0       compiler_3.5.1     plyr_1.8.4         bindr_0.1.1        prettyunits_1.0.2 
 [7] base64enc_0.1-3    tools_3.5.1        pkgbuild_1.0.2     lattice_0.20-35    tibble_1.4.2       gtable_0.2.0      
[13] pkgconfig_2.0.2    rlang_0.3.0.1      cli_1.0.1          rstudioapi_0.8     parallel_3.5.1     yaml_2.2.0        
[19] loo_2.0.0          bindrcpp_0.2.2     gridExtra_2.3      withr_2.1.2        dplyr_0.7.7        grid_3.5.1        
[25] tidyselect_0.2.5   glue_1.3.0         inline_0.3.15      R6_2.3.0           processx_3.2.0     purrr_0.2.5       
[31] callr_3.0.0        codetools_0.2-15   matrixStats_0.54.0 scales_1.0.0       ps_1.2.0           assertthat_0.2.0  
[37] colorspace_1.3-2   numDeriv_2016.8-1  lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4

I’m not sure what’s happening. Do you see anything radically different about our R or Rstan versions? Have you tried this on a different computer?

wds15 · November 5, 2018, 6:56pm

rstan has some problems whenever your model creates a lot of outputs. At least this was the state of affairs a while ago. Try limiting the output from your model by avoiding generated quantities and moving stuff from the transformed parameters block into the model block. That should help.

sakrejda · November 5, 2018, 7:05pm

This has been my experience too. We’ve talked about a few solutions but we’d need to change how rstan stores output.

bgoodri · November 5, 2018, 7:14pm

You can also basically get the behavior of CmdStan from R by specifying the include = TRUE and the sample_file arguments to stan or sampling. But then you have to read the CSV files off the disk using read_stan_csv.

bmfazio · November 5, 2018, 7:30pm

My model doesn’t have any generated quantities or transformed parameters! In general, it’s much simpler than other things I’ve managed to fit before.

I think the issue here came from some interaction with running the code inside an RStudio project and/or using packrat as I just managed to get an issue-free fit when I re-ran the full data on a “clean slate”.

I’ll probably follow @bgoodri’s suggestion as it’ll play nicely with my current drake workflow. Thanks all.
