Rstan stuck AFTER iterations complete, only when using many observations

The closest match to my issue that I could find is this:

However, there are no stuck chains, no divergences, and no maxed-out treedepths. Fits on simulated data complete quickly and without issue. With my actual data, fits also go fine at 1000 to 1500 observations, but beyond that they start hanging at random, and I have only succeeded once with 2000.

The actual iterations complete quite quickly, but once all chains reach 100% and the “Elapsed time” message shows up, the process seems to freeze. I left a fit whose sampling took 5 minutes to complete sitting in the 100% phase overnight, but it just stayed in that state.

I’m attaching the model for reference. I don’t have priors on most things, which I understand can be a bad idea, but as far as I understand, issues with diagnostics or fitting time would pop up if that was actually a problem.

eibb-regression-model.stan (1.6 KB)

That’s weird. Maybe give this a try in cmdstan?

If you’re using Rstan, you can use stan_rdump to make the data file you’d need. If N, Kx, Kz, n, y, x, and z are variables in your environment, just use:

stan_rdump(c("N", "Kx", "Kz", "n", "y", "x", "z"), "filename.dat")

Build your model with cmdstan, and then run with:

./modelname sample data file=filename.dat output file=output.csv refresh=1

And see if the behavior is different. To watch the output update live, use:

tail -f output.csv

Or post a file that makes fake data here and I’d be happy to run it.

I was just about to give some updates on this.

I did try CmdStan and things went smoothly with the entire dataset. I wasn’t able to find a way to specify number of chains though… do I simply call the process more times? Will it know to run on another core or how do I specify that?

After the success above I went back to RStan but using a single chain and it worked again.

Armed with these results, I made a more specific search and came upon this thread:

However, unlike the poster there, I was able to run both examples successfully and verified that my hosts file was not empty.

I’m attaching a file to generate data. Perhaps it’s also relevant to mention that I’m on Ubuntu.

makeData.R (1.9 KB)

“do I simply call the process more times?”

Yeah. I use gnu parallel for this stuff.
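For anyone landing here with the same question, a minimal sketch of the "call it more times" approach using plain background jobs instead of GNU parallel. The function name run_chains, the seed 12345, and the output_N.csv naming are my own placeholders, not anything CmdStan requires; the model binary and data file are the ones from the command above. Each chain gets the same seed but a different id, so the chains should draw from different parts of the random number stream.

```shell
# Sketch: run several CmdStan chains as parallel background processes.
# Hypothetical helper; arguments: model binary, data file, number of chains.
run_chains() {
  model=$1
  data=$2
  n_chains=${3:-4}   # default to 4 chains if not given
  pids=""
  i=1
  while [ "$i" -le "$n_chains" ]; do
    # Same seed for every chain, distinct id per chain, one CSV per chain.
    "$model" sample random seed=12345 id="$i" \
      data file="$data" output file="output_${i}.csv" &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Wait for every chain and report failure if any of them failed.
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

You would call it as run_chains ./modelname filename.dat 4. The operating system schedules the four processes across whatever cores are free, so there is nothing extra to specify. With GNU parallel the equivalent should be roughly: parallel ./modelname sample random seed=12345 id={} data file=filename.dat output file=output_{}.csv ::: 1 2 3 4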

Lemme pop open ye olde Rstudio and give this script a whirl

FYI the example data at N = 10**3 runs fine but the issue occurs at 10**4 (my real data has around 30k obs)

This ran for me (this is the last three lines from my makeData.R):

eibb.sim(N = 10**4, n = 10, bx = c(0, 0.5, -0.5), rho = 0.2, s = 0.5) ->

model = stan_model("~/Downloads/eibb-regression-model.stan")

fit = sampling(model,, cores = 4, iter = 2000)

It just doesn’t seem like this would be an out of memory thing. The Stanfit object is only like 1 meg.

My sessionInfo() is:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/
LAPACK: /usr/lib/x86_64-linux-gnu/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bbmle_1.0.20       rstan_2.18.1       StanHeaders_2.18.0 ggplot2_3.1.0      magrittr_1.5      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19       pillar_1.3.0       compiler_3.5.1     plyr_1.8.4         bindr_0.1.1        prettyunits_1.0.2 
 [7] base64enc_0.1-3    tools_3.5.1        pkgbuild_1.0.2     lattice_0.20-35    tibble_1.4.2       gtable_0.2.0      
[13] pkgconfig_2.0.2    rlang_0.3.0.1      cli_1.0.1          rstudioapi_0.8     parallel_3.5.1     yaml_2.2.0        
[19] loo_2.0.0          bindrcpp_0.2.2     gridExtra_2.3      withr_2.1.2        dplyr_0.7.7        grid_3.5.1        
[25] tidyselect_0.2.5   glue_1.3.0         inline_0.3.15      R6_2.3.0           processx_3.2.0     purrr_0.2.5       
[31] callr_3.0.0        codetools_0.2-15   matrixStats_0.54.0 scales_1.0.0       ps_1.2.0           assertthat_0.2.0  
[37] colorspace_1.3-2   numDeriv_2016.8-1  lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4

I’m not sure what’s happening. Do you see anything radically different about our R or Rstan versions? Have you tried this on a different computer?


rstan has some problems whenever your model creates a lot of output. At least this was the state of affairs a while ago. Try limiting the output from your model by avoiding generated quantities and by moving things from the transformed parameters block into the model block. That should help.

This has been my experience too. We’ve talked about a few solutions but we’d need to change how rstan stores output.

You can also basically get the behavior of CmdStan from R by specifying the include = TRUE and the sample_file arguments to stan or sampling. But then you have to read the CSV files off the disk using read_stan_csv.


My model doesn’t have any generated quantities or transformed parameters! In general, it’s much simpler than other things I’ve managed to fit before.

I think the issue here came from some interaction with running the code inside an RStudio project and/or using packrat, since I just got an issue-free fit when I re-ran the full data on a “clean slate”.

I’ll probably follow @bgoodri’s suggestion as it’ll play nicely with my current drake workflow. Thanks all.
