The closest match to my issue that I could find is this:
However, there are no stuck chains, no divergent transitions, and no iterations hitting the maximum treedepth. Fits on simulated data complete quickly and without issue. With my actual data, fits also go well with 1,000 to 1,500 observations, but beyond that they start hanging at random, and I have succeeded only once with 2,000.
The iterations themselves complete quite quickly, but once all chains reach 100% and the “Elapsed time” message shows up, the process seems to freeze. I left one fit, which had reached the 100% stage after about 5 minutes, running overnight, and it just stayed in that state.
I’m attaching the model for reference. I don’t have priors on most parameters, which I understand can be a bad idea, but as far as I know, if that were actually a problem it would show up in the diagnostics or in the fitting time.
I did try CmdStan, and it ran smoothly on the entire dataset. I couldn’t find a way to specify the number of chains, though. Do I simply run the executable more times? Will each run go to another core, or how do I specify that?
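One common way to get several CmdStan chains is to start the compiled model once per chain as separate processes; the operating system then spreads concurrently running processes across the available cores. Below is a minimal Python sketch of that, assuming CmdStan's `sample`, `data file=`, `output file=`, `random seed=` and `id=` command-line arguments; the executable path, data file name, and the helper names `build_chain_commands` and `run_chains` are hypothetical.

```python
import subprocess
from typing import List


def build_chain_commands(exe: str, data_file: str, n_chains: int, seed: int) -> List[List[str]]:
    """Build one CmdStan command line per chain (hypothetical helper).

    All chains share one seed; the distinct id=<chain> argument is what
    makes CmdStan give each chain a different random-number stream.
    Each chain writes to its own CSV file so the runs do not collide.
    """
    commands = []
    for chain in range(1, n_chains + 1):
        commands.append([
            exe, "sample",
            "data", f"file={data_file}",
            "output", f"file=output_{chain}.csv",
            "random", f"seed={seed}",
            f"id={chain}",
        ])
    return commands


def run_chains(commands: List[List[str]]) -> List[int]:
    """Start every chain as its own process, then wait for all of them.

    The processes run concurrently, so the OS scheduler can place them
    on different cores. Returns the exit code of each process.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

For example, `run_chains(build_chain_commands("./my_model", "my_data.json", 4, 1234))` would start four chains at once and leave `output_1.csv` through `output_4.csv` on disk. The same thing can be done from a shell by starting the executable in the background once per chain.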
After that success, I went back to RStan with a single chain, and it worked again.
Armed with these results, I searched more specifically and came across this thread:
However, unlike the poster there, I was able to run both examples successfully and verified that my hosts file was not empty.
I’m attaching a file to generate data. Perhaps it’s also relevant to mention that I’m on Ubuntu.
rstan has had problems whenever a model produces a lot of output; at least, that was the state of affairs a while ago. Try limiting the output from your model by avoiding generated quantities and by moving quantities from the transformed parameters block into the model block, where they are local variables and are not saved. That should help.
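As a rough illustration of why saved output grows with the data, consider a quantity indexed by observation (for example, a transformed parameter of length N). A minimal Python sketch, assuming one 8-byte double per saved scalar per saved iteration per chain and ignoring any overhead; the function name and the figures of 2,000 saved iterations and 4 chains are hypothetical.

```python
def saved_draws_megabytes(n_saved_iter: int, n_chains: int, n_scalars: int) -> float:
    """Rough size of saved draws in MB (10**6 bytes), assuming each saved
    scalar costs one 8-byte double per saved iteration per chain."""
    return n_saved_iter * n_chains * n_scalars * 8 / 1e6


# Hypothetical: a per-observation quantity of length n_obs, 2000 saved
# iterations per chain, 4 chains.
for n_obs in (1000, 1500, 2000):
    print(n_obs, saved_draws_megabytes(2000, 4, n_obs))
```

Under these assumptions a single per-observation quantity adds 64, 96 and 128 MB for 1,000, 1,500 and 2,000 observations, so output scales linearly with the data. Moving such a quantity into the model block removes it from the saved output entirely.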
You can also basically get the behavior of CmdStan from R by specifying the include = TRUE and the sample_file arguments to stan or sampling. But then you have to read the CSV files off the disk using read_stan_csv.
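Reading the CSV files back amounts to skipping Stan's comment lines and parsing the rest as ordinary CSV. The sketch below is a Python stand-in for that step, not rstan's read_stan_csv (which returns a stanfit object); it assumes the usual Stan CSV layout, where comment lines start with `#`, the first non-comment line holds the column names, and each later non-comment line is one draw. The function name `read_stan_csv_draws` is hypothetical, and it computes no diagnostics.

```python
import csv
from typing import Dict, Iterable, List


def read_stan_csv_draws(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse Stan CSV lines into a column-name -> list-of-draws dict.

    Comment lines (config, adaptation info, timing) start with '#' and
    are skipped, as are blank lines.
    """
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(data_lines)
    header = next(reader, None)
    if header is None:  # file contained only comments
        return {}
    columns: Dict[str, List[float]] = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns
```

Usage would look like `with open("output_1.csv") as f: draws = read_stan_csv_draws(f)`, repeated once per chain's file.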
My model doesn’t have any generated quantities or transformed parameters! In general, it’s much simpler than other things I’ve managed to fit before.
I think the issue came from some interaction with running the code inside an RStudio project and/or using packrat: I just got an issue-free fit when I re-ran the full data from a “clean slate”.
I’ll probably follow @bgoodri’s suggestion as it’ll play nicely with my current drake workflow. Thanks all.