All chains finished unexpectedly!

I’m using CmdStanR to do my modelling. Inside my Stan file (functions block), I have a custom function which I call inside the transformed parameters block to generate a parameter. Normally the sampling procedure of my model works with no problem. But whenever I print a real value in my custom function (for debugging), the sampling stops after a while, telling me: “No chains finished successfully. Unable to retrieve the draws”. If I remove the print line again, everything goes back to normal. It looks like a bug, but I’m not sure. My sampler is running with parallel_chains set to 4. When I run it on 4 normal (non-parallel) chains, only the last chain finishes unexpectedly.

I remember there used to be a horrifyingly bad practice in multi-threading in which threads used a wait timer to communicate (instead of status checking). If one thread couldn’t finish fast enough, everything would go wrong. Since the print function introduces some delay in the function, I suspect that either (1) the error I’m receiving is due to bad threading, or (2) at some point the print function receives a value it can’t print, and my custom function exits without a return value. I’d appreciate any suggestions regarding this issue.

Have you tried just running on one chain? You don’t even have to set a lot of iterations; just let it run for 100 or so.
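Something along these lines is what I have in mind. This is only a sketch: the toy Bernoulli model and the data are made up for illustration and stand in for your own Stan file and data list. The point is just the `chains = 1` call with short warmup and sampling, followed by a look at the console output of that chain.

```r
library(cmdstanr)

# Toy model standing in for your own .stan file (hypothetical, for illustration only).
stan_file <- write_stan_file("
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  y ~ bernoulli(theta);
}
")
mod <- cmdstan_model(stan_file)

# One chain, 100 warmup and 100 sampling iterations: enough to see whether it crashes.
fit <- mod$sample(
  data = list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)),
  chains = 1,
  iter_warmup = 100,
  iter_sampling = 100,
  seed = 123
)

# Print the console output of chain 1; any error text from Stan should show up here.
fit$output(1)
```

If the single chain fails as well, parallel execution of chains is probably not the cause.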

When I run on 1 chain, it fails:

Warning message:
"Chain 1 finished unexpectedly!
"
Warning message:
“No chains finished successfully. Unable to retrieve the fit.”

And again, when I remove the print function, it works fine. So it can’t be a threading issue… strange!

The print function is print(some real number)?

Could you make the model small enough to reproduce the error? For example just the function with the skeleton model? Or if your model is not too large could you share the code to see what’s going on? Currently, it might be a bit difficult to tell what’s wrong.
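To give an idea of what such a skeleton could look like, here is a rough sketch. Everything in it is an assumption on my part: `my_transform`, `theta` and `phi` are hypothetical names, and the function body is a placeholder for your real one. It only mirrors the structure you described, a custom function with a print statement that is called from the transformed parameters block.

```r
library(cmdstanr)

# Skeleton model: a user-defined function with a print() call,
# used in transformed parameters. All names are hypothetical.
stan_file <- write_stan_file("
functions {
  real my_transform(real x) {
    print(\"x = \", x);  // the debugging print under suspicion
    return 2 * x;
  }
}
parameters {
  real theta;
}
transformed parameters {
  real phi = my_transform(theta);
}
model {
  theta ~ std_normal();
}
")
mod <- cmdstan_model(stan_file)

# A short single-chain run; the print fires on every evaluation,
# so the console output gets long even for a few iterations.
fit <- mod$sample(
  chains = 1,
  iter_warmup = 50,
  iter_sampling = 50,
  seed = 1
)

# 0 means the chain's process exited normally.
fit$return_codes()
```

If a skeleton like this runs but your full model does not, adding your real function body back piece by piece should narrow down which part triggers the failure.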

Was this ever resolved? I’d like to revive this discussion because the “Warning: 1 Chain Finished Unexpectedly” message is a frequent problem that seems to be coming up in multiple contexts with newer versions of CmdStan and CmdStanR. I have a model that ran successfully with good convergence properties on an older version of CmdStan, but now after updating (I don’t even remember what version I was previously on), I’m getting this error all the time for models and data that NEVER had a problem. Searching the forum and GitHub, this issue comes up frequently and all of the discussions just stop with no resolution (see below). Debugging is a total PITA because objects are never stored in R if the “Warning …” message comes up, so you cannot even use $output(chain_id) to examine the details of the problem.

Here are other examples (all with no responses or resolutions from the devs; btw, if you want people to stop using Stan, ignoring their problems is a way to make them do it, so congratulations) …

Sorry you’ve been running into this issue and I get that it’s frustrating to keep seeing this error, especially when it feels like there hasn’t been a clear resolution in past threads. Unfortunately, the message ‘Chain finished unexpectedly’ can be triggered by a variety of unrelated issues, which makes it tricky to diagnose. Some past cases were issues with Stan or one of the interfaces, some were user mistakes, some were configuration issues, and some were never figured out.

Even though your examples have similar warning/error messages, they’re not all caused by the same issue. Just to clarify, in the GitHub issue you linked to, a developer did reply, and they improved the error message for the case where a user indexing error was the culprit. And in this thread that you’re reviving, the original poster never replied when asked for a reproducible example and their Stan code. But you’re right that not all threads reached a resolution.

I understand your frustration. At the same time, we ask everyone on the forum to keep discussions polite and constructive. Stan is free software maintained mostly by volunteers with other jobs, so sometimes things slip through the cracks. We are constantly working on fixing issues reported by users and adding new functionality requested by users, but time is limited.

Regarding the error you’re getting, I suggest starting a new thread and posting a reproducible example. Unfortunately it’s basically impossible to debug this issue without being able to run it. Hopefully one of the developers or experienced users on the forum will be able to help figure out the problem.


Yes. Nobody replies when asked to provide a reproducible example, because by the time you can construct a small enough reproducible example, you’re better off literally coding up a completely different model in some other function like gmm() or something just so you can actually get results that you can discuss. If the warning messages were written in a way that they weren’t so vague with the same warning for myriad issues that have seemingly nothing to do with each other, then perhaps people would be able to better diagnose their problems. Further, if there was a way to compile into “lldb” or “gdb” or something directly in R, we could also figure out where the fault lies, but there isn’t and then sometimes when you try to run the same model in CmdStan from the command line, the error cannot be reproduced or it throws different errors.

Finally, the fact that the latest versions of CmdStan perform more poorly (i.e., I get way more warnings/failed chains now than I ever used to) than earlier versions really speaks to the software being possibly over-engineered, and not in a way that actually yields marginal benefits to those using it.

While it is obviously nice if the example is small, we’re also happy to receive someone’s full model, as long as they are able to share it along with the data.

What happens when you run the same model in CmdStan from the command line would already be informative to report. If sampling on the command line consistently succeeds, then the issue is going to be only somewhere in cmdstanr. If it leads to any errors, those are likely the cause of the “finished unexpectedly” error in some way.
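One way to try this without leaving R is sketched below. It is only an outline under some assumptions: the small model and data are made up and stand in for yours, and launching the executable directly with `system2()` may need extra setup on some platforms. The idea is to take the executable that cmdstanr already compiled, write the data to JSON, and call CmdStan's `sample` method yourself so that its raw console output and exit status are visible.

```r
library(cmdstanr)

# Toy model and data standing in for the real ones (hypothetical).
stan_file <- write_stan_file("
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  y ~ normal(mu, 1);
}
")
mod <- cmdstan_model(stan_file)

# Write the data in the JSON format that CmdStan reads.
data_json <- tempfile(fileext = ".json")
write_stan_json(list(N = 3, y = c(0.1, -0.3, 1.2)), data_json)

# Run the compiled executable directly, bypassing cmdstanr's process handling.
out_csv <- tempfile(fileext = ".csv")
status <- system2(
  mod$exe_file(),
  args = c(
    "sample", "num_warmup=100", "num_samples=100",
    "data", paste0("file=", data_json),
    "output", paste0("file=", out_csv)
  )
)

# A non-zero exit status means the process itself failed.
status
```

Whatever CmdStan prints just before a non-zero exit is the part worth pasting into a bug report.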


Okay, but the dataset I am using is very large. I cannot just upload it here. And if I cut it down, sometimes the error doesn’t happen. The problem with Stan is that the warning messages are so opaque that if the problem relates to some memory issue happening at scale, there’s no way for us to distinguish that from an actual coding issue (I suspect the latter, since the problem only emerged when I started updating things). Is there a place I can upload a 2 GB dataset, a “.stan” file, and an R script that executes the estimator, which someone else can look at?

2 GB might be too big for GitHub, which would be my first suggestion. A public Google Drive link or similar would be acceptable.