Model with adjoint ODE solver terminated with signal -11

ncowie · November 19, 2021, 1:18pm

Hi All,

I’ve had some interesting behaviour with a model I’ve been running. The model includes a system of ODEs that is solved for the steady state of the chemical species. After the first few samples, where it appears to be running correctly, the chains are terminated by signal -11, and the same code appears in the return codes output. I’ve seen previous posts indicating that this is a segmentation fault, but I can’t see why it would only occur after a few samples, with a few thousand steps within each sample. The code is quite long, so I’ll just link to the Stan code on GitHub: Model Code | Function Block Code
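For context on the number itself: on POSIX systems, Python's subprocess module reports a child process that was killed by signal N as return code -N, and signal 11 is SIGSEGV on Linux. A minimal sketch, assuming the -11 here is such a return code passed on by a Python wrapper (the helper name describe_return_code is made up for illustration):

```python
import signal


def describe_return_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable description."""
    if code < 0:
        # On POSIX, subprocess uses -N to mean "terminated by signal N".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_return_code(-11))
```

On Linux this prints "terminated by signal 11 (SIGSEGV)", which matches the segmentation fault reading of the error.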

I’m running Stan v2.28.1.

Kind Regards,
Nicholas Cowie

yizhang · November 19, 2021, 4:55pm

Ping @wds15 @Funko_Unko @stevebronder, who have been working on the adjoint solver. I also changed the title.

rfc · November 19, 2021, 9:48pm

If you only have to run the model for a relatively short time before it crashes with a segmentation fault, you may be able to run it using valgrind memcheck, which may be able to diagnose the location of the fault.

For example, if you’re running a Linux distribution such as Debian, you can install valgrind with apt install valgrind and then run your program under valgrind memcheck by following The Valgrind Quick Start Guide.
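As a rough sketch of what such an invocation could look like for a compiled CmdStan model, the snippet below only assembles the command line and prints it. The executable path ./model, the data file data.json, the log file name, the short run lengths, and the helper name valgrind_command are all placeholders chosen for illustration, not taken from the model in this thread:

```python
import shlex
from typing import List


def valgrind_command(model_exe: str, data_file: str, seed: int = 1) -> List[str]:
    """Build a valgrind memcheck command line wrapping a CmdStan model executable."""
    return [
        "valgrind",
        "--tool=memcheck",
        "--track-origins=yes",      # report where uninitialised values came from
        "--log-file=valgrind.log",  # write the report to a file (placeholder name)
        model_exe,
        "sample", "num_warmup=20", "num_samples=20",  # short run; valgrind is slow
        "data", f"file={data_file}",
        "random", f"seed={seed}",   # fixed seed so reruns use the same inputs
    ]


cmd = valgrind_command("./model", "data.json")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run. Building the model with debug symbols should make the file names and line numbers in the report more useful.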

Maybe a good place to start would be to first figure out a reproducible way of triggering the crash that only involves building and running a Stan model binary using CmdStan, without any layer of Python: for example, a fixed input data file, a fixed Stan model, and a fixed command to run the sampler that always or often triggers the crash. The crash could be nondeterministic and depend on the whims of the memory allocator even if the inputs don’t change. If the crash is caused by an out-of-bounds array access, it could also be data dependent, if some array dimensions or array indices are defined by data.
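One way to check how often a fixed command crashes is to rerun it several times and stop at the first run that is killed by a signal. This is a minimal sketch for POSIX systems: the helper name first_crash is made up, and a child Python process that sends itself SIGSEGV stands in for the real model command, which you would substitute:

```python
import subprocess
import sys
from typing import List, Optional


def first_crash(cmd: List[str], attempts: int = 5) -> Optional[int]:
    """Run cmd up to `attempts` times.

    Returns the 1-based attempt number of the first run killed by a signal,
    or None if no run was killed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode < 0:  # negative means "terminated by signal"
            print(f"attempt {attempt}: killed by signal {-result.returncode}")
            return attempt
    return None


# Stand-in for a model run: a child process that sends itself SIGSEGV.
crashing = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
print(first_crash(crashing))
```

If the real command only crashes on some attempts with identical inputs and seed, that points towards the nondeterministic case described above.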

ncowie · November 23, 2021, 4:19pm

I attempted to use valgrind by setting my optimisation to -Og in the cmdstan/make/local file, and then ran it using just the executable directed at the data. It failed almost immediately; however, the pointers didn’t make much sense to me. I’ll try running it again tomorrow and attach a screenshot. However, after switching to the bdf solver the error doesn’t appear to occur. Furthermore, running the program in fixed_param=True mode with a known input/output for the adjoint solver gave the correct result.

wds15 · November 23, 2021, 5:51pm

Thanks for investigating!

rfc · November 23, 2021, 9:12pm

Cool, at least it seems easy to reproduce the crash.

As well as sharing a screenshot or copy of the valgrind output, please also share a copy of the entire C++ file that Stan has generated for your model. Some of the developers may be able to correlate the issues that valgrind is reporting (especially if there are source file names and line numbers in the valgrind output) with parts of the model code or the depths of Stan’s library code.
