Hi,
I'm running a fairly simple model with one random effect, and it always crashes with a segmentation fault. I've tried different parameterizations, different priors, etc. It always segfaults after running for a few hours.
I’m happy to share some example data as well, but I don’t know how to attach it in Discourse.
I’ve tried many variants of this model and can always reproduce a similar crash, always from somewhere in the stan_math library (sometimes in “/”, sometimes in “+”).
Interesting note: I asked cmdstan to save_warmup. The actual values written to output.csv look reasonable, and are in the range expected.
Second interesting note: Running the same model on a MacBook (M3 chip) does not produce a crash, but it runs VERY slowly.
This is the latest model file I ran:
data {
  int<lower=1> N;
  int<lower=1> N_group;
  array[N] int group;
  vector<lower=0>[N] y;
}
parameters {
  real<lower=0> a0;
  real<lower=0> sg;
  vector[N_group] group_eta;
  real<lower=0> group_scale;
}
transformed parameters {
  vector[N_group] a_group;
  a_group = group_scale * group_eta;
}
model {
  for (i in 1:N) {
    real mu = a0 + a_group[group[i]];
    y[i] ~ normal(mu, sg);
  }
  // priors
  a0 ~ normal(0, 0.1);
  sg ~ normal(0, 0.1);
  group_eta ~ normal(0, 1);
  group_scale ~ normal(0, 1);
}
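Since I can't attach the real data, here is a minimal Python sketch that simulates a stand-in data set with the same shape as the data block above. The parameter values (a0, sg, group_scale), the sizes, and the seed are all made up for illustration, so this is not my actual data and may not reproduce the crash.

```python
import json
import random


def make_data(n=200, n_group=5, seed=1):
    """Simulate data shaped like the model's data block (all values hypothetical)."""
    rng = random.Random(seed)
    a0 = 0.1            # hypothetical intercept
    sg = 0.05           # hypothetical residual sd
    group_scale = 0.02  # hypothetical sd of the group effects
    # Non-centered group effects, mirroring a_group = group_scale * group_eta
    a_group = [group_scale * rng.gauss(0, 1) for _ in range(n_group)]
    # Stan indexes from 1, so group ids run from 1 to n_group
    group = [rng.randint(1, n_group) for _ in range(n)]
    # abs() keeps every y inside the declared lower=0 bound
    y = [abs(rng.gauss(a0 + a_group[g - 1], sg)) for g in group]
    return {"N": n, "N_group": n_group, "group": group, "y": y}


data = make_data()
json_text = json.dumps(data)  # JSON in the layout CmdStan reads as data
print(json_text[:80])
```

Writing json_text to a file (say example_data.json, a name chosen here only as an example) should let the compiled model read it through CmdStan's `data file=` argument.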
I compiled the cmdstan code with debugging on, so we can see the error. The resulting crash in gdb is:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Location parameter is inf, but must be finite! (in 'model_8.stan', line 21, column 8 to column 34)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration: 100 / 2000 [ 5%] (Warmup)
Program received signal SIGSEGV, Segmentation fault.
0x0000555555587ee4 in stan::math::operator+(stan::math::var_value<double, void> const&, stan::math::var_value<double, void> const&)::{lambda(auto:1 const&)#1}::operator()<stan::math::internal::callback_vari<double, {lambda(auto:1 const&)#1}> >(stan::math::internal::callback_vari<double, {lambda(auto:1 const&)#1}> const&) (vi=warning: RTTI symbol not found for class 'stan::math::internal::callback_vari<double, stan::math::operator+(stan::math::var_value<double, void> const&, stan::math::var_value<double, void> const&)::{lambda(auto:1 const&)#1}>'
...,
__closure=0x7ffff755b858) at stan/lib/stan_math/stan/math/rev/core/operator_addition.hpp:56
56 avi->adj_ += vi.adj_;
I then asked gdb for the arguments passed to that function.
(gdb) info args
vi = warning: RTTI symbol not found for class 'stan::math::internal::callback_vari<double, stan::math::operator+(stan::math::var_value<double, void> const&, stan::math::var_value<double, void> const&)::{lambda(auto:1 const&)#1}>'
@0x7ffff755b840: {<stan::math::vari_value<double, void>> = {<stan::math::vari_base> = {
_vptr.vari_base = 0x555555756080 <vtable for stan::math::internal::callback_vari<double, stan::math::operator+(stan::math::var_value<double, void> const&, stan::math::var_value<double, void> const&)::{lambda(auto:1 const&)#1}>+16>}, val_ = 0.059424749080934487, adj_ = 52.870598166439621}, rev_functor_ = {
__avi = 0x800555555770b70, __bvi = 0x555555771a88}}
__closure = 0x7ffff755b858
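As a quick sanity check on the two pointers in that dump, the following Python sketch tests whether each one is a canonical x86-64 address. It assumes 48-bit virtual addresses (4-level paging), which I have not verified on this machine, and it is only an observation about the values printed by gdb, not a diagnosis of the cause.

```python
# Pointer values copied from the `info args` output above.
AVI = 0x800555555770b70
BVI = 0x555555771a88

VA_BITS = 48  # assumption: 4-level paging, 48-bit virtual addresses


def is_canonical(addr, va_bits=VA_BITS):
    """True if bits (va_bits - 1) through 63 are all equal, as x86-64 requires."""
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Bits set at or above bit 47; a user-space pointer should have none
stray = [i for i in range(VA_BITS - 1, 64) if (AVI >> i) & 1]
# The same pointer with everything above the low 48 bits masked off
low = AVI & ((1 << VA_BITS) - 1)

print("avi canonical:", is_canonical(AVI))
print("bvi canonical:", is_canonical(BVI))
print("stray high bits in avi:", stray)
print("avi low 48 bits:", hex(low))
```

Under that assumption, __avi is not a canonical address while __bvi is. The only stray bit in __avi is bit 59, and with it masked off the value lands in the same 0x5555557... range as __bvi.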
Environment:
- cmdstan 2.34.1
- New install of Debian 12:
Linux bsc 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
- g++ versions:
g++ (Debian 12.2.0-14) 12.2.0
Hardware:
System:
Host: bsc Kernel: 6.1.0-18-amd64 arch: x86_64 bits: 64 Console: pty pts/3 Distro: Debian
GNU/Linux 12 (bookworm)
Machine:
Type: Desktop System: ASUS product: N/A v: N/A serial: N/A
Mobo: ASUSTeK model: PRIME B560M-A v: Rev 1.xx serial: 210585046001202
UEFI: American Megatrends v: 0820 date: 04/27/2021
Memory:
RAM: total: 125.57 GiB used: 1.7 GiB (1.4%)
Array-1: capacity: 128 GiB note: est. slots: 4 EC: None
Device-1: Controller0-ChannelA-DIMM0 type: DDR4 size: 32 GiB speed: 3200 MT/s
Device-2: Controller0-ChannelA-DIMM1 type: DDR4 size: 32 GiB speed: 3200 MT/s
Device-3: Controller0-ChannelB-DIMM0 type: DDR4 size: 32 GiB speed: 3200 MT/s
Device-4: Controller0-ChannelB-DIMM1 type: DDR4 size: 32 GiB speed: 3200 MT/s
CPU:
Info: 8-core model: 11th Gen Intel Core i7-11700 bits: 64 type: MT MCP cache: L2: 4 MiB
Speed (MHz): avg: 800 min/max: 800/4800:4900 cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800
7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800 14: 800 15: 800 16: 800