Sure, I can further specify the issue:
I am running a hurdle model with a hidden Markov process. I suspect that the lognormal part of the hurdle is what causes the segmentation fault. Below are the parts of the model that seem relevant to this question.
data {
  int<lower = 1> N; // number of observations
  int<lower = 0> S; // number of states
  int<lower = 1> H; // number of individuals
  int<lower = 0> K1; // number of covariates inside delta in Bernoulli part
  int<lower = 0> K2; // number of covariates inside delta in Lognormal part
  matrix[N, K1] C1; // matrix of covariates for Bernoulli part
  matrix[N, K2] C2; // matrix of covariates for Lognormal part
  int<lower = 0, upper = 1> y[N]; // binary decision
  real q[N]; // Hurdle: Dependent variable we want to model conditional on y
  int<lower = 1> id[N]; // identifier of individuals
}
parameters {
  ordered[S] mu; // state-dependent intercepts in Bernoulli part
  vector[S] nu; // state-dependent intercepts in Lognormal part
  real alphaj[H]; // individual-specific intercept in Bernoulli part
  real alphai[H]; // individual-specific intercept in Lognormal part
  real<lower = 0> sigma_alphai;
  real<lower = 0> sigma_alphaj;
  real<lower = 0> sigma_q;
  vector[K1] delta1;
  vector[K2] delta2;
}
model {
  ...
  for (t in 2:N) {
    target += log_sum_exp(gamma);
    for (k in 1:S) {
      gamma_prev[k] = bernoulli_logit_lpmf(y[t] | alphaj[id[t]] + mu[k] + C1[t] * delta1);
      if (y[t] == 1) {
        gamma_prev[k] += lognormal_lpdf(q[t] | alphai[id[t]] + nu[k] + C2[t] * delta2, sigma_q);
        ...
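To make the per-state hurdle term in the loop easier to reason about outside Stan, here is a minimal Python sketch of it. It is an illustration only: the function names mirror the Stan functions but are re-implemented by hand, the numeric inputs and the equal state weights are hypothetical, and the transition part of the forward recursion (elided in the snippet) is left out as well.

```python
import math

def log1p_exp(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def bernoulli_logit_lpmf(y, eta):
    """Log-probability of y in {0, 1} when the success log-odds are eta."""
    return -log1p_exp(-eta) if y == 1 else -log1p_exp(eta)

def lognormal_lpdf(q, mu, sigma):
    """Lognormal log-density; a non-positive scale is rejected, as in the message below."""
    if sigma <= 0:
        raise ValueError("Scale parameter must be > 0")
    if q <= 0:
        # Sketch assumption: values outside the positive support get density 0.
        return -math.inf
    z = (math.log(q) - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - math.log(q) - 0.5 * z * z

def hurdle_lp(y, q, eta_bern, eta_logn, sigma_q):
    """Bernoulli part always; lognormal part only when the hurdle is crossed (y == 1)."""
    lp = bernoulli_logit_lpmf(y, eta_bern)
    if y == 1:
        lp += lognormal_lpdf(q, eta_logn, sigma_q)
    return lp

def log_sum_exp(xs):
    """Stable log(sum(exp(x))) used to marginalize over states."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Hypothetical two-state example for a single observation with y = 1.
mu = [-0.5, 0.8]    # state intercepts, Bernoulli part (ordered)
nu = [-2.5, -1.9]   # state intercepts, lognormal part
per_state = [math.log(0.5) + hurdle_lp(1, 0.12, mu[k], nu[k], 0.4) for k in range(2)]
print(log_sum_exp(per_state))
```

With y = 0 the lognormal term is skipped, so a zero in q only matters on rows where y = 1.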
-
C1 and C2 are design matrices of categorical predictors converted into dummy variables and then stored in a design matrix, e.g.
C1 <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,...).Dim = c(7192, 17))
- The variable q modeled in the lognormal part ranges from 0 to 1, because I rescaled it as q = real_q / max(real_q) before passing the data to Stan, e.g.
q <- c(0.372852812394746, 0.175479959582351, 0.143482654092287, 0.121926574604244, 0.122600202088245, 0.112495789828225, 0.16503873358033, 0.0919501515661839,
0.0963287302121927, 0.100370495116201, 0.11485348602223, 0.107106769956214, 0.106433142472213, 0.149208487706298, 0.108790838666218, 0.102728191310205,
0.0929605927921859, 0.0929605927921859, 0.105422701246211, 0.105422701246211, 0.047490737622095, 0.101717750084203, 0.120916133378242, 0.106096328730212, 0,
0.1000336813742, 0.113169417312226, 0.112495789828225, 0.117884809700236, 0.0983496126641967, 0.0848770629841697, 0.0771303469181543, 0.11485348602223,
0.0788144156281576, 0.0747726507241495, 0.0191983832940384, 0.113169417312226, 0.0936342202761873, 0.0717413270461435, 0.082519366790165, 0.107443583698215,
0.0771303469181543, 0.0811721118221623, 0.0892556416301785, 0.0269450993600539, 0.0855506904681711, 0.11519029976423, 0.0932974065341866, 0.0848770629841697,
0.10474907376221, 0.0781407881441563, 0, 0.017514314584035, 0.0757830919501516, 0.125631525766251, 0.0862243179521724, 0.11013809363422, 0.0939710340181879, 0, 0,
0, 0, 0, 0.0181879420680364, 0, 0, 0, 0, 0, 0.0101044122600202, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
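Because the q vector contains exact zeros and zero lies outside the lognormal's positive support, one check worth running before fitting is that no zero lines up with y = 1. A minimal Python sketch of that check follows; the function name and the toy data are hypothetical and not taken from the real dataset.

```python
def lognormal_support_violations(y, q):
    """Return the 0-based indices t where y[t] == 1 but q[t] <= 0."""
    if len(y) != len(q):
        raise ValueError("y and q must have the same length")
    return [t for t, (yt, qt) in enumerate(zip(y, q)) if yt == 1 and qt <= 0]

# Hypothetical toy data: the second observation crosses the hurdle with q = 0.
y = [1, 1, 0, 1, 0]
q = [0.37, 0.0, 0.0, 0.018, 0.0]
bad = lognormal_support_violations(y, q)
print(bad)  # prints [1]
```

An empty list would mean that every zero in q belongs to a row with y = 0, which the lognormal part never sees.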
- mu and nu are state-dependent intercepts (one for each state in each equation)
- alphaj and alphai are individual-specific intercepts
When I run the model on a Linux machine with cmdstan and a lognormal in the hurdle, it prints the following message 6 times per chain, only at the start of sampling during warmup. The segmentation fault then occurs at roughly 42% of the estimation.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: lognormal_lpdf: Scale parameter is 0, but must be > 0! (in '/home/user1/cmdstan-2.25.0/examples/noncp50/lognormal_2state.stan', line 80, column 4 to line 81, column 79)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
When I run the same model with a normal instead of a lognormal in the hurdle, it prints this message only once per chain. More importantly, the estimation does not stop with a segmentation fault.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Scale parameter is 0, but must be > 0! (in '/home/user1/cmdstan-2.25.0/examples/lognormal2502/lognormal_2state.stan', line 50, column 2 to column 35)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
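For context on how a scale of exactly 0 can arise even though the scale parameters are declared with lower = 0: Stan samples such a parameter on the unconstrained scale and maps it back with exp(), which underflows to 0 in double precision when the unconstrained value is very negative. A small Python illustration, assuming only that transform (the function name is hypothetical and this is not Stan's own code):

```python
import math

def constrain_lower0(unconstrained):
    """Map an unconstrained real to (0, inf) via exp, as for a lower = 0 bound."""
    return math.exp(unconstrained)

for u in (0.0, -10.0, -800.0):
    sigma = constrain_lower0(u)
    # A proposal far out in the negative direction yields sigma == 0.0 exactly.
    print(u, sigma, sigma > 0)
```

This is consistent with the message appearing early in warmup, when the step size is still being tuned and proposals can overshoot.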
The segmentation fault does not occur if I fit the model to data from 50 individuals, but it does occur with data from 250 or 500 individuals.
Any ideas what’s happening here?