With the code below, I get a large number of divergent transitions and the errors listed below. The exceptions with cmdstanr are different from the warnings with rstan. A model with complete pooling ran without divergent transitions. However, modelling the effects of different grouping factors is important to understanding the data. btw, the data is simulated in python.
Other formula versions, such as “count ~ 0 + Intercept + (species||sday) + (species|g1|site), zi ~ species|g1|site” give the same exceptions.
Do the exceptions with cmdstanr point to a problem with priors? If so, which prior?
I have done extensive diagnostics. for example priorsense points to prior-likelihood conflict with b_zi_Intercept, sd_site_Intercept, cor_site__Intercept__species, r_site[2,Intercept] (all sites) amongst others. I’ve tried to weaken the priors, but the exceptions remain.
The divergences in the pairs plot are mainly in the centre, not concentrated in any far corner. On the other hand, rhats, ESS’s are ok, and all pareto k<0.7.
a plot comparing data to posterior draws looks like the model predicts quite well.
A simpler model with complete pooling did not give the same exceptions and only 4 divergent transitions after warmup. The formula was: count ~ 0 + Intercept + sday + site + species, zi ~ species
with cmdstanr (sometimes poisson_log_lpmf and other times bernoulli_logit_lpmf):
Run 1
Exception: poisson_log_lpmf: Log rate parameter is -nan, but must be not nan! (in ‘/tmp/RtmpBfWwL1/model-75d152821d17.stan’, line 78, column 6 to line 80, column 54) (in ‘/tmp/RtmpBfWwL1/model-75d152821d17.stan’, line 173, column 6 to column 74)
Run 2
Exception: bernoulli_logit_lpmf: Logit transformed probability parameter is -nan, but must be not nan! (in ‘/tmp/RtmpoL6rYU/model-25537f7a97fb.stan’, line 78, column 6 to line 80, column 54) (in ‘/tmp/RtmpoL6rYU/model-25537f7a97fb.stan’, line 173, column 6 to column 74)
with rstan:
Warning: There were 830 divergent transitions after warmup. See
“Runtime warnings and convergence problems”
to find out why this is a problem and how to eliminate them.Warning: Examine the pairs() plot to diagnose sampling problems
the brms code:
zifamily = zero_inflated_poisson(link = "log", link_zi = "logit")
sp_formula = bf(count ~ 0 + Intercept + sday + (species|g1|site), zi ~ species|g1|site)
zip_prior <- c(set_prior("normal(0,10)", class = "b"),
set_prior("lkj(2)", class = "cor"),
set_prior("student_t(3, 0, 5)", class = "sd"),
set_prior("student_t(3, 0, 5)", class = "Intercept", dpar = "zi"),
set_prior("student_t(3, 0, 5)", class = "sd", dpar = "zi"))
fit <-
brm(data = my_data,
family = zifamily,
formula = sp_formula,
prior = zip_prior,
iter = 2000, warmup = 1000, thin = 1, chains = 4, cores = 4,
# seed = 9,
refresh = 0,
backend = "cmdstanr",
sample_prior = FALSE
)
if it helps, this is the generated Stan code:
// generated with brms 2.20.4
functions {
/* compute correlated group-level effects
* Args:
* z: matrix of unscaled group-level effects
* SD: vector of standard deviation parameters
* L: cholesky factor correlation matrix
* Returns:
* matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
/* zero-inflated poisson log-PDF of a single response
* Args:
* y: the response value
* lambda: mean parameter of the poisson distribution
* zi: zero-inflation probability
* Returns:
* a scalar to be added to the log posterior
*/
real zero_inflated_poisson_lpmf(int y, real lambda, real zi) {
if (y == 0) {
return log_sum_exp(bernoulli_lpmf(1 | zi),
bernoulli_lpmf(0 | zi) + poisson_lpmf(0 | lambda));
} else {
return bernoulli_lpmf(0 | zi) + poisson_lpmf(y | lambda);
}
}
/* zero-inflated poisson log-PDF of a single response
* logit parameterization of the zero-inflation part
* Args:
* y: the response value
* lambda: mean parameter of the poisson distribution
* zi: linear predictor for zero-inflation part
* Returns:
* a scalar to be added to the log posterior
*/
real zero_inflated_poisson_logit_lpmf(int y, real lambda, real zi) {
if (y == 0) {
return log_sum_exp(bernoulli_logit_lpmf(1 | zi),
bernoulli_logit_lpmf(0 | zi)
+ poisson_lpmf(0 | lambda));
} else {
return bernoulli_logit_lpmf(0 | zi) + poisson_lpmf(y | lambda);
}
}
/* zero-inflated poisson log-PDF of a single response
* log parameterization for the poisson part
* Args:
* y: the response value
* eta: linear predictor for poisson distribution
* zi: zero-inflation probability
* Returns:
* a scalar to be added to the log posterior
*/
real zero_inflated_poisson_log_lpmf(int y, real eta, real zi) {
if (y == 0) {
return log_sum_exp(bernoulli_lpmf(1 | zi),
bernoulli_lpmf(0 | zi) + poisson_log_lpmf(0 | eta));
} else {
return bernoulli_lpmf(0 | zi) + poisson_log_lpmf(y | eta);
}
}
/* zero-inflated poisson log-PDF of a single response
* log parameterization for the poisson part
* logit parameterization of the zero-inflation part
* Args:
* y: the response value
* eta: linear predictor for poisson distribution
* zi: linear predictor for zero-inflation part
* Returns:
* a scalar to be added to the log posterior
*/
real zero_inflated_poisson_log_logit_lpmf(int y, real eta, real zi) {
if (y == 0) {
return log_sum_exp(bernoulli_logit_lpmf(1 | zi),
bernoulli_logit_lpmf(0 | zi)
+ poisson_log_lpmf(0 | eta));
} else {
return bernoulli_logit_lpmf(0 | zi) + poisson_log_lpmf(y | eta);
}
}
// zero-inflated poisson log-CCDF and log-CDF functions
real zero_inflated_poisson_lccdf(int y, real lambda, real zi) {
return bernoulli_lpmf(0 | zi) + poisson_lccdf(y | lambda);
}
real zero_inflated_poisson_lcdf(int y, real lambda, real zi) {
return log1m_exp(zero_inflated_poisson_lccdf(y | lambda, zi));
}
}
data {
int<lower=1> N; // total number of observations
array[N] int Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
array[N] int<lower=1> J_2; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_zi_1;
vector[N] Z_2_zi_2;
int<lower=1> NC_2; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
vector[K] b; // regression coefficients
real Intercept_zi; // temporary intercept for centered predictors
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2; // group-level standard deviations
matrix[M_2, N_2] z_2; // standardized group-level effects
cholesky_factor_corr[M_2] L_2; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
matrix[N_2, M_2] r_2; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_2] r_2_zi_1;
vector[N_2] r_2_zi_2;
real lprior = 0; // prior contributions to the log posterior
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[ : , 1];
r_1_2 = r_1[ : , 2];
// compute actual group-level effects
r_2 = scale_r_cor(z_2, sd_2, L_2);
r_2_zi_1 = r_2[ : , 1];
r_2_zi_2 = r_2[ : , 2];
lprior += normal_lpdf(b | 0, 5);
lprior += logistic_lpdf(Intercept_zi | 0, 1);
lprior += normal_lpdf(sd_1 | 0, 5) - 2 * normal_lccdf(0 | 0, 5);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
lprior += normal_lpdf(sd_2 | 0, 5) - 2 * normal_lccdf(0 | 0, 5);
lprior += lkj_corr_cholesky_lpdf(L_2 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] zi = rep_vector(0.0, N);
mu += X * b;
zi += Intercept_zi;
for (n in 1 : N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
zi[n] += r_2_zi_1[J_2[n]] * Z_2_zi_1[n]
+ r_2_zi_2[J_2[n]] * Z_2_zi_2[n];
}
for (n in 1 : N) {
target += zero_inflated_poisson_log_logit_lpmf(Y[n] | mu[n], zi[n]);
}
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
target += std_normal_lpdf(to_vector(z_2));
}
generated quantities {
// actual population-level intercept
real b_zi_Intercept = Intercept_zi;
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1, upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1, upper=1>[NC_2] cor_2;
// extract upper diagonal of correlation matrix
for (k in 1 : M_1) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1 : M_2) {
for (j in 1 : (k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
}