I am running location-scale models for the first time in R using brms and running into a host of convergence issues (there were no issues when running a location-only model). One issue that may be related concerns the starting values for the scale parameters. The error below is repeated dozens of times:
Chain 4 Rejecting initial value
Chain 4 Error evaluating the log probability at the initial value.
Chain 4 Exception: normal_lpdf: Scale parameter[941] is 0, but must be positive! (in '/tmp/RtmpC3QSN2/model-e45b353285cf.stan', line 109, column 4 to column 41)
Chain 4 Exception: normal_lpdf: Scale parameter[941] is 0, but must be positive! (in '/tmp/RtmpC3QSN2/model-e45b353285cf.stan', line 109, column 4 to column 41)
Is there a way to prevent this from happening within brms?
My model code:
priors_h1ab <- c(set_prior("normal(50, 10)", class = "b", coef = "Intercept"),
                 set_prior("normal(0, 0.5)", class = "b", coef = "loneliness_state_pmc"))
model_h1ab <- bf(depressedmood_state ~ 0 + Intercept + dayNumber + pingNumber + weekend +
                   loneliness_state_pmc + loneliness_trait_gmc +
                   loneliness_state_pmc:loneliness_trait_gmc +
                   (0 + Intercept + loneliness_state_pmc | corr | participantId) +
                   ar(time = pingTotal, gr = participantId, p = 1),
                 sigma ~ 1 + dayNumber + pingNumber + weekend + loneliness_state_pmc +
                   loneliness_trait_gmc + loneliness_state_pmc:loneliness_trait_gmc +
                   (1 + loneliness_state_pmc | corr | participantId))
fit_h1ab <- brm_multiple(formula = model_h1ab, data = ema_imp, family = gaussian(),
                         prior = priors_h1ab, sample_prior = "yes", iter = 3000,
                         chains = 4, cores = 4, backend = "cmdstanr", seed = 50211)
scholz (April 27, 2022, 7:57am):
Could you check if the model is using a log link on sigma?
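For reference, a minimal sketch of how you could check that in R (assuming a single completed data frame is passed as data; if ema_imp holds several imputed data sets you would pass just one of them to make_stancode()):

# brms uses a log link for sigma by default when sigma is predicted
brmsfamily("gaussian")$link_sigma

# or inspect the generated Stan code and look for exp() applied to sigma
cat(make_stancode(model_h1ab, data = ema_imp, family = gaussian(),
                  prior = priors_h1ab))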
@scholz It looks like it is. Here is the Stan code generated by brms:
// generated with brms 2.16.3
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar; // AR order
  int<lower=0> Kma; // MA order
  // number of lags per observation
  int<lower=0> J_lag[N];
  int<lower=1> K_sigma; // number of population-level effects
  matrix[N, K_sigma] X_sigma; // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1; // number of grouping levels
  int<lower=1> M_1; // number of coefficients per level
  int<lower=1> J_1[N]; // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_sigma_3;
  vector[N] Z_1_sigma_4;
  int<lower=1> NC_1; // number of group-level correlations
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  int max_lag = max(Kar, Kma);
  int Kc_sigma = K_sigma - 1;
  matrix[N, Kc_sigma] Xc_sigma; // centered version of X_sigma without an intercept
  vector[Kc_sigma] means_X_sigma; // column means of X_sigma before centering
  for (i in 2:K_sigma) {
    means_X_sigma[i - 1] = mean(X_sigma[, i]);
    Xc_sigma[, i - 1] = X_sigma[, i] - means_X_sigma[i - 1];
  }
}
parameters {
  vector[K] b; // population-level effects
  vector[Kar] ar; // autoregressive coefficients
  vector[Kc_sigma] b_sigma; // population-level effects
  real Intercept_sigma; // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  matrix[M_1, N_1] z_1; // standardized group-level effects
  cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1; // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_sigma_3;
  vector[N_1] r_1_sigma_4;
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_sigma_3 = r_1[, 3];
  r_1_sigma_4 = r_1[, 4];
}
model {
  // likelihood including constants
  if (!prior_only) {
    // matrix storing lagged residuals
    matrix[N, max_lag] Err = rep_matrix(0, N, max_lag);
    vector[N] err; // actual residuals
    // initialize linear predictor term
    vector[N] mu = X * b;
    // initialize linear predictor term
    vector[N] sigma = Intercept_sigma + Xc_sigma * b_sigma;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      sigma[n] += r_1_sigma_3[J_1[n]] * Z_1_sigma_3[n] + r_1_sigma_4[J_1[n]] * Z_1_sigma_4[n];
    }
    for (n in 1:N) {
      // apply the inverse link function
      sigma[n] = exp(sigma[n]);
    }
    // include ARMA terms
    for (n in 1:N) {
      err[n] = Y[n] - mu[n];
      for (i in 1:J_lag[n]) {
        Err[n + 1, i] = err[n + 1 - i];
      }
      mu[n] += Err[n, 1:Kar] * ar;
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += student_t_lpdf(Intercept_sigma | 3, 0, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 15.6)
    - 4 * student_t_lccdf(0 | 3, 0, 15.6);
  target += std_normal_lpdf(to_vector(z_1));
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
}
generated quantities {
  // actual population-level intercept
  real b_sigma_Intercept = Intercept_sigma - dot_product(means_X_sigma, b_sigma);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}
The statement at line 109 that is throwing the warning is the likelihood (the exception names normal_lpdf and its vectorized scale argument):
  target += normal_lpdf(Y | mu, sigma);
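If I read the generated code correctly, the log link only guarantees positivity mathematically; numerically, exp() underflows to exactly 0 once the linear predictor for sigma is sufficiently negative, which presumably can happen at the random initial values given the group-level terms on sigma. A small illustration in R:

# exp() underflows to exactly zero in double precision
exp(-700)  # still positive (about 1e-304)
exp(-800)  # exactly 0, which normal_lpdf then rejects as an invalid scale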
scholz (April 28, 2022, 9:31am):
The only idea I currently have is to try out different initialization values via the init argument. Try something small like 0.1, or even 0, just to see if that fixes the problem.
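Something along these lines, for example (a sketch based on your original call; with backend = "cmdstanr" the init argument should be forwarded to CmdStanR, where a positive value x means initial values are drawn uniformly from (-x, x) on the unconstrained scale):

fit_h1ab <- brm_multiple(formula = model_h1ab, data = ema_imp, family = gaussian(),
                         prior = priors_h1ab, sample_prior = "yes", iter = 3000,
                         chains = 4, cores = 4, backend = "cmdstanr", seed = 50211,
                         init = 0.1)  # or init = 0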
@scholz I tried this with init = 0 and ran into the same issue. For some reason, when I try init = 0.1, my models fail with the following error:
Stan model 'model_40c60317027a903f80c0cf96b9b92a89-202205061603-1-3da6bd' does not contain samples after warmup.
Error in validate_rhat(s$summary[, "Rhat"]) : is.numeric(x) is not TRUE
You could try starting with a simpler model and then slowly building up to the more complex one. That might help you identify what breaks the model.
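For example, just as a sketch using your variable names, you could keep the mean model as-is, start the scale model with only an intercept plus a random intercept, and then add the remaining sigma terms back in one at a time:

model_simple <- bf(depressedmood_state ~ 0 + Intercept + dayNumber + pingNumber + weekend +
                     loneliness_state_pmc + loneliness_trait_gmc +
                     loneliness_state_pmc:loneliness_trait_gmc +
                     (0 + Intercept + loneliness_state_pmc | corr | participantId) +
                     ar(time = pingTotal, gr = participantId, p = 1),
                   sigma ~ 1 + (1 | corr | participantId))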
@scholz Thanks for this suggestion - very helpful. The problem seems to be the addition of a time-varying term in the scale model. I am currently running into the following error when trying to estimate that model:
Error in if (any(efbmi_per_chain < threshold)) { :
missing value where TRUE/FALSE needed
Calls: brm_multiple -> <Anonymous> -> value.Future -> signalConditions
Going to start a separate thread for this particular issue.