Rejecting initial values in location-scale models (brms)

I am running location-scale models (for the first time) in R using brms and running into a host of convergence issues (there were none with a location-only model). One issue that may be related concerns the starting values for the scale parameters. The error (repeated dozens of times):

Chain 4 Rejecting initial value
Chain 4 Error evaluating the log probability at the initial value.
Chain 4 Exception: normal_lpdf: Scale parameter[941] is 0, but must be positive! (in '/tmp/RtmpC3QSN2/model-e45b353285cf.stan', line 109, column 4 to column 41)
Chain 4 Exception: normal_lpdf: Scale parameter[941] is 0, but must be positive! (in '/tmp/RtmpC3QSN2/model-e45b353285cf.stan', line 109, column 4 to column 41)   

Is there a way to prevent this from happening within brms?

My model code:

priors_h1ab <- c(set_prior("normal(50, 10)", class = "b", coef = "Intercept"),
                 set_prior("normal(0, 0.5)", class = "b", coef = "loneliness_state_pmc"))

model_h1ab <- bf(depressedmood_state ~ 0 + Intercept + dayNumber + pingNumber + weekend + 
                   loneliness_state_pmc + loneliness_trait_gmc + loneliness_state_pmc:loneliness_trait_gmc +
                   (0 + Intercept + loneliness_state_pmc | corr | participantId) +
                   ar(time = pingTotal, gr = participantId, p = 1),
                 
                 sigma ~ 1 + dayNumber + pingNumber + weekend + loneliness_state_pmc +
                   loneliness_trait_gmc + loneliness_state_pmc:loneliness_trait_gmc +
                   (1 + loneliness_state_pmc | corr | participantId))

fit_h1ab <- brm_multiple(formula = model_h1ab, data = ema_imp, family = gaussian(),
                         prior = priors_h1ab, sample_prior = "yes", iter = 3000, 
                         chains = 4, cores = 4, backend = "cmdstanr", seed = 50211)

Could you check if the model is using a log link on sigma?
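
One way to check without fitting anything (a sketch; brmsfamily() and make_stancode() are standard brms helpers, and I'm assuming ema_imp is a list of imputed data frames so a single element can be passed):

library(brms)

# The default link for a modeled sigma is stored in the family object
brmsfamily("gaussian")$link_sigma
#> [1] "log"

# Generate (but don't fit) the Stan code for the model
make_stancode(model_h1ab, data = ema_imp[[1]], prior = priors_h1ab)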

@scholz It looks like it is. Here's the Stan code generated by brms:

// generated with brms 2.16.3
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar;  // AR order
  int<lower=0> Kma;  // MA order
  // number of lags per observation
  int<lower=0> J_lag[N];
  int<lower=1> K_sigma;  // number of population-level effects
  matrix[N, K_sigma] X_sigma;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_sigma_3;
  vector[N] Z_1_sigma_4;
  int<lower=1> NC_1;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int max_lag = max(Kar, Kma);
  int Kc_sigma = K_sigma - 1;
  matrix[N, Kc_sigma] Xc_sigma;  // centered version of X_sigma without an intercept
  vector[Kc_sigma] means_X_sigma;  // column means of X_sigma before centering
  for (i in 2:K_sigma) {
    means_X_sigma[i - 1] = mean(X_sigma[, i]);
    Xc_sigma[, i - 1] = X_sigma[, i] - means_X_sigma[i - 1];
  }
}
parameters {
  vector[K] b;  // population-level effects
  vector[Kar] ar;  // autoregressive coefficients
  vector[Kc_sigma] b_sigma;  // population-level effects
  real Intercept_sigma;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_sigma_3;
  vector[N_1] r_1_sigma_4;
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_sigma_3 = r_1[, 3];
  r_1_sigma_4 = r_1[, 4];
}
model {
  // likelihood including constants
  if (!prior_only) {
    // matrix storing lagged residuals
    matrix[N, max_lag] Err = rep_matrix(0, N, max_lag);
    vector[N] err;  // actual residuals
    // initialize linear predictor term
    vector[N] mu = X * b;
    // initialize linear predictor term
    vector[N] sigma = Intercept_sigma + Xc_sigma * b_sigma;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      sigma[n] += r_1_sigma_3[J_1[n]] * Z_1_sigma_3[n] + r_1_sigma_4[J_1[n]] * Z_1_sigma_4[n];
    }
    for (n in 1:N) {
      // apply the inverse link function
      sigma[n] = exp(sigma[n]);
    }
    // include ARMA terms
    for (n in 1:N) {
      err[n] = Y[n] - mu[n];
      for (i in 1:J_lag[n]) {
        Err[n + 1, i] = err[n + 1 - i];
      }
      mu[n] += Err[n, 1:Kar] * ar;
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += student_t_lpdf(Intercept_sigma | 3, 0, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 15.6)
    - 4 * student_t_lccdf(0 | 3, 0, 15.6);
  target += std_normal_lpdf(to_vector(z_1));
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
}
generated quantities {
  // actual population-level intercept
  real b_sigma_Intercept = Intercept_sigma - dot_product(means_X_sigma, b_sigma);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}

Line 109 of the generated file that is throwing the error is the likelihood statement (the only normal_lpdf with a scale parameter, which is what the exception complains about):

  target += normal_lpdf(Y | mu, sigma);
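
Since sigma is modeled on the log scale, a scale of exactly 0 can only come from exp() underflowing when the initial value of the log-sigma linear predictor is strongly negative. A quick R illustration of the double-precision cutoff:

# exp() returns exactly 0 once its argument drops below roughly -745, so a
# large negative initial value of the log-sigma linear predictor yields a
# scale parameter of 0 and the initial value is rejected.
exp(-745)  # 4.940656e-324, the smallest subnormal double
exp(-746)  # 0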

The only idea I currently have is to try different initial values via the init argument. Try something small like 0.1, or even 0, just to see if that fixes the problem.
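
For example, a sketch of the same brm_multiple() call with an explicit init (with the cmdstanr backend, init = 0.1 draws unconstrained initial values uniformly from (-0.1, 0.1) instead of the default (-2, 2); note that older brms versions spell the argument inits):

fit_h1ab <- brm_multiple(formula = model_h1ab, data = ema_imp, family = gaussian(),
                         prior = priors_h1ab, sample_prior = "yes", iter = 3000,
                         chains = 4, cores = 4, backend = "cmdstanr",
                         init = 0.1, seed = 50211)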

@scholz I tried this with init = 0 and ran into the same issue. For some reason, when I try init = 0.1, my models fail with the following error:

Stan model 'model_40c60317027a903f80c0cf96b9b92a89-202205061603-1-3da6bd' does not contain samples after warmup.
Error in validate_rhat(s$summary[, "Rhat"]) : is.numeric(x) is not TRUE

You could try starting with a simpler model and then slowly building up to the more complex one. That might help you identify what breaks the model. For instance:
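
Here is a sketch of that build-up, with hypothetical intermediate formulas reusing the variables from the full model above:

library(brms)

# Step 1: the location part, with a constant residual SD
model_s1 <- bf(depressedmood_state ~ 0 + Intercept + loneliness_state_pmc +
                 (0 + Intercept + loneliness_state_pmc | corr | participantId),
               sigma ~ 1)

# Step 2: let the (log) residual SD vary by participant
model_s2 <- bf(depressedmood_state ~ 0 + Intercept + loneliness_state_pmc +
                 (0 + Intercept + loneliness_state_pmc | corr | participantId),
               sigma ~ 1 + (1 | participantId))

# Step 3 onward: add the fixed scale predictors (dayNumber, pingNumber,
# weekend, ...) and the ar() term one at a time, refitting at each step
# to see which addition breaks the sampler.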


@scholz Thanks for this suggestion - very helpful. The problem seems to be the addition of a time-varying term in the scale model. I'm currently running into the following error when trying to estimate that model:

Error in if (any(efbmi_per_chain < threshold)) { :
   missing value where TRUE/FALSE needed
Calls: brm_multiple -> <Anonymous> -> value.Future -> signalConditions  

Going to start a separate thread for this particular issue.