Beta regression - help with error evaluating the log probability at the initial value

Hi everyone,
I’ve had some trouble fitting a beta regression model, with an error relating to the initial value. Notably, using these same priors in similar data I haven’t had any issues, though those models involved fewer predictors. This model did eventually compute, but starts out by throwing an initialisation error. Several times it just failed to initialise at all and just didn’t sample. The error/warning message was:

Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: beta_lpdf: Second shape parameter[17] is 0, but must be > 0! (in ‘model20a4383764c7_3cc7b493a3f2d2c6fdea015d730f6238’ at line 58)

This offending line would appear to be:

target += beta_lpdf(Y | mu * phi, (1 - mu) * phi)

Is there are way to solve this initialisation issue? Clearly from the message, the issue is something to do with stopping the model from trying to initialise at 0, but I am wondering what to change in order that it avoids that. I am using brms rather than stan directly, so if there is a way this can be solved within brms then that would also be great!

As my first thought, I wonder if it is to do with this line regarding phi, which seems to put a lower bound of 0 on it, when phi should be positive and likely is also the shape parameter referred to in the warning:

real<lower=0> phi

However, as I usually use brms, I don’t really know exactly how to specify things in stan itself.

// generated with brms 2.12.0
functions {
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K] b;  // population-level effects
  real<lower=0> phi;  // precision parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
}
model {
  // initialize linear predictor term
  vector[N] mu = X * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
  }
  for (n in 1:N) {
    // apply the inverse link function
    mu[n] = inv_logit(mu[n]);
  }
  // priors including all constants
  target += normal_lpdf(b[1] | 0 , 1);
  target += normal_lpdf(b[2] | 0 , 1);
  target += normal_lpdf(b[3] | 0 , 1);
  target += normal_lpdf(b[4] | 0 , .125);
  target += normal_lpdf(b[5] | 0 , .125);
  target += normal_lpdf(b[6] | 0 , .05);
  target += normal_lpdf(b[7] | 0 , .05);
  target += normal_lpdf(b[8] | 0 , .05);
  target += normal_lpdf(b[9] | 0 , .05);
  target += normal_lpdf(b[10] | 0 , .05);
  target += normal_lpdf(b[11] | 0 , .05);
  target += gamma_lpdf(phi | 0.01, 0.01);
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
  target += normal_lpdf(z_1[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += beta_lpdf(Y | mu * phi, (1 - mu) * phi);
  }
}
generated quantities {
}

Any help would be much appreciated!

As one further note, I saw some messages in other people’s posts about similar issues indicating that their priors were not very good or ‘non-differentiable’ - is there a way to know if your priors are non-differentiable?!

1 Like

Since the error mentions the second shape parameter, my guess is that (1 - mu) is zero. It is not clear whether X has been scaled to have a mean of zero and sd = 1. From the few times I have used BRMS, I thought that was the standard behavior but I could be wrong. The initial values of b will be in the range [-2, 2]. If some values in X are large enough, inv_logit(mu[n]) could be 1 and 1 - mu would then be zero.

I am just a novice with Stan, so treat my suggestions with caution.

2 Likes

You can try using the beta proportion distribution, 19.2 Beta Proportion Distribution | Stan Functions Reference

or as a quick fix use:

target += beta_lpdf(Y | mu * phi + 1.0e-9, (1 - mu) * phi + 1.0e-9)

You may also specify initial values to phi, thanks to @FJCC.

2 Likes

Thanks for your responses both of you - I will try these out tomorrow and see how it goes! Regarding your quick fix suggestion, I see the point being to just make sure these don’t evaluate to 0. Could you actually interpret the output having used this fix - as in, would the results be reliable because all you’ve done is add a tiny amount to get the iterations going, or is it just a quick way to get the estimation running that you couldn’t reliably interpret?

@andre.pfeuffer and @FJCC thanks for your suggestions. I actually think I may have managed to avoid the issues another way. I realised that for several of the predictors, the values stan would be trying to estimate would always be very low and need to be assessed at a level beyond 2 decimal places to be interpretable. This may be completely wrong but it seemed that maybe this might cause some difficulties in estimation if the parameters had to be very small and precise. In any case, I just divided the predictors by a constant so that their estimated effects would be larger. I then found that running the regression, this issue simply does not come up and all the chains run very fast, and I also get a larger nEff.

1 Like