Initialisation issues when using student-t for hierarchical priors

I am trying to set hierarchical priors in my brms model. I am following the approach described by @wpetry here and @paul.buerkner here. However, instead of a normal distribution, I want to use a student-t distribution.

The approach described with a normal distribution is like the following and works fine:

# simulate data
set.seed(1)
library(tidyverse)
library(brms)
options(mc.cores = parallel::detectCores())

data <- tibble(x_1 = runif(100, min = 0, max = 1)) |>
  rowwise() |>
  mutate(x_2 = rgamma(n = 1, 1/x_1, 2),
       y = rnbinom(n = 1, size = x_2, prob = x_1))

# fit model
bpriors <- prior(normal(3, sigma), class = "sd") +
  prior("target += exponential_lpdf(sigma | 1)", check = FALSE)
  
stanvars <- stanvar(scode = "real<lower = 0> sigma;",
                    block = "parameters")

m1 <- brms::brm(
    bf(
      y ~ (1 | x_1)
    ),
    prior = bpriors,
    family = zero_inflated_negbinomial(link = "log", link_shape = "log", link_zi = "logit"),
    data = data,
    stanvars = stanvars
  )

Adapting this approach to student-t, I wrote the following code:

bpriors2 <- prior(student_t(3, 0, sigma), class = "sd") +
  prior("target += exponential_lpdf(sigma | 1)", check = FALSE)

m2 <- brms::brm(
    bf(
      y ~ (1 | x_1)
    ),
    prior = bpriors2,
    family = zero_inflated_negbinomial(link = "log", link_shape = "log", link_zi = "logit"),
    data = data,
    stanvars = stanvars
  )

However, this model can no longer find initial points. I tried the suggestions here on changing the initial points but that didn’t help.

Does anyone have any suggestions on how to get this model to fit?

  • Operating system: macOS Big Sur 11.5.2
  • brms version: 2.16.1

Are there any resources available to help me answer this?

Sorry your question isn’t being answered, @jobi. I’m afraid I don’t know brms well enough to answer myself and you’ve already pinged @paul.buerkner.

What’s the motivation for Student-t priors? Unless you have a model operating at multiple scales, it’s usually sufficient to just use a normal prior that roughly identifies the scale (what we call “weakly informative”).

Thanks for replying @Bob_Carpenter!

For context, the model is for the UK government assessment of fish catches. There is a wide variety of catch numbers between different species, with a few species caught far more than any others. I felt that a Student-t prior on the species random effect would capture this variety in scale better than a normal prior. Does this sound like a situation where a normal prior might be sufficient?

Please find below the Stan code that brms compiles for m2. Perhaps this will help a broader group of people in this forum identify where the initialisation error may be coming from.

// generated with brms 2.16.1
functions {
  /* zero-inflated negative binomial log-PDF of a single response 
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of negative binomial distribution
   *   phi: shape parameter of negative binomial distribution
   *   zi: zero-inflation probability
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_lpmf(int y, real mu, real phi, 
                                       real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         neg_binomial_2_lpmf(0 | mu, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             neg_binomial_2_lpmf(y | mu, phi); 
    } 
  } 
  /* zero-inflated negative binomial log-PDF of a single response 
   * logit parameterization of the zero-inflation part
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of negative binomial distribution
   *   phi: shape parameter of negative binomial distribution
   *   zi: linear predictor for zero-inflation part
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_logit_lpmf(int y, real mu, 
                                             real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                         bernoulli_logit_lpmf(0 | zi) + 
                         neg_binomial_2_lpmf(0 | mu, phi)); 
    } else { 
      return bernoulli_logit_lpmf(0 | zi) +  
             neg_binomial_2_lpmf(y | mu, phi); 
    } 
  }
  /* zero-inflated negative binomial log-PDF of a single response 
   * log parameterization for the negative binomial part
   * Args: 
   *   y: the response value 
   *   eta: linear predictor for negative binomial distribution 
   *   phi: shape parameter of negative binomial distribution
   *   zi: zero-inflation probability
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_log_lpmf(int y, real eta, 
                                           real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         neg_binomial_2_log_lpmf(0 | eta, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             neg_binomial_2_log_lpmf(y | eta, phi); 
    } 
  } 
  /* zero-inflated negative binomial log-PDF of a single response
   * log parameterization for the negative binomial part
   * logit parameterization of the zero-inflation part
   * Args: 
   *   y: the response value 
   *   eta: linear predictor for negative binomial distribution 
   *   phi: shape parameter of negative binomial distribution
   *   zi: linear predictor for zero-inflation part 
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
  real zero_inflated_neg_binomial_log_logit_lpmf(int y, real eta, 
                                                 real phi, real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                         bernoulli_logit_lpmf(0 | zi) + 
                         neg_binomial_2_log_lpmf(0 | eta, phi)); 
    } else { 
      return bernoulli_logit_lpmf(0 | zi) +  
             neg_binomial_2_log_lpmf(y | eta, phi); 
    } 
  }
  // zero_inflated negative binomial log-CCDF and log-CDF functions
  real zero_inflated_neg_binomial_lccdf(int y, real mu, real phi, real hu) { 
    return bernoulli_lpmf(0 | hu) + neg_binomial_2_lccdf(y | mu, phi);
  }
  real zero_inflated_neg_binomial_lcdf(int y, real mu, real phi, real hu) { 
    return log1m_exp(zero_inflated_neg_binomial_lccdf(y | mu, phi, hu));
  }
}
data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> shape;  // shape parameter
  real<lower=0,upper=1> zi;  // zero-inflation probability
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  real<lower = 0> sigma;
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    for (n in 1:N) {
      target += zero_inflated_neg_binomial_log_lpmf(Y[n] | mu[n], shape, zi);
    }
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, -2.3, 2.5);
  target += gamma_lpdf(shape | 0.01, 0.01);
  target += beta_lpdf(zi | 1, 1);
  target += student_t_lpdf(sd_1 | 3, 0, sigma)
    - 1 * student_t_lccdf(0 | 3, 0, sigma);
  target += std_normal_lpdf(z_1[1]);
  target += exponential_lpdf(sigma | 1);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}

I don’t quite understand why you are adding sigma here in the first place. You already have a standard deviation parameter by design of the model (class sd) and now you are adding yet another one on top (sigma) that is only informed by this 1 standard deviation parameter. This cannot go well. What you did was not to set a normal or student t prior on the random effects but a (half) normal or student t prior on their SD with yet another hyperparameter. If you really want to use a student-t random effects you drop your prior statements and speciy in the formula y ~ (1 | gr(x_1, dist = "student")). Please note that this feature is experimental, not documented and not officially supported.

3 Likes