Pathologies with respect to lp__ and energy__

I am trying to understand the reason the sampler is having difficulties with the following model and hopefully find an alternative formulation:

library(brms)
library(rstan)

formula <- brmsformula(data ~ 1)
code <- make_stancode(formula, data, skew_normal)
writeLines(code, 'model.stan')

model <- stan_model('model.stan')
fit <- sampling(model, data = list(N = length(data), Y = data, prior_only = 0))

Invoking pairs gives the following:

If you look exclusively at the parameters (Intercept, sigma, and alpha), the scatter plots look good. There are no any funnels or alike, which is what is usually discussed when explaining divergent iterations. Yet, divergent iterations are scattered all over the place.

Accidentally, I included lp__ and energy__ in the graph and observed what one can see in the above picture. The shapes are not longer that well behaved.

I would you to get some help interpreting these plots. What is going on, and what can be done about it? Thank you!

1 Like

Hi, I think your MWE is missing the source of the data object - I’m unable to generate the Stan code. Without the model it’s hard to tell what’s going on. Maybe you can upload the ‘model.stan’ file, that would be helpful!

Indeed, I left out the data. Here is a histogram:

The code is as follows:

// generated with brms 2.13.5
functions {
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  real sqrt_2_div_pi = sqrt(2 / pi());
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
  real alpha;  // skewness parameter
}
transformed parameters {
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  // parameters used to transform the skew-normal distribution
  real delta;  // transformed alpha parameter
  real omega;  // scale parameter
   // use efficient skew-normal parameterization
  delta = alpha / sqrt(1 + alpha^2);
  omega = sigma / sqrt(1 - sqrt_2_div_pi^2 * delta^2);
  for (n in 1:N) {
    mu[n] = mu[n] - omega * delta * sqrt_2_div_pi;
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0.3, 2.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += normal_lpdf(alpha | 0, 4);
  // likelihood including all constants
  if (!prior_only) {
    target += skew_normal_lpdf(Y | mu, omega, alpha);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}

Honestly I don’t know anything about the skew normal distribution, but looking at it’s wikipedia page, it seems like brms is attempting a noncentral parameterization by solving for the mean (Intercept) and the sd (sigma). The parameters in the initial parameter space are not correlated, but there is dependence in posterior via the transformation. This is usually done to remove the issue you are facing, but it might explain the funneling effects you see with lp__. Also, even though there are no funnels in the model parameters, they are still correlated to some degree.

The first thing I would try is to fit the naïve, centrally parameterized model

model {
  target += student_t_lpdf(Intercept | 3, 0.3, 2.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += normal_lpdf(alpha | 0, 4);

  target += skew_normal_lpdf(Y | Intercept, sigma, alpha);
}

and see what happens.

1 Like