I am trying to understand the reason the sampler is having difficulties with the following model and hopefully find an alternative formulation:

```
library(brms)
library(rstan)
formula <- brmsformula(data ~ 1)
code <- make_stancode(formula, data, skew_normal)
writeLines(code, 'model.stan')
model <- stan_model('model.stan')
fit <- sampling(model, data = list(N = length(data), Y = data, prior_only = 0))
```

Invoking `pairs`

gives the following:

If you look exclusively at the parameters (`Intercept`

, `sigma`

, and `alpha`

), the scatter plots look good. There are no any funnels or alike, which is what is usually discussed when explaining divergent iterations. Yet, divergent iterations are scattered all over the place.

Accidentally, I included `lp__`

and `energy__`

in the graph and observed what one can see in the above picture. The shapes are not longer that well behaved.

I would you to get some help interpreting these plots. What is going on, and what can be done about it? Thank you!

1 Like

Hi, I think your MWE is missing the source of the `data`

object - I’m unable to generate the Stan code. Without the model it’s hard to tell what’s going on. Maybe you can upload the ‘model.stan’ file, that would be helpful!

Indeed, I left out the data. Here is a histogram:

The code is as follows:

```
// generated with brms 2.13.5
functions {
}
data {
int<lower=1> N; // number of observations
vector[N] Y; // response variable
int prior_only; // should the likelihood be ignored?
}
transformed data {
real sqrt_2_div_pi = sqrt(2 / pi());
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // residual SD
real alpha; // skewness parameter
}
transformed parameters {
}
model {
// initialize linear predictor term
vector[N] mu = Intercept + rep_vector(0, N);
// parameters used to transform the skew-normal distribution
real delta; // transformed alpha parameter
real omega; // scale parameter
// use efficient skew-normal parameterization
delta = alpha / sqrt(1 + alpha^2);
omega = sigma / sqrt(1 - sqrt_2_div_pi^2 * delta^2);
for (n in 1:N) {
mu[n] = mu[n] - omega * delta * sqrt_2_div_pi;
}
// priors including all constants
target += student_t_lpdf(Intercept | 3, 0.3, 2.5);
target += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
target += normal_lpdf(alpha | 0, 4);
// likelihood including all constants
if (!prior_only) {
target += skew_normal_lpdf(Y | mu, omega, alpha);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
}
```

Honestly I don’t know anything about the skew normal distribution, but looking at it’s wikipedia page, it seems like brms is attempting a noncentral parameterization by solving for the mean (`Intercept`

) and the sd (`sigma`

). The parameters in the initial parameter space are not correlated, but there is dependence in posterior via the transformation. This is usually done to remove the issue you are facing, but it might explain the funneling effects you see with `lp__`

. Also, even though there are no funnels in the model parameters, they are still correlated to some degree.

The first thing I would try is to fit the naïve, centrally parameterized model

```
model {
target += student_t_lpdf(Intercept | 3, 0.3, 2.5);
target += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
target += normal_lpdf(alpha | 0, 4);
target += skew_normal_lpdf(Y | Intercept, sigma, alpha);
}
```

and see what happens.

1 Like