[Clarification] Divergences during sample_prior = "only" model fitting

I am running a few prior predictive checks using brm(model, data, prior, sample_prior = "only").
To my surprise, I am seeing quite a lot of divergences during the "model fitting".
If the sampling is indeed only happening for the prior, I am not sure how I should interpret the divergences.

Same way you would if you condition on the data. But in your situation, the funnel-like or other weird geometry is due to your priors rather than a result of conditioning on the data. So, I’m guessing some of your priors are improper.
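To make that concrete, here is a small, purely illustrative R sketch (not from the original model): a hierarchical prior on its own can already produce funnel-like geometry, because the group-level effects collapse towards zero as the SD shrinks.

# Illustration only: funnel-shaped prior geometry, before any data enter the model
set.seed(1)
log_sd <- rnorm(5000, 0, 1.5)           # prior draws for the log of a group-level SD
effect <- rnorm(5000, 0, exp(log_sd))   # group-level effect given that SD
plot(effect, log_sd, pch = ".", xlab = "group-level effect", ylab = "log(SD)")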

Thanks! That’s what I was suspecting.

What kind of model are you fitting exactly?

The problems may arise because some priors need to be actually sampled with NUTS rather than drawn via an _rng function in the generated quantities block. This holds for basically all priors that have lower or upper bounds. For instance, a half-normal prior is only implicitly coded in Stan and does not have a corresponding _rng function.
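As a rough illustration of that last point: in R a half-normal draw is trivial to generate directly, which is what a dedicated _rng function would do; Stan has no half_normal_rng(), so the half-normal exists only implicitly as normal_lpdf() on a parameter declared with <lower = 0>, and such parameters still have to be explored by NUTS even in a prior-only run.

# Direct half-normal(0, 1) draws, i.e. what an _rng-style forward simulation would give
set.seed(1)
direct_draws <- abs(rnorm(1e4, 0, 1))
quantile(direct_draws, c(0.5, 0.9, 0.99))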

It is the usual circular inference model. We have no bounded priors, so the issue might (still) be in the model:

circular_prior = c(prior(normal(0,1), nlpar = "wSelf"),
                   prior(normal(0,1), nlpar = "wOthers"),
                   prior(normal(0,1), nlpar = "aSelf"),
                   prior(normal(0,1), nlpar = "aOthers"),
                   prior(normal(0,1), nlpar = "bias"),
                   prior_string("target += normal_lpdf(sd_1 | 0, 1) - 1 * normal_lccdf(0 | 0, 1)", check = FALSE),
                   prior_("lkj(5)", class = "cor"))

F_stancode = "
  real F3(real a_raw, real L_raw, real w_raw) {
    real a;
    real L;
    real w;
    a = exp(a_raw);                  // amplification factor, kept positive
    L = exp(L_raw * a);              // amplified evidence on the odds scale
    w = 0.5 + inv_logit(w_raw) / 2;  // weight constrained to (0.5, 1)
    return log((w * L + 1 - w) / ((1 - w) * L + w));
  }
"
F3 <- function(a_raw, L_raw, w_raw) {
    # R counterpart of the Stan function, used by brms to back-translate;
    # inv.logit() comes from the boot/gtools packages (plogis() is the base R equivalent)
    a <- exp(a_raw)
    L <- exp(L_raw * a)
    w <- 0.5 + inv.logit(w_raw) / 2
    log((w * L + 1 - w) / ((1 - w) * L + w))
}

circular_f = bf(l_confidence ~
                   F3(0, l_prior + I, wSelf) +
                   F3(0, l_sensory + I, wOthers),
                wSelf + wOthers + bias ~ 1 + (1|p|Participant),
                aSelf + aOthers ~ 1 + (1|p|Participant),
                nlf(I ~ F3(aSelf, l_prior, wSelf) + F3(aOthers, l_sensory, wOthers)),
                nl = TRUE)

prior_circular_m = brm(circular_f,
               combined_data,
               stan_funs = F_stancode,
               prior = circular_prior,
               sample_prior="only",
               chains = N_CORES, cores = N_CORES, 
               iter = ITER, 
               refresh = 50,
               control = STAN_CONTROL)
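
Not part of the original post, but a useful check at this point: get_prior() lists every prior the model will use, including the defaults that were not set explicitly, so it shows which parameters are still on brms' default bounded priors.

# List all priors for this model, including unset defaults
# (uses the circular_f and combined_data objects defined above)
get_prior(circular_f, data = combined_data)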

The prior on sd_1 looks like a bounded prior to me.

Also, what is the reason for specifying this prior so awkwardly instead of using a standard prior specification with class = "sd" and an appropriate value for the coef argument?
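For example (a hedged sketch; the exact nlpar/coef values depend on the model), the prior on a group-level SD could be written as

# brms restricts priors on SD parameters to the positive half automatically,
# so a plain normal(0, 1) here acts as a half-normal(0, 1)
prior(normal(0, 1), class = "sd", nlpar = "bias")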

You are perfectly right, sorry :-)
The problem persists, though, even without priors on the SDs:

circular_prior = c(prior(normal(0,1), nlpar = "wSelf"),
                   prior(normal(0,1), nlpar = "wOthers"),
                   prior(normal(0,1), nlpar = "aSelf"),
                   prior(normal(0,1), nlpar = "aOthers"),
                   prior(normal(0,1), nlpar = "bias"),
                   prior_("lkj(5)", class = "cor"))

Don't forget we still have default priors on the SDs, which are (half) Student-t(3, 0, 10) priors, and I can imagine that even though we have 3 degrees of freedom, NUTS may have problems sampling from this prior.
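As a rough, illustrative check of how heavy-tailed that default is (a half-Student-t(3, 0, 10) draw is just the absolute value of a t_3 draw scaled by 10):

# Illustration only: tail behaviour of the default SD prior
set.seed(1)
sd_draws <- abs(10 * rt(1e5, df = 3))   # half-Student-t(3, 0, 10)
quantile(sd_draws, c(0.5, 0.9, 0.99))
# A noticeable share of the prior mass sits at SD values large enough to push
# the non-linear predictor into extreme regions, which can be hard for NUTS
# to explore in a prior-only run.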

right :-)

Do the divergences go away if you choose another prior on the SDs, for instance a half-normal prior?

I ran a series of tests with priors like the following:

circular_prior = c(prior(normal(0, 1), nlpar = "wSelf"),
                   prior(normal(0, 1), nlpar = "wOthers"),
                   prior(normal(0, 1), nlpar = "aSelf"),
                   prior(normal(0, 1), nlpar = "aOthers"),
                   prior(normal(0, 1), nlpar = "bias"),
                   prior(normal(0, .5), class = "sd", nlpar = "wSelf"),
                   prior(normal(0, .5), class = "sd", nlpar = "wOthers"),
                   prior(normal(0, .5), class = "sd", nlpar = "aSelf"),
                   prior(normal(0, .5), class = "sd", nlpar = "aOthers"),
                   prior(normal(0, .5), class = "sd", nlpar = "bias"),
                   prior_("lkj(5)", class = "cor"))

The divergences still happen until I reduce the SD prior to normal(0, 0.2) or narrower, though that might be too narrow: at that point, once the likelihood is added, the posterior barely moves compared to the prior.
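
In case it is useful, a minimal sketch (not the original analysis) of how that prior-vs-posterior comparison can be made explicit: refit with sample_prior = "yes" so both sets of draws are kept, then compare the SD parameters.

fit <- brm(circular_f,
           combined_data,
           stan_funs = F_stancode,
           prior = circular_prior,
           sample_prior = "yes",   # keep prior draws alongside the posterior
           chains = N_CORES, cores = N_CORES,
           iter = ITER,
           control = STAN_CONTROL)

post_draws  <- posterior_samples(fit)
prior_draws <- prior_samples(fit)

# If these summaries barely differ, the likelihood is not updating the SDs
# relative to the (possibly too narrow) prior
summary(post_draws[, grep("^sd_", names(post_draws)), drop = FALSE])
summary(prior_draws[, grep("sd", names(prior_draws)), drop = FALSE])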