[Clarification] Divergences during sample_prior = "only" model fitting

fusaroli · June 16, 2018, 3:44pm

I am running a few prior predictive checks using brm(model, data, prior, sample_prior = "only").
To my surprise, I am seeing quite a lot of divergences during the “model fitting”.
If the sampling is indeed only happening for the prior, I am not sure how I should interpret the divergences.

bgoodri · June 16, 2018, 3:53pm

Same way you would if you condition on the data. But in your situation, the funnel-like or other weird geometry is due to your priors rather than a result of conditioning on the data. So, I’m guessing some of your priors are improper.

fusaroli · June 16, 2018, 3:59pm

Thanks! That’s what I was suspecting.

paul.buerkner · June 18, 2018, 12:16pm

What kind of model are you fitting exactly?

The problems may arise because some priors need to be actually sampled using NUTS rather than drawn with an _rng function in generated quantities. This holds for basically all priors that have lower or upper bounds. For instance, a half-normal prior is only implicitly coded in Stan and does not have a corresponding _rng function.
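To make the half-normal case concrete, here is a minimal base-R sketch (no Stan involved). It assumes a half-normal(0, 1) prior and shows that the truncated density is the normal density minus the log of the mass above zero. The helper name half_normal_lpdf is hypothetical and only used for illustration.

```r
set.seed(1)
n <- 1e5

# Outside Stan, half-normal(0, 1) draws can be obtained by reflecting normal draws.
sd_draws <- abs(rnorm(n, mean = 0, sd = 1))

# Hypothetical helper: log density of a normal(0, sigma) truncated below at 0,
# i.e. normal log density minus the log of the probability mass above 0.
half_normal_lpdf <- function(x, sigma = 1) {
  dnorm(x, mean = 0, sd = sigma, log = TRUE) -
    pnorm(0, mean = 0, sd = sigma, lower.tail = FALSE, log.p = TRUE)
}

# The truncated density should integrate to 1 over [0, Inf).
total_mass <- integrate(function(x) exp(half_normal_lpdf(x)), 0, Inf)$value
print(total_mass)
print(mean(sd_draws))  # theoretical mean of half-normal(0, 1) is sqrt(2 / pi)
```

In a sample_prior = "only" run there is no such shortcut for the bounded parameters: they are sampled by NUTS like any other parameter, which is why divergences can appear.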

fusaroli · June 25, 2018, 1:45pm

It is the usual circular inference model. We have no bounded priors, so the issue might (still) be in the model:

circular_prior = c(prior(normal(0,1), nlpar = "wSelf"),
                   prior(normal(0,1), nlpar = "wOthers"),
                   prior(normal(0,1), nlpar = "aSelf"),
                   prior(normal(0,1), nlpar = "aOthers"),
                   prior(normal(0,1), nlpar = "bias"),
                   prior_string("target += normal_lpdf(sd_1 | 0, 1) - 1 * normal_lccdf(0 | 0, 1)", check = FALSE),
                   prior_("lkj(5)", class = "cor"))

F_stancode = "
real F3(real a_raw, real L_raw, real w_raw) {
real a;
real L;
real w;
a = exp(a_raw);
L = exp(L_raw * a);
w = 0.5 + inv_logit(w_raw)/2;
return log((w * L + 1 - w)./((1 - w) * L + w));
}
"
F3 <- function(a_raw, L_raw, w_raw) {
    # used by brms to back-translate
    a = exp(a_raw)
    L = exp(L_raw * a)
    w = .5 + inv.logit(w_raw) / 2
    log((w * L + 1 - w)/((1 - w) * L + w))
    
}

circular_f = bf(l_confidence ~
                   F3(0, l_prior + I, wSelf) +
                   F3(0, l_sensory + I, wOthers),
               wSelf + wOthers + bias ~ 1 + (1|p|Participant) ,
               aSelf + aOthers ~ 1 + (1|p|Participant) ,
               nlf(I ~ F3(aSelf, l_prior, wSelf) +F3(aOthers, l_sensory, wOthers)),
               nl = TRUE)

prior_circular_m = brm(circular_f,
               combined_data,
               stan_funs = F_stancode,
               prior = circular_prior,
               sample_prior="only",
               chains = N_CORES, cores = N_CORES, 
               iter = ITER, 
               refresh = 50,
               control = STAN_CONTROL)

paul.buerkner · June 25, 2018, 3:14pm

The prior on sd_1 looks like a bounded prior to me.

Also, what is the reason for specifying this prior so awkwardly instead of using a standard prior specification with class = "sd" and an appropriate value for argument coef?

fusaroli · June 25, 2018, 6:42pm

You are perfectly right, sorry :-)
The problem though persists even without priors on sd:

circular_prior = c(prior(normal(0,1), nlpar = "wSelf"),
                   prior(normal(0,1), nlpar = "wOthers"),
                   prior(normal(0,1), nlpar = "aSelf"),
                   prior(normal(0,1), nlpar = "aOthers"),
                   prior(normal(0,1), nlpar = "bias"),
                   prior_("lkj(5)", class = "cor"))

paul.buerkner · June 25, 2018, 8:03pm

Don’t forget we still have default priors on the SDs, which are (half) Student-t(3, 0, 10) priors, and I can imagine that even with 3 degrees of freedom, NUTS may have problems sampling from this prior.
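As a rough illustration of how heavy those tails are, the following base-R sketch compares draws from a half Student-t(3, 0, 10) with draws from a half-normal. The half-normal scale of 0.5 is an assumption chosen only for comparison, and the draws are generated directly with rt() and rnorm(), not with NUTS.

```r
set.seed(2)
n <- 1e5

# half Student-t(3, 0, 10): scale a standard t with 3 df by 10, then reflect.
half_t <- abs(10 * rt(n, df = 3))

# half-normal(0, 0.5) for comparison (scale is an assumption for this sketch).
half_n <- abs(rnorm(n, mean = 0, sd = 0.5))

q <- c(0.5, 0.9, 0.99, 0.999)
tail_summary <- rbind(
  half_student_t = quantile(half_t, q),
  half_normal    = quantile(half_n, q)
)
print(round(tail_summary, 2))
```

The upper quantiles of the half Student-t are orders of magnitude larger than its median, so a sampler has to explore very different scales within one run.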

fusaroli · June 26, 2018, 8:43am

right :-)

paul.buerkner · June 26, 2018, 10:31am

Do the divergences go away if you choose another prior on the SDs, for instance a half-normal prior?

fusaroli · June 26, 2018, 8:02pm

I ran a series of tests with priors like the following:

circular_prior = c(prior(normal(0, 1), nlpar = "wSelf"),
                   prior(normal(0, 1), nlpar = "wOthers"),
                   prior(normal(0, 1), nlpar = "aSelf"),
                   prior(normal(0, 1), nlpar = "aOthers"),
                   prior(normal(0, 1), nlpar = "bias"),
                   prior(normal(0, .5), nlpar = "wSelf",class="sd"),
                   prior(normal(0, .5), nlpar = "wOthers",class="sd"),
                   prior(normal(0, .5), nlpar = "aSelf",class="sd"),
                   prior(normal(0, .5), nlpar = "aOthers",class="sd"),
                   prior(normal(0, .5), nlpar = "bias",class="sd"),
                   prior_("lkj(5)", class = "cor"))

The divergences persist until I narrow the sd priors to normal(0, .2) or tighter. That may be too narrow, though: at that point, adding the likelihood does not move the posterior away from the prior.
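One way to judge whether normal(0, .2) is too narrow is to look at what it implies for participant-level parameters. The sketch below is a base-R approximation under stated assumptions, not the brms model itself: it ignores the lkj correlation prior, uses plogis() in place of inv.logit(), and the helper simulate_weight is hypothetical.

```r
set.seed(3)

# Same transformation as F3 in the thread, with base R's plogis() as inverse logit.
F3 <- function(a_raw, L_raw, w_raw) {
  a <- exp(a_raw)
  L <- exp(L_raw * a)
  w <- 0.5 + plogis(w_raw) / 2
  log((w * L + 1 - w) / ((1 - w) * L + w))
}

# Hypothetical helper: prior draws of one participant's weight parameter.
simulate_weight <- function(n, sd_scale) {
  mu  <- rnorm(n, 0, 1)              # population-level prior, normal(0, 1)
  tau <- abs(rnorm(n, 0, sd_scale))  # half-normal prior on the group-level sd
  z   <- rnorm(n)                    # standardized participant offset
  data.frame(deviation = tau * z,    # participant deviation on the raw scale
             w = 0.5 + plogis(mu + tau * z) / 2)
}

wide   <- simulate_weight(1e5, sd_scale = 0.5)
narrow <- simulate_weight(1e5, sd_scale = 0.2)

# Spread of participant deviations implied by each sd prior.
print(quantile(abs(wide$deviation),   c(0.5, 0.9, 0.99)))
print(quantile(abs(narrow$deviation), c(0.5, 0.9, 0.99)))
```

If the deviations implied by the narrow prior are much smaller than the between-participant differences expected in the data, that would be consistent with the posterior staying close to the prior.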
