Advice on setting priors for power-scaling a longitudinal beta-binomial spline model

I am modeling heroin use during the first year of opioid agonist treatment as a function of amphetamine use at the start of treatment, using brms. Days of amphetamine use in the 28 days prior to the start of treatment (ats_baseline) is a bounded count variable. Time from the start of treatment in years (yearsFromStart) is modelled using a thin-plate regression spline to allow for departures from linearity. The outcome, days of heroin use in the 28 days prior to each measurement (heroin28), is also a bounded 0-28 count, hence I am using a beta-binomial regression model.

Here is the dataset

atsUseInOTP_alloutcomes_noMissing_inf.RData (17.3 KB)

I want to perform a prior sensitivity check using the priorsense package. As @n-kall describes in this vignette for the priorsense package, when using splines it is better to power-scale the predictions rather than the parameters themselves. This requires tagging each prior using the new prior syntax in brms, like so:

fit_betabinomial_heroin_s2_ps <- brm(formula = heroin28 | trials(set) ~ ats_baseline + s(yearsFromStart) + s(yearsFromStart, ats_baseline) + (1 | encID),
                                     data = d,
                                     family = beta_binomial(),
                                     prior = c(prior(prior = normal(0,4),
                                                     class = "b",
                                                     coef = "ats_baseline",
                                                     tag = "b"),
                                               prior(prior = normal(0,4),
                                                     class = "b",
                                                     coef = "syearsFromStart_1",
                                                     tag = "b"),
                                               prior(prior = normal(0,4),
                                                     class = "b",
                                                     coef = "syearsFromStartats_baseline_1",
                                                     tag = "b"),
                                               prior(prior = normal(0,4),
                                                     class = "b",
                                                     coef = "syearsFromStartats_baseline_2",
                                                     tag = "b"),
                                               prior(prior = normal(0,10),
                                                     class = "sds",
                                                     coef = "s(yearsFromStart)",
                                                     tag = "sds"),
                                               prior(prior = normal(0,10),
                                                     class = "sds",
                                                     coef = "s(yearsFromStart, ats_baseline)",
                                                     tag = "sds")),
                                     save_pars = save_pars(all=TRUE),
                                     seed = 1234,
                                     refresh = 0,
                                     chains = 4,
                                     warmup = 1e3,
                                     iter = 2e3)

The predictions I would like to extract are levels of heroin28 at
each of five candidate levels of ats_baseline: 0, 7, 14, 21, and 28 days use, at each of five timepoints: 0 years, 0.25, 0.5, 0.75, and 1 year. Now we follow the procedure in the vignette. We bind the draws of the predictions and measures to the posterior draws using posterior::bind_draws(). For this we will use the predictions_as_draws helper function provided by priorsense which transforms predictions from brms to draws objects. The idea is that we are attaching predictions at various levels of the outcome to the posterior draws object, and then doing power-scaling on THOSE predictions to assess prior sensitivity.

# We first define a log-score function.

logscore <- function(x) {
  # sum the pointwise log-likelihood over observations,
  # giving one total log score per posterior draw
  as.matrix(rowSums(log_lik(x)))
}

# now build the grid of predictions (encID = nEnc + 1 is a new grouping
# level, handled later by allow_new_levels = TRUE)
tibble(encID=nEnc+1, set=28) %>%
  data_grid(yearsFromStart = seq(0,1,by=0.25),
            ats_baseline=c(0,7,14,21,28),
            encID,
            set) -> predGrid

# how many cells?
nGrid <- nrow(predGrid)

# and the names of each cell
expand_grid(paste0("yearsFromStart",seq(0,1,by=0.25)),
            paste0("ats_baseline",c(0,7,14,21,28))) %>%
  unite(col = "cell") %>%
    pull -> cellNames
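As a quick sanity check (a sketch in base R only; the names mirror the tidyverse code above), the cell names should line up row-for-row with predGrid. Note that tidyr's expand_grid() varies its *last* argument fastest, whereas base expand.grid() varies its *first* argument fastest, so the arguments are listed in reverse order here:

```r
yrs <- seq(0, 1, by = 0.25)
ats <- c(0, 7, 14, 21, 28)

# base::expand.grid() varies its FIRST argument fastest, so listing ats
# first reproduces the ordering of the expand_grid() call above
g <- expand.grid(ats_baseline = ats, yearsFromStart = yrs)

# unite() joins columns with "_" by default
cellNames_check <- paste0("yearsFromStart", g$yearsFromStart,
                          "_ats_baseline", g$ats_baseline)

length(cellNames_check)   # 25 cells
head(cellNames_check, 2)  # "yearsFromStart0_ats_baseline0" "yearsFromStart0_ats_baseline7"
```

If these names do not match variables(post_draws_s2_ps) later, the grid ordering is the first thing to check.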

# now to add the predictions to the posterior draws
post_draws_s2_ps <- as_draws_df(fit_betabinomial_heroin_s2_ps) %>%
                      bind_draws(log_lik_draws(fit_betabinomial_heroin_s2_ps)) %>%
                        bind_draws(predictions_as_draws(x = fit_betabinomial_heroin_s2_ps,
                                                        predict_fn = bayes_R2,
                                                        prediction_names = "R2",
                                                        summary = FALSE)) %>%
                          bind_draws(predictions_as_draws(x = fit_betabinomial_heroin_s2_ps,
                                                          predict_fn = logscore,
                                                          prediction_names = "logscore")) %>%
                            bind_draws(predictions_as_draws(x = fit_betabinomial_heroin_s2_ps,
                                                            predict_fn = posterior_epred,
                                                            prediction_names = cellNames,
                                                            newdata = predGrid,
                                                            allow_new_levels = TRUE))

# total number of variables in the augmented draws object created above
nVarPredDraws <- length(variables(as_draws(post_draws_s2_ps)))

# The first ten variables are the model parameters we want to power-scale.
# The quantities created above (R-squared, logscore, and the nGrid = 25
# prediction cells) were appended to the END of the draws object, so we
# select the first ten variables plus the last nGrid + 2 variables.
variables(as_draws(post_draws_s2_ps))[c(1:10,(nVarPredDraws-(nGrid+1)):nVarPredDraws)] -> varsToPS

# Finally, get the power-scaling sensitivity assessment for the coefficients
# and the predictions. The function power-scales all tagged priors; the
# `variable` argument (varsToPS above) just restricts which quantities it
# reports on: the fixed-effect coefficients, R-squared, logscore, and the
# predictions.
powerscale_sensitivity(x = post_draws_s2_ps,
                       variable=varsToPS) %>%
  tt() %>%
    format_tt(num_fmt="decimal") -> powerscale_sens_heroin_s2_ps

powerscale_sens_heroin_s2_ps

This is what should print in the viewer. Lots of variables, so please excuse the lengthy output. You can see that there are quite a few prior-data conflicts on the parameters…

…but, as pointed out in the vignette, these are not what we are interested in. We are interested in the predictions. In the past I have had a lot of success power-scaling such predictions using similar beta-binomial models and the same priors, but this time, unfortunately, there are some prior-data conflicts on the predictions as well.



Can anyone (e.g. @n-kall, @avehtari?) give me some advice on how to resolve these conflicts, potentially by suggesting what priors to set on the spline parameters?

The message says “potential prior-data conflict”. In all these cases the likelihood sensitivity is much higher than prior sensitivity, which means the likelihood is more informative. The threshold for printing that message is ad hoc, and all prior sensitivities are just barely over that threshold, so that the prior sensitivity is small. Knowing your model and data, I’m not surprised that there is some prior sensitivity. priorsense is meant to be a tool that can be used to diagnose when more thought is needed, but there is no need to get rid of those diagnostic messages. Now that you did get those diagnostic messages, you know that you need to think about the priors and likelihood informativeness more carefully. The next step would be to look at what the consequences would be for any final conclusions of the research given slight variation in the prior.

Thank you @avehtari. So that next step would involve this code in the vignette?

What is your research question and final quantity of interest? Is the answer to your research question sensitive to the prior choice?

Thank you for clarifying. The research question is: what are the differences in days of heroin use between each of the five levels of ats use at baseline, at each of the five timepoints. How would you suggest testing whether these quantities change based on the prior? I thought that was the whole point of power-scaling and the priorsense package, to give an indication of whether the model is sensitive to the prior; i.e. I thought, based on what I posted, that my model IS sensitive to the prior.

Then look at the prior and likelihood sensitivity for these distributions.

Obtain draws from these distributions, and follow the priorsense vignette on how to use priorsense for posterior draws object.
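For example (a hedged sketch reusing the fit and predGrid objects from earlier in this thread; the helper name epred_diffs is mine, not from priorsense), one could compute draws of the differences between each ats_baseline level and the 0-day reference at every timepoint, then power-scale those:

```r
# Hypothetical helper: returns a draws x 20 matrix of differences in
# expected heroin28 between each ats_baseline level and the 0-day
# reference, at each timepoint. Assumes predGrid from earlier in the thread.
epred_diffs <- function(object, ...) {
  epred <- brms::posterior_epred(object, newdata = predGrid,
                                 allow_new_levels = TRUE, ...)
  cols <- list()
  for (t in seq(0, 1, by = 0.25)) {
    ref <- which(predGrid$yearsFromStart == t & predGrid$ats_baseline == 0)
    for (a in c(7, 14, 21, 28)) {
      idx <- which(predGrid$yearsFromStart == t & predGrid$ats_baseline == a)
      cols[[paste0("diff_t", t, "_ats", a)]] <- epred[, idx] - epred[, ref]
    }
  }
  do.call(cbind, cols)
}

# names for the 20 difference cells, in the same order as the loop above
# (timepoint varies slowest, ats level fastest)
diffNames <- as.vector(t(outer(seq(0, 1, by = 0.25), c(7, 14, 21, 28),
                               function(t, a) paste0("diff_t", t, "_ats", a))))

# then, following the same pattern as before (not run here):
# post_diff <- as_draws_df(fit_betabinomial_heroin_s2_ps) %>%
#   bind_draws(log_lik_draws(fit_betabinomial_heroin_s2_ps)) %>%
#   bind_draws(predictions_as_draws(fit_betabinomial_heroin_s2_ps,
#                                   predict_fn = epred_diffs,
#                                   prediction_names = diffNames))
# powerscale_sensitivity(post_diff, variable = diffNames)
```

The power-scaling is then applied directly to the quantities the research question is about, rather than to the individual prediction cells.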

The challenge is that in the case of a multivariate posterior with posterior dependencies, it is not sufficient to examine marginal posteriors, as the parameters may be individually weakly informed by the likelihood while being jointly strongly informed. The spline coefficients are an excellent example of this: while the prediction based on the splines may be well informed by the likelihood, and thus not prior sensitive, the individual parameter marginals can be prior sensitive. We have tried to say this in the paper and the vignette, but clearly we still need to clarify how we say it. For simple models, the parameters might have independent or weakly dependent posteriors, and then priorsense is useful directly for parameters. Even then, after the initial quick check on the parameters, it is useful to check the sensitivity of the actual quantity of interest. For more complex models, it is more likely that we should skip the prior sensitivity of the individual parameters and focus on the predictions (or differences of predictions).

I added to the Nabiximols treatment efficiency case study a code example showing how to use priorsense for the treatment effect posterior. It requires a few more lines of code for computing the treatment effect posteriors and then combining them with the log prior density and log likelihood. I’ll ask @n-kall whether that piece of code can be simplified even a little bit.

Thank you so much. I really appreciate your continued input. I will have a look and get back to you.

Hi @avehtari. In the new section of the vignette, can you please give a ‘for dummies’ explanation of this graph,


specifically what in the graph allows you to arrive at the conclusion: ‘The treatment effect posteriors are not sensitive to prior and are informed by the likelihood.’

That statement specifically refers to the table just above the plot:

Sensitivity based on cjs_dist
Prior selection: all priors
Likelihood selection: all data

                     variable prior likelihood diagnosis
 nabiximols - placebo week  4 0.016      0.087         -
 nabiximols - placebo week  8 0.012      0.068         -
 nabiximols - placebo week 12 0.016      0.108         -

How to read this kind of table is discussed in the paper. The figure provides a visualization: eyeballing it, we see that power-scaling the prior with alpha from 0.8 to 1.25 has visually no effect on the posterior KDE, while power-scaling the likelihood does change the posterior KDE.

So what we are looking for is that, on the top row, the different-coloured dot bars below the distributions, and the distributions themselves, are all in about the same position. Thank you.